The value of every information system is the opportunity for insight. Once an organization has insight all things are possible. With insight comes new opportunity – to make money, to save money, to improve goods and services, and so forth.
But where does insight come from? It can come as easily as glancing out a window and seeing a rainbow, or as painfully as investing money in a dot-com startup only to discover that there never was a business case for what was being built.
But one of the surest ways to spark the fire of insight is through correlation analysis, the analysis of events that occur together. Suppose a study is about event A, but in gathering information about event A it is noticed that another event, event B, frequently happens along with it. From an insight perspective, the fact that two events occur in tandem is a great opportunity to gather insight. Some typical questions are:
- Does event A cause event B? If so, under what conditions?
- Is there another cause of event A and event B? If so, what is the cause and under what conditions are the different events triggered?
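One way to put a number on "event A and event B occur together" is to compare how often they are seen jointly with how often that would be expected if they were unrelated. The sketch below is only an illustration: the function name `cooccurrence_stats` and the sample records are made up for this example, and it assumes each record is simply the set of events observed in one case. It says nothing about cause; it only measures how strongly the two events travel together.

```python
from math import sqrt

def cooccurrence_stats(records, a, b):
    """Return (lift, phi) for events a and b over a list of event sets.

    Assumes both events occur in at least one record and are absent
    from at least one record; otherwise the divisions below fail.
    """
    n = len(records)
    n_a = sum(1 for r in records if a in r)
    n_b = sum(1 for r in records if b in r)
    n_ab = sum(1 for r in records if a in r and b in r)
    # Lift: observed joint count divided by the count expected if A and B
    # were independent. Values well above 1 mean they tend to co-occur.
    lift = (n_ab * n) / (n_a * n_b)
    # Phi coefficient: the Pearson correlation of two yes/no variables.
    phi = (n * n_ab - n_a * n_b) / sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    return lift, phi

# Hypothetical records: each set lists the events seen in one case.
records = [
    {"A", "B"}, {"A", "B"}, {"A", "B"}, {"A"},
    {"B"}, set(), set(), set(),
]
lift, phi = cooccurrence_stats(records, "A", "B")
print(lift, phi)
```

A lift near 1 and a phi near 0 would suggest the two events are unrelated in the data; higher values are a prompt to start asking the questions above, not an answer to them.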
Caution must be exercised when making inferences based on the coincidence of events. Given enough factors, a correlation can be discovered that is nonsensical. Just because two factors tend to occur in concert with each other does not necessarily imply a real relationship.
A famous example is the correlation between the winner of the Super Bowl and whether the stock market would rise or fall that year. For many years, when a team from the original NFL won the Super Bowl the stock market would rise, and in years when a team from the old AFL won the market would fall.
Of course, there is no actual relationship between professional football and the stock market. The co-occurrence of the events was purely coincidental.
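The warning that enough factors will always turn up some nonsensical correlation can be demonstrated with purely random numbers. The following sketch is an illustration under stated assumptions, not a statistical test: the function names, the seed, and the sizes (10 observations, 5 versus 2,000 factors) are arbitrary choices made for this example. It generates one random "target" series and many unrelated random "factor" series, then reports the strongest correlation found by chance.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def strongest_chance_correlation(n_factors, n_obs, seed=0):
    """Largest |correlation| between a random target and random factors."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.random() for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_factors):
        # Each factor is pure noise, unrelated to the target by construction.
        factor = [rng.random() for _ in range(n_obs)]
        best = max(best, abs(pearson(target, factor)))
    return best

few = strongest_chance_correlation(n_factors=5, n_obs=10)
many = strongest_chance_correlation(n_factors=2000, n_obs=10)
print(few, many)
```

Because both runs share a seed, the 2,000-factor search includes the same five factors as the small one and can only find an equal or stronger match. None of the factors has any real relationship to the target, yet the more factors are examined, the more impressive the best coincidence looks.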
That said, there are many cases where there is a real reason for the correlation of events. When there is a real relationship between events, correlation analysis is a powerful analytic tool.
There are plenty of other possibilities for insight when using correlation analysis.
Correlation analysis is never more powerful and never more pregnant with possibilities than when applied to the medical and healthcare environment. When a medical or healthcare organization gathers all of its episodes of care and other encounters between doctors and patients, and then integrates and assimilates the associated events, insightful conclusions can result. Often the results are both interesting and unexpected.
Looking at the natural correlations that have evolved over the years in the practice of medicine can be very useful. Often no major patterns are discerned by any one doctor (short of alerting his/her intuition) because the doctor sees only one patient at a time: the patient who is immediately in front of him/her.
But given many observations taken over many doctors and patients and over a lengthy period of time, medical and health patterns start to emerge that have otherwise gone unnoticed. And sometimes these patterns have very profound implications for healthcare.
One of the challenges of doing meaningful correlation analysis is that many events must be reported. It doesn’t do much good to examine 50 or even 500 incidents of medical care. Instead, 500,000 or 5,000,000 incidents of care are much better for spotting previously unseen correlations and patterns.
Another challenge is dealing with the data. Medical and healthcare data is notoriously unstructured and is usually textual. In addition, the nomenclature for the same event or activity varies widely. Recently a knowledgeable physician told me that there were at least 15 ways to describe the same thing: a broken bone.
Yet another problem with meaningful correlation analysis is that the same term is sometimes spelled differently. To do meaningful correlation analysis, there needs to be a single consistent spelling of the same term.
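The two problems just described, many names for one thing and many spellings of one name, are often handled by mapping every variant to a single canonical term before any counting is done. Below is a minimal sketch of that idea. The synonym table `CANONICAL_TERMS`, the function `normalize_term`, and the sample notes are all hypothetical and invented for this illustration; a real medical vocabulary would be far larger and would come from a curated source rather than a hand-written dictionary.

```python
import re

# Hypothetical synonym and spelling table, for illustration only.
CANONICAL_TERMS = {
    "broken bone": "fracture",
    "bone fracture": "fracture",
    "fx": "fracture",
    "fracture": "fracture",
    "haemorrhage": "hemorrhage",
    "hemorrhage": "hemorrhage",
}

def normalize_term(raw):
    """Map a raw term to its canonical form, if one is known."""
    # Lowercase and collapse punctuation and whitespace to single spaces
    # so that "Bone-Fracture" and "bone  fracture" look the same.
    key = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    # Unknown terms pass through in cleaned form rather than being dropped.
    return CANONICAL_TERMS.get(key, key)

notes = ["Broken bone", "FX", "bone-fracture", "Haemorrhage", "sprain"]
normalized = [normalize_term(t) for t in notes]
print(normalized)
```

Once every record uses the same term for the same thing, events that were previously scattered across many labels are counted together, which is what allows a faint correlation to show up at all.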
Still another problem is that for correlation analysis to be effective, the results need to be visual. When correlation analysis is not visual, patterns that are unclear or faint tend to hide, lost among the massive amount of data and other more obvious patterns. But when visual techniques are used for correlation analysis, the chances of spotting faint and unclear patterns are greatly improved.
The good news is that correlation analysis for the medical community is now a real possibility. Leading medical research firms are starting to use powerful new technology for very sophisticated correlation analysis.
It is reasonable to expect that today’s technology can spot important patterns that could not be found as little as a year ago.
William H. Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse. Bill Inmon wrote the first book, held the first conference (with Arnie Barnett), wrote the first column in a magazine and was the first to offer classes in data warehousing. Bill Inmon created the accepted definition of what a data warehouse is - a subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions.