Data mining applied to clinical trial data.
A large amount of data is collected
during clinical trials of a new drug, but it is analyzed only in a narrow way.
These trials are designed specifically to answer two questions: (1) is
the drug safe, and (2) does it work. Hence, the analysis is almost strictly
an application of confirmatory statistics. Inside this data, however, is
information on subpopulation performance, the study of which is usually
termed responder analysis.
The question of how one can predict who will respond and who won't makes
this a classification problem.
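
To make the classification framing concrete, here is a minimal sketch that fits a one-split classifier (a decision stump) to predict responders from baseline measurements. The patient records, the field names age and baseline_marker, and the helper fit_stump are all hypothetical and invented for illustration; a real responder analysis would use actual trial covariates, a richer model, and held-out validation.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A one-split classifier: compare a single feature to a threshold."""
    feature: str
    threshold: float
    above_is_responder: bool

    def predict(self, patient):
        above = patient[self.feature] > self.threshold
        return above == self.above_is_responder


def fit_stump(patients, responded, features):
    """Return the split with the fewest training errors, and that error count.

    Returns (None, len(patients) + 1) if no feature has two distinct values.
    """
    best, best_errors = None, len(patients) + 1
    for feature in features:
        values = sorted({p[feature] for p in patients})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for above_is_responder in (True, False):
                stump = Stump(feature, threshold, above_is_responder)
                errors = sum(
                    stump.predict(p) != r for p, r in zip(patients, responded)
                )
                if errors < best_errors:
                    best, best_errors = stump, errors
    return best, best_errors


# Made-up records; not real trial data.
patients = [
    {"age": 34, "baseline_marker": 2.1},
    {"age": 61, "baseline_marker": 7.4},
    {"age": 47, "baseline_marker": 6.8},
    {"age": 52, "baseline_marker": 3.0},
    {"age": 29, "baseline_marker": 8.2},
    {"age": 68, "baseline_marker": 1.5},
]
responded = [False, True, True, False, True, False]

stump, errors = fit_stump(patients, responded, ["age", "baseline_marker"])
print(stump, errors)
```

On this toy data the stump splits on baseline_marker, because age does not separate the two groups. A single split is chosen here only because it is easy to read; it stands in for whatever classifier the project would actually use.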
Automated data layout and content analysis. Financial analysis and
modeling companies often receive data in an unknown or uncertain format. The
layout and content of the client's files may have changed, or (more likely)
the client's IT department may be less than prompt in providing documentation.
To produce good analyses and good models, it is vital that the data be
completely understood. This calls for a file exploration utility, a program
that can scan data, interpret it with a user-supplied (and modifiable)
layout, collect distributions, and compare those distributions to known
historical distributions. An even smarter program could make intelligent
guesses about the layout using domain knowledge and general knowledge of
file structures, e.g., that formats tend to be repetitive, with little
variation between records.
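
The core loop of such a file exploration utility can be sketched as follows, assuming fixed-width records. The layout, the field names, the sample records, the historical distributions, and the 0.2 tolerance are all hypothetical, and total variation distance is just one reasonable way to compare two distributions; it is not prescribed by the description above.

```python
from collections import Counter

# Hypothetical user-supplied layout: field name -> (start, end) column offsets.
LAYOUT = {"account_type": (0, 1), "state": (1, 3), "balance": (3, 10)}


def parse_record(line, layout):
    """Slice one fixed-width line into named fields."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def collect_distributions(lines, layout, fields):
    """Return the relative frequency of each value seen in the given fields."""
    counts = {field: Counter() for field in fields}
    total = 0
    for line in lines:
        record = parse_record(line, layout)
        total += 1
        for field in fields:
            counts[field][record[field]] += 1
    if total == 0:
        return {field: {} for field in fields}
    return {
        field: {value: n / total for value, n in counter.items()}
        for field, counter in counts.items()
    }


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def drifted_fields(observed, historical, tolerance=0.2):
    """Names of fields whose distribution moved more than the tolerance."""
    return sorted(
        field
        for field in observed
        if total_variation(observed[field], historical.get(field, {})) > tolerance
    )


# Made-up records and historical distributions, for illustration only.
lines = ["CNY0001200", "SNJ0000350", "CNY0104000", "CCA0000075"]
historical = {
    "account_type": {"C": 0.7, "S": 0.3},
    "state": {"TX": 0.6, "NY": 0.4},
}

observed = collect_distributions(lines, LAYOUT, ["account_type", "state"])
print(drifted_fields(observed, historical))
```

Here the account_type field stays close to its history while state does not, which is the kind of signal that suggests either the client's population changed or the layout is being read at the wrong offsets. Keeping the layout as plain data makes it easy for the user to modify and rerun.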