“Data Mining is one of the hottest fields in Computer Science. Data have been accumulating throughout the computer age in many forms, including database systems, spreadsheets, text files, and, more recently, web pages. These data have been stored on hard drives and temporary storage media. Database programs can answer specific queries such as “how many patients are over age 70,” but the data potentially hold much more than such specific information. The real treasure could be interesting new patterns that we do not even know to ask about, for example, “the best predictor of Alzheimer disease for patients over 70 is the ratio of Tau and Ab42 proteins”.

Data mining programs are intended to search through data for hidden relationships and patterns. This is particularly pertinent to marketing companies that want to know what made a specific group of people buy their product. It can also be very important in scientific fields such as medicine, where finding correlations among groups of people affected by a similar disease could be very helpful. Data mining is needed to make sense and use of rapidly growing data, and it is an essential field of the 21st century.

Made possible through a generous grant from the Howard Hughes Medical Institute and the W. M. Keck Foundation to Connecticut College, this CD and website contain a set of modules for a complete 1-semester course in data mining. There are also modules for individual lectures on data mining in the context of courses on Algorithms, Artificial Intelligence, and Introduction to Computer Science.”

From Gregory Piatetsky-Shapiro:

So you are thinking of studying data mining and knowledge discovery?

This field studies how to analyze the huge volumes of data and information generated by businesses, science, the web, and other sources, and how to separate the true patterns in them from random and false ones.

Think of the world as a multitude of giant, constantly shifting patterns. Data mining gives you the tools to analyze the world and find the true patterns in it.

Data mining is already applied today to unlocking the secrets of the human genome, to improving business and e-commerce, to analyzing the web, to detecting fraud, … even to analyzing baseball!

With the amount of information in the world estimated to double about every 9 months, data mining is expected to be one of the hot professions of the 21st century.

I hope you will take the course!

Recommended Prerequisites: This is equivalent to a 300-level university course. The more advanced modules marked with asterisks can be skipped for a more introductory-level course.

