What is Data Mining?

Data mining, also known as Knowledge Discovery in Databases (KDD), is a class of database applications that look for hidden patterns in a body of data that can be used to predict future behavior. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that merely presents data in new ways. True data mining software doesn’t just change the presentation; it discovers previously unknown relationships among the data.

Data Mining Process

Data mining, which draws on artificial intelligence techniques, is popular in science and mathematics but is also increasingly used by marketers trying to distill useful consumer data from websites. With the advent of computers, large databases, and the internet, it is easier than ever to collect millions, billions, or even trillions of pieces of data, which can then be systematically analyzed to find relationships and solve difficult problems.

How does data mining work?

The quick answer is that large amounts of data are collected first. Most entities that perform data mining are large corporations and government agencies, which have been collecting data for decades and have a lot of it to sift through. A fairly new business or an individual can purchase certain types of data to mine for their own purposes. Data can also be stolen: hackers break into large databases, or thieves simply take poorly protected laptops.

Data mining uses a variety of mathematical algorithms to analyze historical data. The results of this analysis are then used to build models based on real world behavior, which are in turn used to analyze incoming data and make predictions about future behavior.
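The analyze, model, predict cycle described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular data mining product works: the customer records, the two features, the labels, and the function names `fit_centroids` and `predict` are all hypothetical, and a nearest-centroid rule stands in for the "variety of mathematical algorithms" a real system would use.

```python
from collections import defaultdict
from math import dist


def fit_centroids(records):
    """Build a model from historical data: the mean feature vector per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in records:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in totals)
        for label, totals in sums.items()
    }


def predict(model, features):
    """Score incoming data: pick the label whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical historical data:
# (store visits per month, average basket in dollars) -> what the customer did next
historical = [
    ((1, 20.0), "lapsed"),
    ((2, 25.0), "lapsed"),
    ((8, 60.0), "returned"),
    ((10, 70.0), "returned"),
]

model = fit_centroids(historical)

# Incoming data the model has not seen before
print(predict(model, (9, 65.0)))   # returned
print(predict(model, (1, 22.0)))   # lapsed
```

The sketch skips steps a real pipeline would need, such as scaling the features and checking the model against held-out data, but it shows the shape of the process: historical records go in, a model comes out, and the model is then applied to new records.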

Data Mining Real-world example: 9/11 attacks

(Data Mining was supposedly used to determine the leader of the 9/11 attacks.)

Take the real-world example of the 9/11 attacks. There is historical data about these attacks: a number of foreign men applied to flight school in the United States as part of their attack planning. Based on this historical data, a model can be created that collects data on all future applicants to flight schools to see whether any are foreign or share other characteristics with the attackers.

Interest in data mining began before September 11, 2001, however. In the late 1990s, the Department of Defense authorized a data mining program called Able Danger, which was used to gather counterterrorism information, including information about Al Qaeda, from late 1998 through early 2001.

Can Data Mining be really good at counterterrorism?

The answer as of now is “not so much,” because data mining is a technique best reserved for large data sets. There is not enough data about terrorism or terrorists to build a good predictive model.

While a great many people shop for groceries in a day, very few plan or execute terrorist attacks. This means there is very little data on which to base a model, which in turn means that the data used to build the model does not represent a pattern of behavior but a one- or two-time event.
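A rough way to see why one or two events are not enough is to look at how uncertain an estimate is at different sample sizes. The sketch below uses the textbook standard error of a proportion, sqrt(p(1 - p) / n). The rate `p = 0.5` and both sample sizes are illustrative assumptions, not figures from any real data set, and treating each observation as an independent yes/no outcome is a simplification.

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent observations."""
    return sqrt(p * (1 - p) / n)


# Hypothetical: a trait believed to appear in half of the cases of interest
few = standard_error(0.5, 2)            # a "one or two time event"
many = standard_error(0.5, 1_000_000)   # grocery-shopping scale

print(f"{few:.4f}")    # 0.3536
print(f"{many:.4f}")   # 0.0005
```

With two observations the estimate could easily be off by tens of percentage points; with a million it is pinned down to a small fraction of a point. A model fitted to a handful of events inherits that uncertainty.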

