The output is numerical (continuous). Based on the features of a house (# floors, # rooms, etc.), predict its price.
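As a rough illustration of predicting a continuous output, the sketch below fits a straight line by ordinary least squares to a single feature. The data, the choice of a one-feature linear model, and the helper name `fit_line` are all assumptions made for the example, not part of the original notes.

```python
# Hypothetical toy data: number of rooms -> price (in thousands).
rooms = [2, 3, 4, 5, 6]
prices = [150.0, 200.0, 250.0, 300.0, 350.0]


def fit_line(xs, ys):
    """Ordinary least squares for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized; the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rooms, prices)

# The prediction for an unseen 7-room house is a number, not a category.
predicted = slope * 7 + intercept
print(predicted)
```

With real data there would be several features and noise, so a library model would usually replace this hand-rolled fit.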
The output is categorical (discrete). Based on the features of a house (# floors, # rooms, etc.), predict whether the house is Cheap or Expensive.
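One simple way to produce a discrete label is a k-nearest-neighbors vote. The sketch below is only an illustration: the labeled houses, the choice of k-NN, and the function name `knn_predict` are assumptions for the example.

```python
from collections import Counter

# Hypothetical labeled houses: (floors, rooms) -> label.
houses = [
    ((1, 2), "Cheap"),
    ((1, 3), "Cheap"),
    ((2, 3), "Cheap"),
    ((2, 6), "Expensive"),
    ((3, 7), "Expensive"),
    ((3, 8), "Expensive"),
]


def knn_predict(query, data, k=3):
    """Return the majority label among the k houses closest to `query`."""
    def sq_dist(features):
        # Squared Euclidean distance; enough for ranking neighbors.
        return sum((a - b) ** 2 for a, b in zip(features, query))

    nearest = sorted(data, key=lambda item: sq_dist(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


label_small = knn_predict((1, 2), houses)
label_large = knn_predict((3, 7), houses)
print(label_small, label_large)
```

The inputs are the same kind of features as in the numerical case; only the type of output changes.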
For unlabeled data (unsupervised learning), split the data into groups (clusters) so that items within a group are similar to each other and different from items in other groups.
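A minimal sketch of grouping unlabeled points, using k-means as one possible clustering method. The points, the fixed starting centroids, and the function names are assumptions chosen to keep the example deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid update steps."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabeled points forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

No labels are supplied; the grouping comes only from distances between points.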
Finding relationships between items that appear together in transactions. For example, people who bought milk also bought cereal, so a supermarket can place cereal next to milk to increase revenue and help customers find related items easily.
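The milk-and-cereal idea can be sketched by counting how often pairs of items share a basket. The transactions and the `min_count` threshold below are hypothetical, and counting pairs is only the simplest form of this kind of analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "cereal", "eggs"},
]

# Count every unordered pair of items that appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs seen together at least `min_count` times (assumed threshold).
min_count = 3
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_count}
print(frequent_pairs)
```

Sorting each basket before pairing ensures that ("cereal", "milk") and ("milk", "cereal") are counted as the same pair.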
- Outlier Analysis
- Trend and Evolution Analysis
- Text Mining, Topic Modeling, Graph Mining
- Sentiment Analysis, Opinion Mining
- and much more…
Data mining may generate thousands of patterns, and typically not all of them are interesting. A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, and validates a hypothesis the user seeks to confirm.
Objective vs. Subjective Measures
Objective measures are based on the statistics and structure of patterns. Subjective measures are based on the user’s beliefs about the data.
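Support and confidence are two commonly used objective measures for a rule such as "milk implies cereal". The sketch below computes both on hypothetical transactions; the thresholds and function names are assumptions for illustration.

```python
# Hypothetical shopping baskets.
transactions = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "cereal", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction with `consequent` too."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"milk", "cereal"}, transactions)
rule_confidence = confidence({"milk"}, {"cereal"}, transactions)

# Assumed thresholds: a rule passes only if it clears both.
min_support, min_confidence = 0.5, 0.7
passes_objective_test = rule_support >= min_support and rule_confidence >= min_confidence
print(rule_support, rule_confidence, passes_objective_test)
```

A subjective measure has no such formula: whether this rule is interesting still depends on what the user already believes or expects about the data.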
Completeness & Optimization
Completeness means finding all the interesting patterns. Optimization means searching for only the interesting patterns, rather than generating every pattern and filtering afterwards.