Sunday, November 20, 2016

NEW DISCOVERY OF ASSOCIATION RULES IN DATA MINING THAT IS IMPLEMENTED IN THE REAL LIFE





What Are Association Rules?

According to Margaret Rouse's definition, association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An example of an association rule would be "If a customer buys a dozen eggs, he is 80% likely to also purchase milk."

An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an item found in the data. A consequent is an item that is found in combination with the antecedent. Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships. Support is an indication of how frequently the items appear in the database. Confidence indicates the number of times the if/then statements have been found to be true.
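To make the two measures concrete, here is a minimal sketch in Python. The baskets are made up purely for illustration, and the function names (count, support, confidence) are my own, not from any library. It treats support as the fraction of transactions that contain all the items, and confidence as the fraction of transactions containing the antecedent that also contain the consequent.

```python
# Hypothetical shopping baskets, invented only to illustrate the two measures.
baskets = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "butter"},
    {"milk", "bread"},
    {"eggs", "milk", "butter"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also have the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


# Rule: if eggs, then milk.
rule_support = support({"eggs", "milk"}, baskets)
rule_confidence = confidence({"eggs"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
```

In these made-up baskets, eggs and milk appear together in 3 of 5 baskets, and 3 of the 4 baskets with eggs also contain milk.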



Association Rules to Predict the Weather

I would like to create dummy data for BMKG (Indonesia's meteorology, climatology, and geophysics agency) that can be used to predict the weather, and this kind of information can be shared with citizens. Given a set of transactions, the goal is to find rules that predict the occurrence of an item based on the occurrences of other items.

TID   ITEMS
1     Storm, Rainy, Thunder, Drizzle
2     Thunder, Rainy, Windy, Storm, Drizzle
3     Windy, Rainy, Drizzle, Storm
4     Drizzle, Rainy, Thunder
5     Thunder
6     Thunder, Drizzle
7     Rainy, Thunder, Storm
8     Drizzle, Windy

Mining Association Rules
Mining association rules is usually split into two steps:
1.      Frequent Itemset Generation
To generate the list of frequent itemsets, one should avoid the brute-force approach, because scanning the whole data set to find the support count of every possible itemset can be very expensive. Some of the strategies used to address this problem are:
-  Reduce the number of candidates (Apriori Principle): use pruning techniques such as the Apriori principle to eliminate some of the candidate itemsets without counting their support values.
-  Reduce the number of transactions: by combining transactions, we can reduce the total number of transactions that have to be scanned.
-  Reduce the number of comparisons (FP-Growth): use efficient data structures to store the candidates or the transactions, so that not every candidate has to be matched against every transaction.
2.      Rule Generation
Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset. Even with this two-step split, frequent itemset generation is still computationally expensive.
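The two steps can be sketched roughly as follows on the dummy weather data from above. This is only an illustration under my own assumptions: the function names (apriori, rules), the minimum support count of 3, and the minimum confidence of 0.8 are picked for the example and do not come from a library or from BMKG.

```python
from itertools import combinations

transactions = [
    {"Storm", "Rainy", "Thunder", "Drizzle"},
    {"Thunder", "Rainy", "Windy", "Storm", "Drizzle"},
    {"Windy", "Rainy", "Drizzle", "Storm"},
    {"Drizzle", "Rainy", "Thunder"},
    {"Thunder"},
    {"Thunder", "Drizzle"},
    {"Rainy", "Thunder", "Storm"},
    {"Drizzle", "Windy"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Step 1: return {itemset: support count} for every frequent itemset."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Count support only for the candidates that survived pruning.
        counts = {c: count(c, transactions) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        # Join: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in survivors for b in survivors
                      if len(a | b) == k}
        # Prune (Apriori principle): drop a candidate if any of its
        # (k-1)-item subsets is not frequent, without counting its support.
        level = [c for c in candidates
                 if all(frozenset(s) in survivors
                        for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, min_conf):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    found = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its count is already in the dictionary.
                conf = n / frequent[antecedent]
                if conf >= min_conf:
                    found.append((antecedent, itemset - antecedent, conf))
    return found


frequent = apriori(transactions, min_count=3)
strong = rules(frequent, min_conf=0.8)
for antecedent, consequent, conf in strong:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

With these settings, for example, every transaction that contains Storm also contains Rainy, so the rule Storm -> Rainy comes out with confidence 1.0.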

Predict the Dummy Data of the Weather by Using Mining Association Rules
In this case, I would like to use the first step of mining association rules (frequent itemset generation) to analyze the dummy weather data. That step lists three strategies for solving the problem, and I will use only two of them (Apriori Principle and FP-Growth).
The dummy data (for this example, the minimum support is 30%)
TID   ITEMS
1     Storm, Rainy, Thunder, Drizzle
2     Thunder, Rainy, Windy, Storm, Drizzle
3     Windy, Rainy, Drizzle, Storm
4     Drizzle, Rainy, Thunder
5     Thunder
6     Thunder, Drizzle
7     Rainy, Thunder, Storm
8     Drizzle, Windy
Calculate minimum support
30% * 8 = 2.4, so an item has to appear in at least 3 of the 8 transactions to count as frequent.
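Frequency of occurrence
Items   Frequency
A       5
B       6
C       3
D       6
E       4
(Matching these counts against the data, A = Rainy, B = Drizzle, C = Windy, D = Thunder and E = Storm. B and D tie at 6; the ordered items further down list Drizzle before Thunder.)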
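As a quick check of these counts, here is a small sketch that tallies how many transactions each weather item appears in and compares the result with the 30% threshold. The variable names are my own, chosen for illustration.

```python
from collections import Counter

transactions = [
    ["Storm", "Rainy", "Thunder", "Drizzle"],
    ["Thunder", "Rainy", "Windy", "Storm", "Drizzle"],
    ["Windy", "Rainy", "Drizzle", "Storm"],
    ["Drizzle", "Rainy", "Thunder"],
    ["Thunder"],
    ["Thunder", "Drizzle"],
    ["Rainy", "Thunder", "Storm"],
    ["Drizzle", "Windy"],
]

min_support = 0.30
# 30% of 8 transactions = 2.4, so a count of 3 or more passes.
min_count = min_support * len(transactions)

# Each item occurs at most once per transaction, so counting
# occurrences is the same as counting transactions.
frequency = Counter(item for t in transactions for item in t)

# Keep only the items that reach the minimum support count.
frequent_items = {item: n for item, n in frequency.items() if n >= min_count}

for item, n in frequency.most_common():
    print(item, n)
```

All five items reach the threshold here, so none of them is dropped before the items are prioritized.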
Prioritize the Item
Items   Frequency   Priority
A       5           3
B       6           1
C       3           5
D       6           2
E       4           4
Order the items according to the priority
TID   Items                                    Ordered Items
1     Storm, Rainy, Thunder, Drizzle           Drizzle, Thunder, Rainy, Storm
2     Thunder, Rainy, Windy, Storm, Drizzle    Drizzle, Thunder, Rainy, Storm, Windy
3     Windy, Rainy, Drizzle, Storm             Drizzle, Rainy, Storm, Windy
4     Drizzle, Rainy, Thunder                  Drizzle, Thunder, Rainy
5     Thunder                                  Thunder
6     Thunder, Drizzle                         Drizzle, Thunder
7     Rainy, Thunder, Storm                    Thunder, Rainy, Storm
8     Drizzle, Windy                           Drizzle, Windy
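The prioritizing and ordering steps above can be sketched like this. One assumption of mine: Drizzle and Thunder both occur 6 times, and I break that tie alphabetically, which happens to give the same order as the table (Drizzle before Thunder).

```python
from collections import Counter

transactions = [
    ["Storm", "Rainy", "Thunder", "Drizzle"],
    ["Thunder", "Rainy", "Windy", "Storm", "Drizzle"],
    ["Windy", "Rainy", "Drizzle", "Storm"],
    ["Drizzle", "Rainy", "Thunder"],
    ["Thunder"],
    ["Thunder", "Drizzle"],
    ["Rainy", "Thunder", "Storm"],
    ["Drizzle", "Windy"],
]

frequency = Counter(item for t in transactions for item in t)

# Rank items by descending frequency; ties are broken alphabetically
# (an assumption made for this sketch).
ranked = sorted(frequency, key=lambda item: (-frequency[item], item))
priority = {item: rank for rank, item in enumerate(ranked, start=1)}

# Rewrite every transaction with its items in priority order.
ordered = [sorted(t, key=priority.get) for t in transactions]

for tid, items in enumerate(ordered, start=1):
    print(tid, ", ".join(items))
```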
FP-Tree
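As a rough sketch, the ordered transactions can be inserted one by one into a prefix tree, where transactions that share a prefix share a path and each node counts how many transactions pass through it. The FPNode class and the helper names are my own illustration, not from a library, and the sketch leaves out the header table and node links that a full FP-Growth implementation would also keep.

```python
class FPNode:
    """One node of the tree: an item, a count, and child nodes keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path starting at the root."""
    root = FPNode(None)
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # No shared prefix from this point on: start a new branch.
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def show(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines


ordered = [
    ["Drizzle", "Thunder", "Rainy", "Storm"],
    ["Drizzle", "Thunder", "Rainy", "Storm", "Windy"],
    ["Drizzle", "Rainy", "Storm", "Windy"],
    ["Drizzle", "Thunder", "Rainy"],
    ["Thunder"],
    ["Drizzle", "Thunder"],
    ["Thunder", "Rainy", "Storm"],
    ["Drizzle", "Windy"],
]

tree = build_fp_tree(ordered)
print("\n".join(show(tree)))
```

Because the items are ordered by priority first, the most frequent item (Drizzle) sits near the root and is shared by six of the eight transactions, which is what keeps the tree compact.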
Validation



REFERENCES
http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining
http://www.belajaringgris.net/weather-vocabulary-3252.html
http://www.hypertextbookshop.com/dataminingbook/working_version/contents/chapters/chapter002/section002/blue/page001.html
