NEW DISCOVERY OF ASSOCIATION RULES IN DATA MINING THAT IS
IMPLEMENTED IN THE REAL LIFE
What Are Association Rules?
According to Margaret Rouse's blog, association rules are if/then statements
that help uncover relationships between seemingly unrelated data in a
relational database or other information repository. An example of an
association rule would be "If a customer buys a dozen eggs, he is 80% likely
to also purchase milk."
An association rule has two parts, an antecedent (if) and a consequent (then).
An antecedent is an item found in the data. A consequent is an item that is
found in combination with the antecedent. Association rules are created by
analyzing data for frequent if/then patterns and using the criteria support
and confidence to identify the most important relationships. Support indicates
how frequently the items appear together in the database. Confidence indicates
how often the if/then statement has been found to be true, that is, the share
of transactions containing the antecedent that also contain the consequent.
Association Rules to Predict the Weather
I would like to create dummy data for BMKG (Indonesia's meteorology,
climatology, and geophysics agency) that can be used to predict the weather,
so that this kind of information can be shared with citizens. Given a set of
transactions, the goal is to find rules that predict the occurrence of an item
based on the occurrences of other items.
TID | ITEMS
1 | Storm, Rainy, Thunder, Drizzle
2 | Thunder, Rainy, Windy, Storm, Drizzle
3 | Windy, Rainy, Drizzle, Storm
4 | Drizzle, Rainy, Thunder
5 | Thunder
6 | Thunder, Drizzle
7 | Rainy, Thunder, Storm
8 | Drizzle, Windy
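To make support and confidence concrete, here is a minimal Python sketch that computes both for one rule on the dummy data above. The helper names `support` and `confidence` and the choice of the rule "if Storm then Rainy" are my own assumptions for illustration; they do not come from any library.

```python
# Dummy weather transactions copied from the table above, one set per TID.
transactions = [
    {"Storm", "Rainy", "Thunder", "Drizzle"},
    {"Thunder", "Rainy", "Windy", "Storm", "Drizzle"},
    {"Windy", "Rainy", "Drizzle", "Storm"},
    {"Drizzle", "Rainy", "Thunder"},
    {"Thunder"},
    {"Thunder", "Drizzle"},
    {"Rainy", "Thunder", "Storm"},
    {"Drizzle", "Windy"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical rule chosen for illustration: if Storm then Rainy.
rule_support = support({"Storm", "Rainy"}, transactions)
rule_confidence = confidence({"Storm"}, {"Rainy"}, transactions)
print(rule_support, rule_confidence)
```

Storm appears in four of the eight transactions and Rainy appears in all four of them, so this rule holds every time its antecedent occurs in the dummy data.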
Mining Association Rules
Mining association rules takes two steps:
1. Frequent Itemset Generation
To generate the list of frequent itemsets, one must avoid the brute-force
approach, because it is very expensive to scan the whole data set to find the
support count of every possible itemset. Some of the strategies used to reduce
this cost are:
- Reduce the number of candidates (Apriori Principle): use pruning techniques
  such as the Apriori principle to eliminate some of the candidate itemsets
  without counting their support values.
- Reduce the number of transactions: by combining transactions together we can
  reduce the total number of transactions.
- Reduce the number of comparisons (FP-Growth): use efficient data structures
  to store the candidates, thereby eliminating the need to match every
  candidate against every transaction.
2. Rule Generation
Generate high-confidence rules from each frequent itemset, where each rule is
a binary partitioning of a frequent itemset into an antecedent and a
consequent. Of the two steps, frequent itemset generation remains the
computationally expensive one.
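The two steps above can be sketched in Python as follows. This is a minimal illustration on the dummy weather data, not a tuned implementation. The function names `apriori` and `rules` are hypothetical, the minimum count of 3 corresponds to the 30% minimum support used later in this post, and the 0.8 confidence threshold is an assumption of mine because the post does not specify one.

```python
from itertools import combinations

transactions = [
    {"Storm", "Rainy", "Thunder", "Drizzle"},
    {"Thunder", "Rainy", "Windy", "Storm", "Drizzle"},
    {"Windy", "Rainy", "Drizzle", "Storm"},
    {"Drizzle", "Rainy", "Thunder"},
    {"Thunder"},
    {"Thunder", "Drizzle"},
    {"Rainy", "Thunder", "Storm"},
    {"Drizzle", "Windy"},
]
MIN_COUNT = 3          # 30% of 8 transactions is 2.4, rounded up
MIN_CONFIDENCE = 0.8   # assumed threshold, not given in the post


def count(itemset):
    """Number of transactions that contain the whole itemset."""
    return sum(itemset <= t for t in transactions)


def apriori(min_count):
    """Step 1: level-wise search for frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = count(s)
        k += 1
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (Apriori principle): discard any candidate that has an
        # infrequent (k-1)-subset, without counting its support.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) >= min_count}
    return frequent


def rules(frequent, min_conf):
    """Step 2: split each frequent itemset into antecedent and consequent."""
    found = []
    for itemset, n in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is already stored.
                conf = n / frequent[antecedent]
                if conf >= min_conf:
                    found.append((set(antecedent), set(itemset - antecedent), conf))
    return found


frequent = apriori(MIN_COUNT)
found = rules(frequent, MIN_CONFIDENCE)
for antecedent, consequent, conf in found:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

The pruning step is where the saving comes from: the four-item candidate {Drizzle, Thunder, Rainy, Storm} is never counted, because its subset {Drizzle, Thunder, Storm} is already known to be infrequent.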
Predict the Dummy Data of the Weather by Using Mining Association Rules
In this case, I would like to use the first step, frequent itemset generation,
to analyze the dummy weather data. Of the three strategies listed for that
step, I will use only two (the Apriori principle and FP-Growth).
The Dummy Data (in this example the minimum support is 30%)
TID | ITEMS
1 | Storm, Rainy, Thunder, Drizzle
2 | Thunder, Rainy, Windy, Storm, Drizzle
3 | Windy, Rainy, Drizzle, Storm
4 | Drizzle, Rainy, Thunder
5 | Thunder
6 | Thunder, Drizzle
7 | Rainy, Thunder, Storm
8 | Drizzle, Windy
Calculate minimum support
30% * 8 = 2.4, so an item must appear in at least 3 of the 8 transactions to
be frequent.
Frequency of occurrence
Items | Frequency
A (Rainy) | 5
B (Drizzle) | 6
C (Windy) | 3
D (Thunder) | 6
E (Storm) | 4
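The minimum support count and the frequency table can be reproduced with a short Python sketch. It assumes that the letters A to E stand for Rainy, Drizzle, Windy, Thunder, and Storm, which is the only assignment consistent with the counts and with the ordering used later in this post. The variable names are my own.

```python
import math
from collections import Counter

transactions = [
    ["Storm", "Rainy", "Thunder", "Drizzle"],
    ["Thunder", "Rainy", "Windy", "Storm", "Drizzle"],
    ["Windy", "Rainy", "Drizzle", "Storm"],
    ["Drizzle", "Rainy", "Thunder"],
    ["Thunder"],
    ["Thunder", "Drizzle"],
    ["Rainy", "Thunder", "Storm"],
    ["Drizzle", "Windy"],
]

min_support = 0.30
# 30% of 8 transactions is 2.4; a count must be a whole number, so round up.
min_count = math.ceil(min_support * len(transactions))

# Count in how many transactions each item occurs.
frequency = Counter(item for t in transactions for item in t)

# Keep only the items that meet the minimum support count.
frequent_items = {item: n for item, n in frequency.items() if n >= min_count}
print(min_count, dict(frequency))
```

Every item reaches the minimum count here, so none is dropped before the items are prioritized.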
Prioritize the Items
Items | Frequency | Priority
A (Rainy) | 5 | 3
B (Drizzle) | 6 | 1
C (Windy) | 3 | 5
D (Thunder) | 6 | 2
E (Storm) | 4 | 4
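The priority column is a ranking by descending frequency. A minimal sketch, starting from the frequency table above: Drizzle and Thunder both occur 6 times, and breaking that tie alphabetically is an assumption of mine that happens to reproduce the order in the table.

```python
# Item frequencies taken from the table above.
frequency = {"Rainy": 5, "Drizzle": 6, "Windy": 3, "Thunder": 6, "Storm": 4}

# Sort by descending frequency; ties are broken alphabetically (assumed rule).
ranked = sorted(frequency, key=lambda item: (-frequency[item], item))

# Priority 1 is the most frequent item.
priority = {item: rank for rank, item in enumerate(ranked, start=1)}
print(priority)
```

Any fixed tie-breaking rule works for FP-Growth, as long as the same order is applied to every transaction.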
Order the items according to the priority
TID | Items | Ordered Items
1 | Storm, Rainy, Thunder, Drizzle | Drizzle, Thunder, Rainy, Storm
2 | Thunder, Rainy, Windy, Storm, Drizzle | Drizzle, Thunder, Rainy, Storm, Windy
3 | Windy, Rainy, Drizzle, Storm | Drizzle, Rainy, Storm, Windy
4 | Drizzle, Rainy, Thunder | Drizzle, Thunder, Rainy
5 | Thunder | Thunder
6 | Thunder, Drizzle | Drizzle, Thunder
7 | Rainy, Thunder, Storm | Thunder, Rainy, Storm
8 | Drizzle, Windy | Drizzle, Windy
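Reordering each transaction is a sort keyed on the priority of its items. A minimal sketch, assuming the priority order Drizzle, Thunder, Rainy, Storm, Windy from the previous step; the variable names are my own.

```python
# Priority order from the previous step, most frequent item first.
order = ["Drizzle", "Thunder", "Rainy", "Storm", "Windy"]
rank = {item: position for position, item in enumerate(order)}

transactions = {
    1: ["Storm", "Rainy", "Thunder", "Drizzle"],
    2: ["Thunder", "Rainy", "Windy", "Storm", "Drizzle"],
    3: ["Windy", "Rainy", "Drizzle", "Storm"],
    4: ["Drizzle", "Rainy", "Thunder"],
    5: ["Thunder"],
    6: ["Thunder", "Drizzle"],
    7: ["Rainy", "Thunder", "Storm"],
    8: ["Drizzle", "Windy"],
}

# Sort the items of every transaction by their priority rank.
ordered = {
    tid: sorted(items, key=lambda item: rank[item])
    for tid, items in transactions.items()
}
for tid, items in ordered.items():
    print(tid, items)
```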
FP-Tree
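The tree can be built by inserting the ordered transactions one at a time, sharing a path wherever two transactions start with the same items. The following Python sketch shows that standard construction under my own naming (`Node`, `build_fp_tree`, and `show` are hypothetical helpers). It leaves out the header table and node links that a full FP-Growth implementation also keeps.

```python
class Node:
    """One FP-tree node: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path from the root."""
    root = Node(None)
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                # No shared prefix from here on: start a new branch.
                child = Node(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def show(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines


# Ordered transactions from the table above (TID 1 to 8).
ordered_transactions = [
    ["Drizzle", "Thunder", "Rainy", "Storm"],
    ["Drizzle", "Thunder", "Rainy", "Storm", "Windy"],
    ["Drizzle", "Rainy", "Storm", "Windy"],
    ["Drizzle", "Thunder", "Rainy"],
    ["Thunder"],
    ["Drizzle", "Thunder"],
    ["Thunder", "Rainy", "Storm"],
    ["Drizzle", "Windy"],
]

root = build_fp_tree(ordered_transactions)
print("\n".join(show(root)))
```

Because six of the eight transactions begin with Drizzle, they all share one branch under the root, which is what lets the tree store the data more compactly than the raw transaction list.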
Validation
REFERENCES
http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining
http://www.belajaringgris.net/weather-vocabulary-3252.html
http://www.hypertextbookshop.com/dataminingbook/working_version/contents/chapters/chapter002/section002/blue/page001.html