Sunday, November 20, 2016

NEW DISCOVERY OF ASSOCIATION RULES IN DATA MINING IMPLEMENTED IN REAL LIFE





What Are Association Rules?

According to Margaret Rouse's blog, association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An example of an association rule would be: "If a customer buys a dozen eggs, he is 80% likely to also purchase milk."

An association rule has two parts: an antecedent (if) and a consequent (then). An antecedent is an item found in the data. A consequent is an item that is found in combination with the antecedent. Association rules are created by analyzing data for frequent if/then patterns and using the criteria of support and confidence to identify the most important relationships. Support indicates how frequently the items appear together in the database. Confidence indicates how often the if/then statement has been found to be true, that is, the proportion of transactions containing the antecedent that also contain the consequent.
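As a small illustration (the five shopping baskets below are made up purely for this sketch), support and confidence can be computed directly from a list of transactions in Python:

# Made-up shopping baskets, only to illustrate the two measures.
transactions = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "butter"},
    {"milk", "bread"},
    {"eggs", "milk", "butter"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule: if a customer buys eggs (antecedent), they also buy milk (consequent).
rule_support = support({"eggs", "milk"}, transactions)        # 3/5 = 0.6
confidence = rule_support / support({"eggs"}, transactions)   # 0.6 / 0.8 = 0.75
print(rule_support, confidence)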



Association Rules to Predict the Weather

I would like to create dummy data for BMKG that can be used to predict the weather, so that this kind of information can be shared with citizens. Given a set of transactions, association rule mining predicts the occurrence of an item based on the occurrences of other items.

TID | Items
1 | Storm, Rainy, Thunder, Drizzle
2 | Thunder, Rainy, Windy, Storm, Drizzle
3 | Windy, Rainy, Drizzle, Storm
4 | Drizzle, Rainy, Thunder
5 | Thunder
6 | Thunder, Drizzle
7 | Rainy, Thunder, Storm
8 | Drizzle, Windy
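As a sketch, the table can be encoded as a list of item sets and an example rule such as "if Rainy then Storm" can be checked against it (the rule is only an illustration on dummy data, not a claim about real weather):

# The eight dummy transactions from the table above.
weather = [
    {"Storm", "Rainy", "Thunder", "Drizzle"},
    {"Thunder", "Rainy", "Windy", "Storm", "Drizzle"},
    {"Windy", "Rainy", "Drizzle", "Storm"},
    {"Drizzle", "Rainy", "Thunder"},
    {"Thunder"},
    {"Thunder", "Drizzle"},
    {"Rainy", "Thunder", "Storm"},
    {"Drizzle", "Windy"},
]

def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule: if Rainy then Storm.
sup = support({"Rainy", "Storm"}, weather)     # 4 of 8 transactions = 0.5
conf = sup / support({"Rainy"}, weather)       # 0.5 / 0.625 = 0.8
print(sup, conf)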

Mining Association Rules
There are two main steps in mining association rules:
1. Frequent Itemset Generation
To generate the list of frequent itemsets, the brute-force approach should be avoided, because searching through the whole data set to find the support count of every possible itemset is very expensive. Strategies used to deal with this problem include:
- Reduce the number of candidates (Apriori principle): use pruning techniques such as the Apriori principle to eliminate candidate itemsets without counting their support values.
- Reduce the number of transactions: by combining transactions together, the total number of transactions can be reduced.
- Reduce the number of comparisons (FP-Growth): use efficient data structures to store the candidates, eliminating the need to match every candidate against every transaction.
2. Rule Generation
Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset. Even with these strategies, frequent itemset generation remains the computationally expensive step. (A small code sketch of both steps is given after this list.)
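The sketch below is a minimal, hand-rolled illustration of the two steps; it is not taken from any particular library, and the function names apriori and rules and the tiny demo list are assumptions made here for illustration only.

from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise frequent itemset search using the Apriori principle:
    # only extend itemsets whose every subset is already frequent.
    n = len(transactions)
    frequent = {}
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    k = 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Candidates for the next level, pruned by the Apriori principle.
        current = {a | b for a, b in combinations(level, 2)
                   if len(a | b) == k + 1
                   and all(frozenset(s) in level for s in combinations(a | b, k))}
        k += 1
    return frequent

def rules(frequent, min_confidence):
    # Each rule is a binary partitioning of a frequent itemset.
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), confidence))
    return out

# Tiny demo; with the weather table above one would call apriori(weather, 0.3).
demo = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
print(rules(apriori(demo, 0.5), 0.6))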

Predicting the Dummy Weather Data by Using Association Rule Mining
In this case, I use the first step (frequent itemset generation) to analyze the dummy weather data. Of the three strategies listed above for this step, I will only use two (the Apriori principle and FP-Growth).
The dummy data from the table above is used, with a minimum support of 30%.
Calculate the minimum support count
30% * 8 transactions = 2.4, so an itemset must appear in at least 3 transactions to be counted as frequent.
Frequency of occurrence (the letters A-E stand for the five weather items)

Items | Frequency
A (Rainy) | 5
B (Drizzle) | 6
C (Windy) | 3
D (Thunder) | 6
E (Storm) | 4
Prioritize the items (rank them by descending frequency; the Drizzle/Thunder tie is broken in favour of Drizzle)

Items | Frequency | Priority
A (Rainy) | 5 | 3
B (Drizzle) | 6 | 1
C (Windy) | 3 | 5
D (Thunder) | 6 | 2
E (Storm) | 4 | 4
Order the items in each transaction according to the priority

TID | Items | Ordered Items
1 | Storm, Rainy, Thunder, Drizzle | Drizzle, Thunder, Rainy, Storm
2 | Thunder, Rainy, Windy, Storm, Drizzle | Drizzle, Thunder, Rainy, Storm, Windy
3 | Windy, Rainy, Drizzle, Storm | Drizzle, Rainy, Storm, Windy
4 | Drizzle, Rainy, Thunder | Drizzle, Thunder, Rainy
5 | Thunder | Thunder
6 | Thunder, Drizzle | Drizzle, Thunder
7 | Rainy, Thunder, Storm | Thunder, Rainy, Storm
8 | Drizzle, Windy | Drizzle, Windy
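The counting and reordering steps above can be reproduced with a short sketch; the priority list is written out explicitly so that the Drizzle/Thunder tie is broken the same way as in the table:

from collections import Counter

# The same eight dummy transactions as in the tables above.
weather = [
    {"Storm", "Rainy", "Thunder", "Drizzle"},
    {"Thunder", "Rainy", "Windy", "Storm", "Drizzle"},
    {"Windy", "Rainy", "Drizzle", "Storm"},
    {"Drizzle", "Rainy", "Thunder"},
    {"Thunder"},
    {"Thunder", "Drizzle"},
    {"Rainy", "Thunder", "Storm"},
    {"Drizzle", "Windy"},
]

counts = Counter(item for t in weather for item in t)
print(counts)   # Drizzle: 6, Thunder: 6, Rainy: 5, Storm: 4, Windy: 3

# Priority used in the worked example (descending frequency, tie broken for Drizzle).
priority = ["Drizzle", "Thunder", "Rainy", "Storm", "Windy"]
ordered = [sorted(t, key=priority.index) for t in weather]
print(ordered)  # matches the "Ordered Items" column above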
FP-Tree
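A minimal sketch of how the FP-tree could be built from the ordered transactions above; the nested-dict representation and the insert helper are assumptions of this sketch, not a standard API.

# Each node is a dict with a count and its children; the tree is a prefix tree
# over the ordered transactions, so transactions sharing a prefix share a path.
def insert(node, items):
    for item in items:
        child = node.setdefault(item, {"count": 0, "children": {}})
        child["count"] += 1
        node = child["children"]

ordered = [
    ["Drizzle", "Thunder", "Rainy", "Storm"],
    ["Drizzle", "Thunder", "Rainy", "Storm", "Windy"],
    ["Drizzle", "Rainy", "Storm", "Windy"],
    ["Drizzle", "Thunder", "Rainy"],
    ["Thunder"],
    ["Drizzle", "Thunder"],
    ["Thunder", "Rainy", "Storm"],
    ["Drizzle", "Windy"],
]

root = {}
for t in ordered:
    insert(root, t)

# Two branches leave the root: Drizzle (count 6) and Thunder (count 2).
print(root["Drizzle"]["count"], root["Thunder"]["count"])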
Validation



REFERENCES
http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining
http://www.belajaringgris.net/weather-vocabulary-3252.html
http://www.hypertextbookshop.com/dataminingbook/working_version/contents/chapters/chapter002/section002/blue/page001.html

Sunday, November 13, 2016

SUMMARY OF WORKSHOP 1: DATA EXCHANGE IN A DIGITAL WORLD


INTRODUCTION TO BIG DATA
Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it is not the amount of data that is important; it is what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
The 5 Vs of Big Data
1. Volume refers to the vast amount of data generated every second. Just think of all the emails, Twitter messages, photos, video clips, sensor data, etc. that we produce and share every second. For example, on Facebook we send 10 billion messages per day, click the "like" button 4.5 billion times and upload 350 million new pictures each and every day.
2. Velocity refers to the speed at which new data is generated and the speed at which data moves around. An example is a social media message going viral in seconds.
3. Value: it is all well and good having access to big data, but unless we can turn it into value it is useless.
4. Veracity refers to the messiness or trustworthiness of the data.
5. Variety refers to the different types of data we can now use.
BENEFITS OF BIG DATA
Research: conducting research to gain deep knowledge about big data and collecting the data to be analyzed.
Business Intelligence: the data that has been analyzed becomes business information that can be used to make decisions.
New Business Opportunity: with more knowledge about business information and the ecosystem of the digital world, there will be new and large potential businesses.