Sunday, November 20, 2016

NEW DISCOVERY OF ASSOCIATION RULES IN DATA MINING IMPLEMENTED IN REAL LIFE





What Are Association Rules?

According to Margaret Rouse's blog, association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An example of an association rule would be: "If a customer buys a dozen eggs, he is 80% likely to also purchase milk."

An association rule has two parts: an antecedent (if) and a consequent (then). An antecedent is an item found in the data. A consequent is an item that is found in combination with the antecedent. Association rules are created by analyzing data for frequent if/then patterns and using the criteria of support and confidence to identify the most important relationships. Support indicates how frequently the items appear together in the database. Confidence indicates how often the if/then statement has been found to be true, that is, the proportion of transactions containing the antecedent that also contain the consequent.
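As a small illustration (the five shopping baskets below are made up purely for this sketch), support and confidence can be computed directly from a list of transactions in Python:

# Made-up shopping baskets, only to illustrate the two measures.
transactions = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "butter"},
    {"milk", "bread"},
    {"eggs", "milk", "butter"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule: if a customer buys eggs (antecedent), they also buy milk (consequent).
rule_support = support({"eggs", "milk"}, transactions)        # 3/5 = 0.6
confidence = rule_support / support({"eggs"}, transactions)   # 0.6 / 0.8 = 0.75
print(rule_support, confidence)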



Association Rules to Predict the Weather

I would like to create dummy data for BMKG that can be used to predict the weather, so that this kind of information can be shared with citizens. Given a set of transactions, association rule mining predicts the occurrence of an item based on the occurrences of other items.

TID | Items
1 | Storm, Rainy, Thunder, Drizzle
2 | Thunder, Rainy, Windy, Storm, Drizzle
3 | Windy, Rainy, Drizzle, Storm
4 | Drizzle, Rainy, Thunder
5 | Thunder
6 | Thunder, Drizzle
7 | Rainy, Thunder, Storm
8 | Drizzle, Windy
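As a sketch, the table can be encoded as a list of item sets and an example rule such as "if Rainy then Storm" can be checked against it (the rule is only an illustration on dummy data, not a claim about real weather):

# The eight dummy transactions from the table above.
weather = [
    {"Storm", "Rainy", "Thunder", "Drizzle"},
    {"Thunder", "Rainy", "Windy", "Storm", "Drizzle"},
    {"Windy", "Rainy", "Drizzle", "Storm"},
    {"Drizzle", "Rainy", "Thunder"},
    {"Thunder"},
    {"Thunder", "Drizzle"},
    {"Rainy", "Thunder", "Storm"},
    {"Drizzle", "Windy"},
]

def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule: if Rainy then Storm.
sup = support({"Rainy", "Storm"}, weather)     # 4 of 8 transactions = 0.5
conf = sup / support({"Rainy"}, weather)       # 0.5 / 0.625 = 0.8
print(sup, conf)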

Mining Association Rules
There are two main steps in mining association rules:
1. Frequent Itemset Generation
To generate the list of frequent itemsets, the brute-force approach should be avoided, because searching through the whole data set to find the support count of every possible itemset is very expensive. Strategies used to deal with this problem include:
- Reduce the number of candidates (Apriori principle): use pruning techniques such as the Apriori principle to eliminate candidate itemsets without counting their support values.
- Reduce the number of transactions: by combining transactions together, the total number of transactions can be reduced.
- Reduce the number of comparisons (FP-Growth): use efficient data structures to store the candidates, eliminating the need to match every candidate against every transaction.
2. Rule Generation
Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset. Even with these strategies, frequent itemset generation remains the computationally expensive step. (A small code sketch of both steps is given after this list.)
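The sketch below is a minimal, hand-rolled illustration of the two steps; it is not taken from any particular library, and the function names apriori and rules and the tiny demo list are assumptions made here for illustration only.

from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise frequent itemset search using the Apriori principle:
    # only extend itemsets whose every subset is already frequent.
    n = len(transactions)
    frequent = {}
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    k = 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Candidates for the next level, pruned by the Apriori principle.
        current = {a | b for a, b in combinations(level, 2)
                   if len(a | b) == k + 1
                   and all(frozenset(s) in level for s in combinations(a | b, k))}
        k += 1
    return frequent

def rules(frequent, min_confidence):
    # Each rule is a binary partitioning of a frequent itemset.
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), confidence))
    return out

# Tiny demo; with the weather table above one would call apriori(weather, 0.3).
demo = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
print(rules(apriori(demo, 0.5), 0.6))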

Predicting the Dummy Weather Data by Using Association Rule Mining
In this case, I use the first step (frequent itemset generation) to analyze the dummy weather data. Of the three strategies listed above for this step, I will only use two (the Apriori principle and FP-Growth).
The dummy data from the table above is used, with a minimum support of 30%.
Calculate the minimum support count
30% * 8 transactions = 2.4, so an itemset must appear in at least 3 transactions to be counted as frequent.
Frequency of occurrence (the letters A-E stand for the five weather items)

Items | Frequency
A (Rainy) | 5
B (Drizzle) | 6
C (Windy) | 3
D (Thunder) | 6
E (Storm) | 4
Prioritize the items (rank them by descending frequency; the Drizzle/Thunder tie is broken in favour of Drizzle)

Items | Frequency | Priority
A (Rainy) | 5 | 3
B (Drizzle) | 6 | 1
C (Windy) | 3 | 5
D (Thunder) | 6 | 2
E (Storm) | 4 | 4
Order the items in each transaction according to the priority

TID | Items | Ordered Items
1 | Storm, Rainy, Thunder, Drizzle | Drizzle, Thunder, Rainy, Storm
2 | Thunder, Rainy, Windy, Storm, Drizzle | Drizzle, Thunder, Rainy, Storm, Windy
3 | Windy, Rainy, Drizzle, Storm | Drizzle, Rainy, Storm, Windy
4 | Drizzle, Rainy, Thunder | Drizzle, Thunder, Rainy
5 | Thunder | Thunder
6 | Thunder, Drizzle | Drizzle, Thunder
7 | Rainy, Thunder, Storm | Thunder, Rainy, Storm
8 | Drizzle, Windy | Drizzle, Windy
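The counting and reordering steps above can be reproduced with a short sketch; the priority list is written out explicitly so that the Drizzle/Thunder tie is broken the same way as in the table:

from collections import Counter

# The same eight dummy transactions as in the tables above.
weather = [
    {"Storm", "Rainy", "Thunder", "Drizzle"},
    {"Thunder", "Rainy", "Windy", "Storm", "Drizzle"},
    {"Windy", "Rainy", "Drizzle", "Storm"},
    {"Drizzle", "Rainy", "Thunder"},
    {"Thunder"},
    {"Thunder", "Drizzle"},
    {"Rainy", "Thunder", "Storm"},
    {"Drizzle", "Windy"},
]

counts = Counter(item for t in weather for item in t)
print(counts)   # Drizzle: 6, Thunder: 6, Rainy: 5, Storm: 4, Windy: 3

# Priority used in the worked example (descending frequency, tie broken for Drizzle).
priority = ["Drizzle", "Thunder", "Rainy", "Storm", "Windy"]
ordered = [sorted(t, key=priority.index) for t in weather]
print(ordered)  # matches the "Ordered Items" column above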
FP-Tree
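A minimal sketch of how the FP-tree could be built from the ordered transactions above; the nested-dict representation and the insert helper are assumptions of this sketch, not a standard API.

# Each node is a dict with a count and its children; the tree is a prefix tree
# over the ordered transactions, so transactions sharing a prefix share a path.
def insert(node, items):
    for item in items:
        child = node.setdefault(item, {"count": 0, "children": {}})
        child["count"] += 1
        node = child["children"]

ordered = [
    ["Drizzle", "Thunder", "Rainy", "Storm"],
    ["Drizzle", "Thunder", "Rainy", "Storm", "Windy"],
    ["Drizzle", "Rainy", "Storm", "Windy"],
    ["Drizzle", "Thunder", "Rainy"],
    ["Thunder"],
    ["Drizzle", "Thunder"],
    ["Thunder", "Rainy", "Storm"],
    ["Drizzle", "Windy"],
]

root = {}
for t in ordered:
    insert(root, t)

# Two branches leave the root: Drizzle (count 6) and Thunder (count 2).
print(root["Drizzle"]["count"], root["Thunder"]["count"])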
Validation



REFERENCES
http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining
http://www.belajaringgris.net/weather-vocabulary-3252.html
http://www.hypertextbookshop.com/dataminingbook/working_version/contents/chapters/chapter002/section002/blue/page001.html

Sunday, November 13, 2016

SUMMARY OF WORKSHOP 1: DATA EXCHANGE IN A DIGITAL WORLD


INTRODUCTION TO BIG DATA
Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it is not the amount of data that is important; it is what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
The 5 Vs of Big Data
1. Volume refers to the vast amount of data generated every second. Just think of all the emails, Twitter messages, photos, video clips, sensor data, etc. that we produce and share every second. For example, on Facebook we send 10 billion messages per day, click the "like" button 4.5 billion times and upload 350 million new pictures each and every day.
2. Velocity refers to the speed at which new data is generated and the speed at which data moves around. An example is a social media message going viral in seconds.
3. Value: it is all well and good having access to big data, but unless we can turn it into value it is useless.
4. Veracity refers to the messiness or trustworthiness of the data.
5. Variety refers to the different types of data we can now use.
BENEFITS OF BIG DATA
Research: conducting research to gain deep knowledge about big data and collecting the data to be analyzed.
Business Intelligence: the data that has been analyzed becomes business information that can be used to make decisions.
New Business Opportunity: with more knowledge about business information and the ecosystem of the digital world, there will be new and large potential businesses.