RAPIDMINER
I use RapidMiner to analyse the election dataset (data pemilu) and build a
prediction model for legislative candidate electability (Elektabilitas Caleg).
RapidMiner is a software platform developed by the company of the same name. It
provides an integrated environment for machine learning, data mining, text
mining, predictive analytics and business analytics. It is used for business
and commercial applications as well as for research, education, training, rapid
prototyping, and application development, and it supports all steps of the data
mining process, including data preparation, results visualization, validation
and optimization. RapidMiner is developed on an open core model. The free
RapidMiner Basic Edition, which is limited to 1 logical processor and 10,000
data rows, is available under the AGPL license.
I am going to build a model that predicts legislative candidate electability
(prediksi elektabilitas caleg) using the dataset (datapemilukpu.xls) that the
lecturer linked in the chapter 7 slides. RapidMiner offers many algorithms and
operators, but for this prediction I will use three main algorithms: Decision
Tree (C4.5), Naïve Bayes (NB) and K-Nearest Neighbor (K-NN). The goal of the
model is to predict whether a legislative candidate will be elected or not.
PREDICTION MODEL
1. DECISION TREE (C4.5)
For the first algorithm, we use a Decision Tree as the model. It can classify
both nominal and numerical data. In RapidMiner, the Decision Tree operator
predicts the attribute that has the label role. According to the RapidMiner
website, each interior node of the tree corresponds to one of the input
attributes. The number of edges of a nominal interior node is equal to the
number of possible values of the corresponding input attribute. Outgoing edges
of numerical attributes are labeled with disjoint ranges. Each leaf node
represents a value of the label attribute given the values of the input
attributes represented by the path from the root to the leaf.
[Figure: example of the decision tree model's result and workspace in RapidMiner]
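The tree structure described in this section (nominal nodes with one edge per value, numerical nodes with disjoint ranges, and leaves that hold a label value) can be sketched in a few lines of Python. This is only an illustration, not how RapidMiner builds or stores its trees. The attribute names (`party_size`, `votes`), the thresholds and the tree itself are invented for the example and do not come from datapemilukpu.xls.

```python
# Hypothetical hand-written tree; attribute names and thresholds are
# invented for illustration and are not taken from the election dataset.
tree = {
    "attribute": "party_size", "kind": "nominal",
    "edges": {  # one outgoing edge per possible nominal value
        "large": {
            "attribute": "votes", "kind": "numerical",
            "edges": [  # disjoint half-open ranges [low, high)
                (0, 5000, {"label": "no"}),
                (5000, float("inf"), {"label": "yes"}),
            ],
        },
        "small": {"label": "no"},
    },
}


def predict_tree(node, example):
    """Follow edges from the root until a leaf (a node with a label) is reached."""
    while "label" not in node:
        value = example[node["attribute"]]
        if node["kind"] == "nominal":
            node = node["edges"][value]
        else:
            for low, high, child in node["edges"]:
                if low <= value < high:
                    node = child
                    break
            else:
                raise ValueError(f"no range covers {value!r}")
    return node["label"]


print(predict_tree(tree, {"party_size": "large", "votes": 7000}))
```

The path taken from the root to the leaf is exactly the chain of attribute tests that justifies the prediction, which is why decision trees are easy to read.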
2. NAIVE BAYES
According to the RapidMiner website, a Naive Bayes classifier is a simple
probabilistic classifier based on applying Bayes’ theorem (from Bayesian
statistics) with strong (naive) independence assumptions. A more descriptive
term for the underlying probability model would be ‘independent feature model’.
In simple terms, a Naive Bayes classifier assumes that the presence (or absence)
of a particular feature of a class (i.e. attribute) is unrelated to the presence
(or absence) of any other feature. For example, a fruit may be considered to be
an apple if it is red, round, and about 4 inches in diameter. Even if these
features depend on each other or upon the existence of the other features, a
Naive Bayes classifier considers all of these properties to independently
contribute to the probability that this fruit is an apple.
The advantage of the Naive Bayes classifier is that it only requires a small
amount of training data to estimate the means and variances of the variables
necessary for classification. Because the variables are assumed to be
independent, only the variances of the variables for each label need to be
determined and not the entire covariance matrix.
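To make the point about means and variances concrete, here is a minimal Gaussian Naive Bayes sketch in plain Python. It assumes numerical attributes that follow a normal distribution within each label, which is one common way to apply Naive Bayes and not necessarily what the RapidMiner operator does internally. The function names (`fit_nb`, `predict_nb`) and the six training rows are made up for the example.

```python
import math
from collections import defaultdict


def fit_nb(rows, labels):
    """Estimate a prior plus a per-attribute mean and variance for each label."""
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    model = {}
    for label, group in by_label.items():
        n = len(group)
        columns = list(zip(*group))
        means = [sum(col) / n for col in columns]
        # Only one variance per attribute is needed, not a covariance matrix.
        # The small constant avoids division by zero for constant columns.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict_nb(model, row):
    """Return the label with the highest log-posterior under independence."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


# Made-up toy data: two numerical attributes, two labels.
rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
labels = ["no", "no", "no", "yes", "yes", "yes"]
model = fit_nb(rows, labels)
print(predict_nb(model, (1.1, 2.0)))
```

Each attribute contributes its own term to the score, and the terms are simply added together. That sum is the independence assumption written out in code.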
3. K-NEAREST NEIGHBOR (K-NN)
According to the RapidMiner website, a k-Nearest Neighbor model is generated
from the input ExampleSet. This model can be a classification or regression
model depending on the input ExampleSet. The k-Nearest Neighbor algorithm is
based on learning by analogy, that is, by comparing a given test example with
training examples that are similar to it. The training examples are described
by n attributes. Each example represents a point in an n-dimensional space. In
this way, all of the training examples are stored in an n-dimensional pattern
space. When given an unknown example, a k-nearest neighbor algorithm searches
the pattern space for the k training examples that are closest to the unknown
example. These k training examples are the k “nearest neighbors” of the unknown
example. “Closeness” is defined in terms of a distance metric, such as the
Euclidean distance.
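The search described above can be sketched as a short Python function for the classification case. It assumes Euclidean distance and a simple majority vote among the k nearest neighbors. The function name `knn_predict` and the toy points are invented for the example, and RapidMiner's operator may handle ties and weighting differently.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k closest training examples."""
    # Euclidean distance from the query to every stored training example.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    # Keep the k nearest neighbors and count their labels.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data: points in a 2-dimensional pattern space.
train_rows = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_labels = ["no", "no", "no", "yes", "yes", "yes"]
print(knn_predict(train_rows, train_labels, (1.2, 1.1)))
```

There is no training step beyond storing the examples, so all of the work happens when a prediction is requested.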
REFERENCES
https://en.wikipedia.org/wiki/RapidMiner