First of all, we need to clarify what our Xtrain and ytrain are. As mentioned before, we use the target column of the train set as ytrain and the rest of the train set as Xtrain. We use the test set as Xtest.
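As a minimal sketch (assuming the standard Kaggle Titanic layout, i.e., train.csv and test.csv with a Survived target column, and features already cleaned and encoded as numeric), the split could look like this:

```python
import pandas as pd

# Assumed file names and target column (standard Kaggle Titanic format).
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

ytrain = train["Survived"]                  # the target column of the train set
Xtrain = train.drop(columns=["Survived"])   # the rest of the train set
Xtest = test                                # the whole test set
```

The classifier sketches below reuse these Xtrain, ytrain, and Xtest variables.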
Since the outcome of the prediction is 0s and 1s, we can use many types of classifiers in this case. I simply picked six classifiers and list them below.
Decision Tree Classifier
Random Forest Classifier
KNN Classifier
MLP Classifier
SVC Classifier
GradientBoostingClassifier
a. How does a Decision Tree work?
A decision tree is a structure that can be used to divide a large collection of records into successively smaller sets of records by applying a sequence of simple decision rules.
A decision tree model consists of a set of rules for dividing a large heterogeneous population into smaller, more homogeneous groups with respect to a particular target variable.
In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.
The basic algorithm is greedy. Generally speaking, a greedy method makes the locally optimal choice at each stage, which does not usually produce a globally optimal solution. However, a greedy heuristic can yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time.
The decision tree algorithm proceeds as follows:
1. Start with an empty tree.
2. Select one of the unused features to split the data:
   - Partition the node population and calculate the information gain (a small worked example follows this list).
   - Find the split with the maximum information gain for this attribute.
   - Select the split that produces the greatest "separation" in the target variable.
3. Repeat this for all attributes:
   - Find the best splitting attribute along with the best split rule.
4. Split the node using that attribute.
5. Go to each child node and repeat steps 2 to 4.
Stopping criteria: (a) each leaf node contains examples of one class (a homogeneous node); (b) there are no remaining attributes for further partitioning (all records have similar attribute values), in which case majority voting is employed to classify the leaf.
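To make the information-gain criterion concrete, here is a minimal worked example of entropy-based information gain for a binary split (the helper names here are my own, not from the original code):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left and right."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

# A pure split of [0, 0, 1, 1] into [0, 0] and [1, 1] gains a full bit:
print(information_gain(np.array([0, 0, 1, 1]),
                       np.array([0, 0]),
                       np.array([1, 1])))  # 1.0
```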
By using a decision tree classifier as below, the Kaggle score is 0.82273.
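A minimal scikit-learn sketch of such a classifier, reusing the Xtrain, ytrain, and Xtest prepared earlier (the hyperparameters are illustrative, not necessarily those behind the score above):

```python
from sklearn.tree import DecisionTreeClassifier

# A shallow tree; limiting max_depth is one common way to curb overfitting.
dt = DecisionTreeClassifier(max_depth=5, random_state=0)
dt.fit(Xtrain, ytrain)
dt_pred = dt.predict(Xtest)
```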
b. How does a Random Forest work?
Random Forest applies bagging (bootstrap aggregation) to decision trees. It is an ensemble learning method that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the individual trees' classes (or their mean prediction, for regression).
Random forests correct for decision trees' habit of overfitting to their training set, and they typically run with high accuracy and efficiency.
The random forest algorithm is structured as below:
1. Select ntree, the number of trees to grow, and mtry, a number no larger than the number of variables.
2. For i = 1 to ntree:
   - Draw a bootstrap sample from the data. Call the records not in the bootstrap sample the "out-of-bag" data.
   - Grow a "random" tree, where at each node the best split is chosen among mtry randomly selected variables. The tree is grown to maximum size and not pruned back.
   - Use the tree to predict the out-of-bag data.
3. In the end, use the predictions on the out-of-bag data to form majority votes.
4. Predict the test data by majority vote over the predictions from the ensemble of trees.
By using a Random Forest classifier as below, the Kaggle score is 0.76489.
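A minimal scikit-learn sketch, where n_estimators plays the role of ntree and max_features the role of mtry from the algorithm above; setting oob_score=True reports an accuracy estimate from the out-of-bag samples (values are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100,     # ntree
                            max_features="sqrt",  # mtry = sqrt(n_features)
                            oob_score=True,
                            random_state=0)
rf.fit(Xtrain, ytrain)
print(rf.oob_score_)        # accuracy estimated from the out-of-bag samples
rf_pred = rf.predict(Xtest)
```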
c. How does KNN work?
K-nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function). It is an instance-based (lazy) learning method, as opposed to eager learning: it works by storing all training instances (or some representative exemplars) and only assigning a target value when a new instance arrives.
The nearest neighbors (which need not be exact matches) are the k "closest" points, and they are used to perform classification:
- Assumes all instances are points in n-dimensional space (no feature selection)
- A distance measure is needed to determine the “closeness” of instances
- Classify an instance by finding its nearest neighbors and picking the most popular class among the neighbors.
We use Euclidean distance to measure closeness and take the majority vote of class labels among the k-nearest neighbors, optionally weighting each vote according to distance.
By using a KNN classifier as below, the Kaggle score is 0.53191.
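A minimal scikit-learn sketch using Euclidean distance (p=2, the default) and distance-weighted voting, as described above (k is illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier

# Being distance-based, KNN is sensitive to feature scales, so scaling
# Xtrain/Xtest beforehand usually helps.
knn = KNeighborsClassifier(n_neighbors=5, p=2, weights="distance")
knn.fit(Xtrain, ytrain)
knn_pred = knn.predict(Xtest)
```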
d. How does MLP work?
A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that applies a nonlinear activation function (e.g., the sigmoid, which is smooth and differentiable).
By using an MLP classifier as below, the Kaggle score is 0.78539.
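A minimal scikit-learn sketch with one hidden layer and sigmoid ("logistic") activations, as described above (the layer size and iteration budget are illustrative):

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(100,),  # one hidden layer of 100 neurons
                    activation="logistic",      # sigmoid activation
                    max_iter=500,
                    random_state=0)
mlp.fit(Xtrain, ytrain)
mlp_pred = mlp.predict(Xtest)
```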
e. How does SVC work?
A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. Generally speaking, an SVM finds the hyperplane that maximizes the margin (the distance to the decision boundary) separating the two classes.
A plain SVM handles linearly separable data, but we can still map the input into a higher-dimensional space (the kernel trick) to deal with linearly non-separable cases.
By using an SVM classifier as below, the Kaggle score is 0.50236.
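A minimal scikit-learn sketch; the RBF kernel implicitly expands the input into a higher-dimensional space, covering the linearly non-separable case mentioned above (C and gamma are illustrative):

```python
from sklearn.svm import SVC

svc = SVC(kernel="rbf", C=1.0, gamma="scale")
svc.fit(Xtrain, ytrain)
svc_pred = svc.predict(Xtest)
```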
f. How does Gradient Boosting work?
Like other boosting methods, gradient boosting builds the model in a stage-wise fashion, and it generalizes them by allowing optimization of an arbitrary differentiable loss function: at each stage, a new weak learner (typically a shallow decision tree) is fit to the negative gradient of the loss, i.e., the residual errors of the current ensemble.
By using a Gradient Boosting classifier as below, the Kaggle score is 0.81743.
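A minimal scikit-learn sketch (hyperparameters illustrative): trees are added one stage at a time, each fit to the residual errors of the current ensemble.

```python
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(n_estimators=100,   # number of boosting stages
                                learning_rate=0.1,  # shrinks each stage's contribution
                                max_depth=3,        # shallow trees as weak learners
                                random_state=0)
gb.fit(Xtrain, ytrain)
gb_pred = gb.predict(Xtest)
```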
Credit: BecomingHuman, by SydneyChen