In machine learning, a classifier learns from the training data and builds a decision-making model; with that knowledge it classifies new test data. For example, if we train a classifier on different kinds of fruits by providing information such as shape, color, and taste, then given any new fruit described by the same features it can predict the exact or closest match.
The same concept is used here to classify handwritten digits, and it is done with a Random Forest classifier.
Random Forest Classifier - A random forest is a combination of relatively uncorrelated decision trees (Fig. 1), each trained on a different sub-sample of the dataset. Because each tree makes different errors, averaging their predictions gives better predictive accuracy and keeps over-fitting under control.
This tutorial gives a brief idea of the same:
https://www.youtube.com/watch?v=loNcrMjYh64
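To make the "averaging many trees helps" point concrete, here is a minimal sketch comparing a single decision tree against a forest of 100 trees. It uses scikit-learn's small built-in 8x8 digits dataset purely for illustration (an assumption for this sketch, not the Kaggle files used later); exact scores will vary with the random seed.

# Sketch: one decision tree vs. an ensemble of de-correlated trees
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Averaging many trees, each fitted on a different bootstrap sample,
# typically scores noticeably higher than a single tree.
print('single tree  :', cross_val_score(tree, X, y, cv=5).mean())
print('random forest:', cross_val_score(forest, X, y, cv=5).mean())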
The MNIST dataset consists of handwritten digits for training and testing. A .csv version of it can be downloaded from Kaggle (a competition website for ML practitioners); check the link below for more details.
https://www.kaggle.com/c/digit-recognizer/data
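Assuming the files are saved as train.csv and test.csv (the names Kaggle uses for this competition), a quick sketch to check what is inside before training:

import pandas as pd

# train.csv: first column is the digit label, the remaining 784 columns are
# the 28x28 pixel intensities (0-255); test.csv has only the 784 pixel columns.
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

print(train_df.shape)                    # expected (42000, 785)
print(test_df.shape)                     # expected (28000, 784)
print(train_df['label'].value_counts())  # roughly balanced counts of digits 0-9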
Here is the code for the Random Forest classifier (tested with Python 3.5):
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import time

start_time = time.time()

# Load the Kaggle training data: column 0 is the label, the rest are pixels
train_df = pd.read_csv('train.csv')
X_tr = train_df.values[:, 1:].astype(float)
Y_tr = train_df.values[:, 0]

print('training...')
clf = RandomForestClassifier(n_estimators=100)
clf = clf.fit(X_tr, Y_tr)
print('training complete...')

# Cross-validated accuracy on the training data
scores = cross_val_score(clf, X_tr, Y_tr)
print('Accuracy {0}'.format(np.mean(scores)))

# Read test data
test_df = pd.read_csv('test.csv')
X_test = test_df.values.astype(float)

# Make predictions
Y_test = clf.predict(X_test)

# Make a DataFrame in Kaggle's submission format (ImageId, Label)
ans = pd.DataFrame(data={'ImageId': range(1, len(Y_test) + 1), 'Label': Y_test})

# Save to csv
ans.to_csv('rf.csv', index=False)

print("--- %s seconds ---" % (time.time() - start_time))
RESULT:
Accuracy 0.9622146922772131
--- 140.4269781112671 seconds ---
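Since the reported accuracy is an average over all ten digits, it can also help to see which digits get confused with each other. This is not part of the original script, just a hedged sketch that reuses clf, X_tr and Y_tr from the code above:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# Out-of-fold predictions on the training set, so the matrix is not
# computed on data the forest has already memorised
Y_pred = cross_val_predict(clf, X_tr, Y_tr, cv=3)

# Rows are the true digits 0-9, columns the predicted digits; the
# off-diagonal entries show which pairs the forest mixes up most
print(confusion_matrix(Y_tr, Y_pred))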