In machine learning, a classifier learns from the training data and builds a decision-making model; with that knowledge it classifies new test data. For example, if we train a classifier on different kinds of fruits by providing information such as shape, color, and taste, then given any new fruit described by the same features it can predict the exact or closest match.
The same concept is used here to classify handwritten digits, and it is done with a Random Forest classifier.
Random Forest Classifier - A random forest is a combination of relatively uncorrelated decision trees (Fig. 1), each trained on a different sub-sample of the dataset. Because each tree makes different errors, averaging their predictions gives better predictive accuracy and keeps over-fitting under control.
This tutorial gives a brief idea of the same:
https://www.youtube.com/watch?v=loNcrMjYh64
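To make the "averaging many trees helps" point concrete, here is a minimal sketch comparing a single decision tree against a forest of 100 trees. It uses scikit-learn's small built-in 8x8 digits dataset purely for illustration (an assumption for this sketch, not the Kaggle files used later); exact scores will vary with the random seed.

# Sketch: one decision tree vs. an ensemble of de-correlated trees
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Averaging many trees, each fitted on a different bootstrap sample,
# typically scores noticeably higher than a single tree.
print('single tree  :', cross_val_score(tree, X, y, cv=5).mean())
print('random forest:', cross_val_score(forest, X, y, cv=5).mean())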
The MNIST dataset consists of handwritten digits for training and testing. A .csv version of it can be downloaded from Kaggle (a competition website for ML practitioners); check the link below for more details.
https://www.kaggle.com/c/digit-recognizer/data
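Assuming the files are saved as train.csv and test.csv (the names Kaggle uses for this competition), a quick sketch to check what is inside before training:

import pandas as pd

# train.csv: first column is the digit label, the remaining 784 columns are
# the 28x28 pixel intensities (0-255); test.csv has only the 784 pixel columns.
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

print(train_df.shape)                    # expected (42000, 785)
print(test_df.shape)                     # expected (28000, 784)
print(train_df['label'].value_counts())  # roughly balanced counts of digits 0-9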
Here is the code for the Random Forest classifier (tested with Python 3.5):
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import time

start_time = time.time()

# Load the Kaggle training data: column 0 is the label, the rest are pixels
train_df = pd.read_csv('train.csv')
X_tr = train_df.values[:, 1:].astype(float)
Y_tr = train_df.values[:, 0]

print('training...')
clf = RandomForestClassifier(n_estimators=100)
clf = clf.fit(X_tr, Y_tr)
print('training complete...')

# Cross-validated accuracy on the training data
scores = cross_val_score(clf, X_tr, Y_tr)
print('Accuracy {0}'.format(np.mean(scores)))

# Read test data
test_df = pd.read_csv('test.csv')
X_test = test_df.values.astype(float)

# Make predictions
Y_test = clf.predict(X_test)

# Make a DataFrame in Kaggle's submission format (ImageId, Label)
ans = pd.DataFrame(data={'ImageId': range(1, len(Y_test) + 1), 'Label': Y_test})

# Save to csv
ans.to_csv('rf.csv', index=False)

print("--- %s seconds ---" % (time.time() - start_time))
RESULT:
Accuracy 0.9622146922772131
--- 140.4269781112671 seconds ---
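Since the reported accuracy is an average over all ten digits, it can also help to see which digits get confused with each other. This is not part of the original script, just a hedged sketch that reuses clf, X_tr and Y_tr from the code above:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# Out-of-fold predictions on the training set, so the matrix is not
# computed on data the forest has already memorised
Y_pred = cross_val_predict(clf, X_tr, Y_tr, cv=3)

# Rows are the true digits 0-9, columns the predicted digits; the
# off-diagonal entries show which pairs the forest mixes up most
print(confusion_matrix(Y_tr, Y_pred))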