Introduction to Machine Learning

Machine Learning is a way of learning from examples and experience instead of hard-coded, predefined rules. This is useful because we can then make predictions on the basis of what has been learnt. Say, for example, we wanted to write an algorithm that could recognise handwritten digits. Without Machine Learning we would need to write numerous lines of code to define the many possible ways people might write any given digit. Instead, I will show you how we can use a learning method in scikit-learn to quickly work out how to recognise handwritten digits. We'll then use this method to recognise new examples and predict what digits they are.

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm

 

We begin by importing the relevant modules. Pyplot will be used to visualise our digits, and datasets contains several example datasets, including our digit recognition data. Most importantly of all, svm is the Support Vector Machine module, one of scikit-learn's learning methods, and it will perform the machine learning task in this example.

digits = datasets.load_digits()
print(digits.data)
print(digits.target)

[[  0.   0.   5. ...,   0.   0.   0.]
[  0.   0.   0. ...,  10.   0.   0.]
[  0.   0.   0. ...,  16.   9.   0.]
...,
[  0.   0.   1. ...,   6.   0.   0.]
[  0.   0.   2. ...,  12.   0.   0.]
[  0.   0.  10. ...,  12.   1.   0.]]
[0 1 2 ..., 8 9 8]

 

Next we load the digit recognition dataset and print out its features and labels. The features describe each of our digits, and the labels 0 to 9 are the specific digits that each set of features represents.
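To get a feel for what the features look like, here is a small sketch that inspects the dataset's dimensions. It relies on each digit in this dataset being stored as an 8x8 image that is flattened into a row of 64 pixel values; the variable names first_image_flat and same are just for illustration.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each row of digits.data is one digit: an 8x8 image flattened to 64 values.
print(digits.data.shape)    # (1797, 64)
print(digits.target.shape)  # (1797,)
print(digits.images.shape)  # (1797, 8, 8)

# Flattening the first image gives the first row of features.
first_image_flat = digits.images[0].reshape(-1)
same = bool((first_image_flat == digits.data[0]).all())
print(same)
```

So digits.data and digits.images hold the same pixel values in two different layouts: the flat rows are what the classifier learns from, and the 8x8 grids are what we plot.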

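# Create a Support Vector Classifier with its default settings
clf = svm.SVC()
# Hold back the last 10 examples for testing and train on the rest
X, y = digits.data[:-10], digits.target[:-10]
clf.fit(X, y)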

 

As mentioned at the start, if we didn't use machine learning we would need to write a lot of code to cover all the variations of different handwritten digits. But our classifier, just 3 lines of code, saves us this hassle. We select the SVC classifier, which can be fine-tuned for improved results, but for this simple example we'll use its default settings. To train our classifier we load all but the last 10 examples in our dataset into X and y. The classifier is then trained on them with clf.fit(X, y).
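As a rough idea of what fine-tuning might look like, the sketch below passes explicit values for two of SVC's parameters, gamma and C. The values 0.001 and 100.0 are illustrative assumptions rather than tuned settings, and the name tuned_clf is only used here to keep it separate from the default classifier.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
X, y = digits.data[:-10], digits.target[:-10]

# gamma and C are set by hand here; these values are illustrative, not tuned.
tuned_clf = svm.SVC(gamma=0.001, C=100.0)
tuned_clf.fit(X, y)

# Predict the same two held-back digits as in the main example.
tuned_predictions = tuned_clf.predict(digits.data[-10:-8])
print(tuned_predictions)
```

In practice you would compare several parameter values on data the classifier has not seen, rather than picking them by hand.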

print(clf.predict(digits.data[-10:-8]))

[5 4]

 

Now we can test it out. The 10th and 9th from last handwritten digits in our dataset, which were held back from training, are analysed using the classifier. It predicts that they are a 5 and a 4. Let's see if that's correct...

# Plot the two held-back digits that we just asked the classifier to predict
for x in range(-10, -8):
    plt.imshow(digits.images[x], cmap=plt.cm.gray_r, interpolation='nearest')
    plt.show()

digits.png

Indeed they are a 5 and a 4, so the prediction was correct! While this is a straightforward example, I hope it gives you an idea of the potential of Machine Learning.
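Looking at the images works for a couple of digits, but we can also check predictions against the stored labels. The sketch below does this for all 10 held-back examples. The names predicted, actual and accuracy are my own, and I have not included the printed results since they may vary with your scikit-learn version.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Train on everything except the last 10 examples, as before.
clf = svm.SVC()
clf.fit(digits.data[:-10], digits.target[:-10])

# Compare predictions for the 10 held-back digits with their true labels.
predicted = clf.predict(digits.data[-10:])
actual = digits.target[-10:]
print(predicted)
print(actual)

# Fraction of the held-back digits that were predicted correctly.
accuracy = float((predicted == actual).mean())
print(accuracy)
```

Ten examples is far too few to judge a classifier properly, but it shows the idea of testing on data that was not used for training.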