Web application that uses scikit-learn

I'm working on a Docker image that wraps predict and predictproba methods and expose them as a web api: https://github.com/hexacta/docker-sklearn-predict-http-api

You need to save your model:

from sklearn.externals import joblib
joblib.dump(clf, 'iris-svc.pkl')

create a Dockerfile:

FROM hexacta/sklearn-predict-http-api:latest
COPY iris-svc.pkl /usr/src/app/model.pkl

and run the container:

$ docker build -t iris-svc .
$ docker run -d -p 4000:8080 iris-svc

then you can make requests:

$ curl -H "Content-Type: application/json" -X POST -d '{"sepal length (cm)":4.4}' http://localhost:4000/predictproba
  [{"0":0.8284069169,"1":0.1077571623,"2":0.0638359208}]
$ curl -H "Content-Type: application/json" -X POST -d '[{"sepal length (cm)":4.4}, {"sepal length (cm)":15}]' http://localhost:4000/predict
  [0, 2]

You can follow the tutorial below to deploy your scikit-learn model in Azure ML and get the web service automatically generated:

Build and Deploy a Predictive Web App Using Python and Azure ML

or the combination of yHat + Heroku may also do the trick


If this is just for a demo, train your classifier offline, pickle the model and then use a simple python web framework such as flask or bottle to unpickle the model at server startup time and call the predict function in an HTTP request handler.

django is a feature complete framework hence is longer to learn than flask or bottle but it has a great documentation and a larger community.

heroku is a service to host your application in the cloud. It's possible to host flask applications on heroku, here is a simple template project + instructions to do so.

For "production" setups I would advise you not to use pickle but to write your own persistence layer for the machine learning model so as to have full control on the parameters your store and be more robust to library upgrades that might break the unpickling of old models.


While this is not a classifier, I have implemented a simple machine learning web service using the bottle framework and scikit-learn. Given a dataset in .csv format it returns 2D visualizations with respect to principal components analysis and linear discriminant analysis techniques.

More information and example data files can be found at: http://mindwriting.org/blog/?p=153

Here is the implementation: upload.html:

<form
 action="/plot" method="post"
 enctype="multipart/form-data"
>
Select a file: <input type="file" name="upload" />
<input type="submit" value="PCA & LDA" />
</form>

pca_lda_viz.py (modify host name and port number):

import matplotlib
matplotlib.use('Agg')

import matplotlib.pyplot as plt
import numpy as np
from cStringIO import StringIO

from bottle import route, run, request, static_file
import csv
from matplotlib.font_manager import FontProperties
import colorsys

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.lda import LDA

html = '''
<html>
    <body>
        <img src="data:image/png;base64,{}" />
    </body>
</html>
'''

 @route('/')
 def root():
     return static_file('upload.html', root='.')

 @route('/plot', method='POST')
    def plot():

       # Get the data
       upload = request.files.get('upload')
       mydata = list(csv.reader(upload.file, delimiter=','))

       x = [row[0:-1] for row in mydata[1:len(mydata)]]

       classes =  [row[len(row)-1] for row in mydata[1:len(mydata)]]
       labels = list(set(classes))
       labels.sort()

       classIndices = np.array([labels.index(myclass) for myclass in classes])

       X = np.array(x).astype('float')
       y = classIndices
       target_names = labels

       #Apply dimensionality reduction
       pca = PCA(n_components=2)
       X_r = pca.fit(X).transform(X)

       lda = LDA(n_components=2)
       X_r2 = lda.fit(X, y).transform(X)

        #Create 2D visualizations
       fig = plt.figure()
       ax=fig.add_subplot(1, 2, 1)
       bx=fig.add_subplot(1, 2, 2)

       fontP = FontProperties()
       fontP.set_size('small')

       colors = np.random.rand(len(labels),3)

       for  c,i, target_name in zip(colors,range(len(labels)), target_names):
           ax.scatter(X_r[y == i, 0], X_r[y == i, 1], c=c, 
                      label=target_name,cmap=plt.cm.coolwarm)
           ax.legend(loc='upper center', bbox_to_anchor=(1.05, -0.05),
                     fancybox=True,shadow=True, ncol=len(labels),prop=fontP)
           ax.set_title('PCA')
           ax.tick_params(axis='both', which='major', labelsize=6)

       for c,i, target_name in zip(colors,range(len(labels)), target_names):
           bx.scatter(X_r2[y == i, 0], X_r2[y == i, 1], c=c, 
                      label=target_name,cmap=plt.cm.coolwarm)
           bx.set_title('LDA');
           bx.tick_params(axis='both', which='major', labelsize=6)

       # Encode image to png in base64
       io = StringIO()
       fig.savefig(io, format='png')
       data = io.getvalue().encode('base64')

       return html.format(data)

run(host='mindwriting.org', port=8079, debug=True)