How to pass base64 encoded image to Tensorflow prediction?

The TensorFlow model does not have to be trained on base64 data. Leave your training graph as is. However, when exporting the model, you'll need to export a model that can accept png or jpeg (or possibly raw, if it's small) data. Then, when you export the model, you'll need to be sure to use a name for the output that ends in _bytes. This signals to CloudML Engine that you will be sending base64 encoded data. Putting it all together would like something like this:

from tensorflow.contrib.saved_model.python.saved_model import utils

# Shape of [None] means we can have a batch of images.
image = tf.placeholder(shape=[None], dtype=tf.string)
# Decode the image.
decoded = tf.image.decode_jpeg(image, channels=3)
# Do the rest of the processing.
scores = build_model(decoded)

# The input name needs to have "_bytes" suffix.
inputs = {'image_bytes': image}
outputs = {'scores': scores}
utils.simple_save(session, export_dir, inputs, outputs)

The request you send will look something like this:

{"instances": [{"b64": "x0welkja..."}]}

If you just want an efficient way to send images to a model (and not necessarily base-64 encode it), I would suggest uploading your images(s) to Google Cloud Storage and then having your model read off GCS. This way, you are not limited by image size and you can take advantage of multi-part, multithreaded, resumable uploads etc. that the GCS API provides.

TensorFlow's tf.read_file will directly off GCS. Here's an example of a serving input_fn that will do this. Your request to CMLE would send it an image URL (gs://bucket/some/path/to/image.jpg)

def read_and_preprocess(filename, augment=False):
    # decode the image file starting from the filename
    # end up with pixel values that are in the -1, 1 range
    image_contents = tf.read_file(filename)
    image = tf.image.decode_jpeg(image_contents, channels=NUM_CHANNELS)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32) # 0-1
    image = tf.expand_dims(image, 0) # resize_bilinear needs batches
    image = tf.image.resize_bilinear(image, [HEIGHT, WIDTH], align_corners=False)
    #image = tf.image.per_image_whitening(image)  # useful if mean not important
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0) # -1 to 1
    return image

def serving_input_fn():
    inputs = {'imageurl': tf.placeholder(tf.string, shape=())}
    filename = tf.squeeze(inputs['imageurl']) # make it a scalar
    image = read_and_preprocess(filename)
    # make the outer dimension unknown (and not 1)
    image = tf.placeholder_with_default(image, shape=[None, HEIGHT, WIDTH, NUM_CHANNELS])

    features = {'image' : image}
    return tf.estimator.export.ServingInputReceiver(features, inputs)

Your training code will train off actual images, just as in rhaertel80's suggestion above. See https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/08_image/flowersmodel/trainer/task.py#L27 for what the training/evaluation input functions would look like.


I was trying to use @Lak's answer (thanks Lak) to get online predictions for multiple instances in one json file, but kept getting the following error (I had two instances in my test json, hence the shape [2]):

input filename tensor must be scalar but had shape [2]

The problem is that ML engine apparently batches all the instances together and passes them to the serving inpur receiver function, but @Lak's sample code assumes the input is a single instance (it indeed works fine if you have a single instance in your json). I altered the code so that it can process a batch of inputs. I hope it will help someone:

def read_and_preprocess(filename):
    image_contents = tf.read_file(filename)
    image = tf.image.decode_image(image_contents, channels=NUM_CHANNELS)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32) # 0-1
    return image

def serving_input_fn():
    inputs = {'imageurl': tf.placeholder(tf.string, shape=(None))}
    filename = inputs['imageurl']
    image = tf.map_fn(read_and_preprocess, filename, dtype=tf.float32)
    # make the outer dimension unknown (and not 1)
    image = tf.placeholder_with_default(image, shape=[None, HEIGHT, WIDTH, NUM_CHANNELS])

    features = {'image': image}
    return tf.estimator.export.ServingInputReceiver(features, inputs)

The key changes are that 1) you don't squeeze the input tensor (that would cause trouble in the special case when your json contains only one instance) and, 2) use tf.map_fn to apply the read_and_preprocess function to a batch of input image urls.