Custom metric based on tensorflow's streaming metrics returns NaN

I think the problem comes from the fact that the streaming metrics used inside your metric_fn never get updated.

Try the following (I also included minor modifications to my taste):

def metric_fn(predictions=None, labels=None, weights=None):
    P, update_op1 = tf.contrib.metrics.streaming_precision(predictions, labels)
    R, update_op2 = tf.contrib.metrics.streaming_recall(predictions, labels)
    eps = 1e-5  # guards against division by zero when P + R == 0
    return (2 * (P * R) / (P + R + eps), tf.group(update_op1, update_op2))
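As a quick sanity check of why the eps term matters, here is the same F1 formula in plain Python (no TensorFlow): without eps, P = R = 0 would give 0/0 and hence NaN.

```python
def f1(p, r, eps=1e-5):
    # F1 with a small epsilon in the denominator to avoid 0/0
    return 2 * (p * r) / (p + r + eps)

print(f1(0.0, 0.0))  # 0.0 rather than NaN
```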

The first argument of tf.contrib.learn.MetricSpec's __init__ is metric_fn.

The documentation says:

metric_fn: A function to use as a metric. See _adapt_metric_fn for rules on how predictions, labels, and weights are passed to this function. This must return either a single Tensor, which is interpreted as a value of this metric, or a pair (value_op, update_op), where value_op is the op to call to obtain the value of the metric, and update_op should be run for each batch to update internal state.

Since you want to use streaming operations in your metric_fn, you can't return a single Tensor; you have to take into account that streaming operations keep an internal state that must be updated on every batch.
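To make the (value_op, update_op) pattern concrete, here is a rough plain-Python analogue (the class name and methods are made up for illustration, not TensorFlow API): `update` accumulates internal state batch by batch, while `value` reads the current aggregate.

```python
class StreamingPrecision:
    """Toy analogue of a streaming metric: state persists across batches."""

    def __init__(self):
        self.true_pos = 0  # predicted positive and actually positive
        self.pred_pos = 0  # predicted positive

    def update(self, predictions, labels):
        # Analogue of update_op: fold one batch into the internal state.
        for p, l in zip(predictions, labels):
            if p == 1:
                self.pred_pos += 1
                if l == 1:
                    self.true_pos += 1

    def value(self):
        # Analogue of value_op: read the metric from the accumulated state.
        return self.true_pos / self.pred_pos if self.pred_pos else 0.0
```

If `value` is read before any `update` has run, the state is empty, which is exactly the undefined/NaN situation the answer describes.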

Thus, the first part of your metric_fn should be:

def metric_fn(predictions=[], labels=[], weights=[]):
    P, update_precision = tf.contrib.metrics.streaming_precision(predictions, labels)
    R, update_recall = tf.contrib.metrics.streaming_recall(predictions, labels)

Then, if you want to return 0 when a condition is met, you can't use a Python if statement (it isn't evaluated inside the TensorFlow graph); you have to use tf.cond, which puts the conditional into the graph.
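Loosely speaking, a graph tensor is a deferred value, so a Python if would test the tensor object itself rather than the number it eventually produces. Here is a rough plain-Python sketch of the deferred-branch idea behind tf.cond (all names here are invented for illustration):

```python
def cond(pred, true_fn, false_fn):
    # Deferred branch, loosely analogous to tf.cond: nothing is evaluated
    # until the returned thunk is called ("run time").
    return lambda: true_fn() if pred() else false_fn()

p_plus_r = lambda: 0.0  # stands in for the tensor P + R
score = cond(lambda: p_plus_r() == 0.0,
             lambda: 0.0,   # F1 is defined as 0 in this case
             lambda: 1.0)   # placeholder for 2*P*R/(P+R)
# score is still unevaluated here; calling it plays the role of running the graph
```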

Moreover, you want to read the values of P and R only after the update operations have run (otherwise their first value is undefined, or NaN).

To force tf.cond to be evaluated after P and R are updated, you can use tf.control_dependencies:

def metric_fn(predictions=[], labels=[], weights=[]):
    P, update_precision = tf.contrib.metrics.streaming_precision(predictions, labels)
    R, update_recall = tf.contrib.metrics.streaming_recall(predictions, labels)

    with tf.control_dependencies([P, update_precision, R, update_recall]):
        # Both branches of tf.cond must return the same dtype, hence 0. and not 0
        score = tf.cond(tf.equal(P + R, 0.), lambda: 0., lambda: 2 * (P * R) / (P + R))
    return score, tf.group(update_precision, update_recall)