Passing nullable columns as parameter to Spark SQL UDF

The issue is that null is not a valid value for Scala's Int (the backing type of the column here), while it is a valid value for String. Int is equivalent to Java's primitive int and must always hold a value. As a result, Spark never invokes the UDF when the input value is null, and the result simply remains null.

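A minimal reproduction of the behavior (the DataFrame, the column name "i", and the UDF are all hypothetical, assuming a recent Spark version):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Option[Int] produces a nullable IntegerType column.
val df = Seq(Some(1), None, Some(3)).toDF("i")

// The parameter is a primitive Scala Int, which cannot hold null,
// so Spark treats the input as non-nullable.
val plusOne = udf((i: Int) => i + 1)

// For the None row the function is never invoked; the result stays null.
df.withColumn("j", plusOne(col("i"))).show()
// +----+----+
// |   i|   j|
// +----+----+
// |   1|   2|
// |null|null|
// |   3|   4|
// +----+----+
```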
There are two ways to solve this:

  1. Change the function to accept java.lang.Integer (a boxed object reference, which can be null).
  2. If you can't change the function, use when/otherwise to handle the null case explicitly, e.g. when(col("int col").isNull, someValue).otherwise(the original call).

Both options are sketched below.
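A sketch of option 1, continuing the hypothetical df above; because java.lang.Integer is a reference type, the null actually reaches the function body (the -1 sentinel is purely illustrative):

```scala
import org.apache.spark.sql.functions.{col, udf}

// java.lang.Integer can be null, so the function sees the null
// and can decide what to do with it.
val plusOneBoxed = udf((i: java.lang.Integer) => if (i == null) -1 else i + 1)

df.withColumn("j", plusOneBoxed(col("i"))).show()
// +----+----+
// |   i|   j|
// +----+----+
// |   1|   2|
// |null|  -1|
// |   3|   4|
// +----+----+
```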
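And a sketch of option 2, keeping the original primitive-Int plusOne untouched and routing nulls around it (again with an illustrative -1 as someValue, and "i" in place of the question's "int col"):

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// The when/otherwise guard short-circuits null rows before they reach
// plusOne, so the primitive-Int UDF only ever sees real values.
val result = df.withColumn(
  "j",
  when(col("i").isNull, lit(-1)).otherwise(plusOne(col("i")))
)
result.show()
```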

A good explanation of this can be found here.