How to convert a Row to JSON in Spark 2 Scala

I need to read JSON input and produce JSON output. Most fields are handled individually, but a few JSON sub-objects just need to be preserved.

When Spark reads JSON into a DataFrame, it turns each record into a Row. A Row is a JSON-like structure that can be transformed and written back out as JSON.

But I need to extract some JSON sub-structures into a string to use as a new field.

This can be done like this:

import org.apache.spark.sql.functions.to_json

val dataFrameWithJsonField = dataFrame.withColumn("address_json", to_json($"location.address"))

Here location.address is the path to the sub-object within the incoming JSON-based DataFrame, and address_json is the name of the new column holding that object converted to a JSON string.

to_json is available as of Spark 2.1.
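
For context, here is a minimal end-to-end sketch of this approach; the input path "input.json" and the location.address field are placeholders for your own data:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_json

val spark = SparkSession.builder().appName("preserve-sub-json").getOrCreate()
import spark.implicits._

// Read the incoming JSON; Spark infers a schema with nested structs
val dataFrame = spark.read.json("input.json")

// Serialize the nested location.address struct into a JSON string column
val dataFrameWithJsonField =
  dataFrame.withColumn("address_json", to_json($"location.address"))

dataFrameWithJsonField.select("address_json").show(truncate = false)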

If you generate the output JSON using json4s, address_json should first be parsed back into an AST representation; otherwise the output JSON will contain the address_json part as an escaped string.
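
For example, a sketch assuming json4s-jackson; the field names and values here are illustrative, not from the original data:

import org.json4s._
import org.json4s.jackson.JsonMethods._

// addressJson stands in for the string taken from the address_json column
val addressJson: String = """{"street": "Main St", "city": "Springfield"}"""

// Parsing back to an AST lets it be embedded as a real JSON object
// rather than an escaped string
val addressAst: JValue = parse(addressJson)

// Embed the AST in a larger output document
val output: JValue = JObject("id" -> JInt(1), "address" -> addressAst)
compact(render(output))
// {"id":1,"address":{"street":"Main St","city":"Springfield"}}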


You can use getValuesMap to convert the Row object to a Map and then convert it to JSON:

import scala.util.parsing.json.JSONObject
import org.apache.spark.sql._

// In spark-shell the implicits are already in scope; in an application,
// add: import spark.implicits._ (needed for toDF)
val df = Seq((1, 2, 3), (2, 3, 4)).toDF("A", "B", "C")
val row = df.first()          // an example Row object

def convertRowToJSON(row: Row): String = {
  val m = row.getValuesMap(row.schema.fieldNames)
  JSONObject(m).toString()
}

convertRowToJSON(row)
// res46: String = {"A" : 1, "B" : 2, "C" : 3}
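
Note that scala.util.parsing.json has been deprecated since Scala 2.11, and getValuesMap only covers top-level fields: nested structs come back as Row objects, which JSONObject will not serialize correctly. If you just need each whole row as JSON, Spark's built-in toJSON handles nested structures as well:

// Dataset.toJSON serializes each row (including nested structs) to a JSON string
val jsonStrings = df.toJSON.collect()
// Array({"A":1,"B":2,"C":3}, {"A":2,"B":3,"C":4})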