How to convert a Dataset of Spark Rows into strings?

You can use the map function to convert every row into a string, e.g.:

df.map(row => row.mkString())

Instead of just mkString you can of course do more sophisticated work.
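For example, a minimal sketch that formats individual columns (the column names "name" and "age" here are hypothetical, not part of the original example):

df.map(row => s"${row.getAs[String]("name")} is ${row.getAs[Int]("age")} years old")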

The collect method can then retrieve the whole thing into an array:

val strings = df.map(row => row.mkString()).collect
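Note that in Spark 2.x, map on a Dataset needs an implicit Encoder[String] in scope; importing your SparkSession's implicits provides it:

import spark.implicits._ // assumes your SparkSession instance is named spark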

(This is the Scala syntax; I think it's quite similar in Java.)


Here is the sample code in Java.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("SparkSample")
            .master("local[*]")
            .getOrCreate();

        // create a single-column DataFrame from a list of strings
        List<String> myList = Arrays.asList("one", "two", "three", "four", "five");
        Dataset<Row> df = spark.createDataset(myList, Encoders.STRING()).toDF();
        df.show();

        // option 1: turn the DataFrame back into a Dataset<String> with as()
        List<String> listOne = df.as(Encoders.STRING()).collectAsList();
        System.out.println(listOne);

        // option 2: map each Row to a String; the cast to MapFunction resolves
        // the overload ambiguity of Dataset.map when called with a Java lambda
        List<String> listTwo = df
            .map((MapFunction<Row, String>) row -> row.mkString(), Encoders.STRING())
            .collectAsList();
        System.out.println(listTwo);

        spark.stop();
    }
}

"row" is java 8 lambda parameter. Please check developer.com/java/start-using-java-lambda-expressions.html