Creating a new Spark DataFrame with a new column value based on a column in the first DataFrame (Java)

I believe you can use when to achieve that. Additionally, you can probably replace the old column directly, since withColumn overwrites a column when one with that name already exists. For your example, the code would be something like:

import static org.apache.spark.sql.functions.*;

// Map column C: "A" -> "X", "B" -> "Y", anything else -> "Z"
Column newCol = when(col("C").equalTo("A"), "X")
    .when(col("C").equalTo("B"), "Y")
    .otherwise("Z");

DataFrame df2 = df1.withColumn("C", newCol);

For more details about when, check the Column Javadoc.
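
For illustration, here is a minimal end-to-end sketch of the same idea. It assumes Spark 1.6 (where SQLContext.createDataFrame accepts a java.util.List of Rows); the sqlContext and the sample data are hypothetical:

import java.util.Arrays;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.*;

// Hypothetical one-column DataFrame holding the values "A", "B", "Q"
StructType schema = new StructType(new StructField[] {
    new StructField("C", DataTypes.StringType, false, Metadata.empty())
});
DataFrame df1 = sqlContext.createDataFrame(
    Arrays.asList(RowFactory.create("A"), RowFactory.create("B"), RowFactory.create("Q")),
    schema);

// "A" -> "X", "B" -> "Y", "Q" falls through to "Z"
df1.withColumn("C", when(col("C").equalTo("A"), "X")
        .when(col("C").equalTo("B"), "Y")
        .otherwise("Z"))
   .show();

One thing worth knowing: if you drop the otherwise(...) call, rows that match none of the when conditions come out as null rather than "Z".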


Thanks to Daniel, I have resolved this :)

The missing piece was the static import of the SQL functions:

import static org.apache.spark.sql.functions.*;

I must have tried a million different ways of using when, but kept getting compile failures/runtime errors because I hadn't done the import. Once imported, Daniel's answer was spot on!
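
For reference, the code also compiles without the static wildcard import if you qualify the calls through the functions class instead; the static import just lets you drop the functions. prefix:

import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;

// Same expression as Daniel's answer, with explicit qualification
Column newCol = functions.when(functions.col("C").equalTo("A"), "X")
    .when(functions.col("C").equalTo("B"), "Y")
    .otherwise("Z");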


You may also use a UDF to do the same job. Just write a simple if-then-else structure inside it:

import org.apache.spark.sql.functions.udf

// if-then-else construct, e.g. "A" -> "X", "B" -> "Y", anything else -> "Z"
val customFunct = udf { (d: String) =>
  if (d == "A") "X"
  else if (d == "B") "Y"
  else "Z"
}

val new_DF = df.withColumn("C", customFunct(df("C")))
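
Since the question is about Java, here is a rough Java equivalent of the UDF approach, a sketch assuming the Spark 1.x API (1.5+ for callUDF); the name customFunct just mirrors the Scala example above:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.callUDF;

// Register the UDF with the SQLContext under a name, declaring its return type
sqlContext.udf().register("customFunct", (UDF1<String, String>) d -> {
    if ("A".equals(d)) return "X";
    else if ("B".equals(d)) return "Y";
    else return "Z";
}, DataTypes.StringType);

// Invoke the registered UDF by name on the source column
DataFrame new_DF = df.withColumn("C", callUDF("customFunct", df.col("C")));

That said, for a simple mapping like this the built-in when/otherwise from the accepted answer is usually the better choice: Catalyst can optimize built-in expressions, while a UDF is a black box to the optimizer.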