How to change a column position in a Spark DataFrame?

You can get the column names, reorder them however you want, and then use select on the original DataFrame to get a new one with this new order:

val columns: Array[String] = dataFrame.columns
val reorderedColumnNames: Array[String] = ??? // do the reordering you want
val result: DataFrame = dataFrame.select(reorderedColumnNames.head, reorderedColumnNames.tail: _*)
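
As one concrete example of the reordering step, here is a minimal sketch of a helper that moves a single column to a given position (the helper name and "someColumn" are placeholders for illustration, not part of any API):

import org.apache.spark.sql.DataFrame

// Hypothetical helper: remove colName from the column list,
// then reinsert it at index toIndex before selecting
def moveColumn(df: DataFrame, colName: String, toIndex: Int): DataFrame = {
  val others = df.columns.filterNot(_ == colName)
  val (before, after) = others.splitAt(toIndex)
  val reordered = (before :+ colName) ++ after
  df.select(reordered.head, reordered.tail: _*)
}

val moved = moveColumn(dataFrame, "someColumn", 0) // move "someColumn" to the front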

Like others have commented, I'm curious to know why you would do this, as the order is not relevant when you can query the columns by their names.

Anyway, using a select should give the impression that the columns have moved in the schema description:

import spark.implicits._ // needed for toDF (spark is your SparkSession)

val data = Seq(
  ("a", "hello", 1),
  ("b", "spark", 2)
).toDF("field1", "field2", "field3")

data.show()

data
  .select("field3", "field2", "field1")
  .show()
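
For reference, the two show() calls should print something like:

+------+------+------+
|field1|field2|field3|
+------+------+------+
|     a| hello|     1|
|     b| spark|     2|
+------+------+------+

+------+------+------+
|field3|field2|field1|
+------+------+------+
|     1| hello|     a|
|     2| spark|     b|
+------+------+------+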

A slightly different version compared to @Tzach Zohar's:

// Build Column objects from the column names, then reverse their order
val cols = df.columns.map(df(_)).reverse
val reversedColDF = df.select(cols: _*)
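
Note that cols here is an Array[Column] (built with df(_)) rather than an array of names, which is why it can be passed straight to select as cols: _*. select accepts either a varargs of Column objects or a head string followed by the remaining names.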

The spark-daria library has a reorderColumns method that makes it easy to reorder the columns in a DataFrame.

import com.github.mrpowers.spark.daria.sql.DataFrameExt._

val actualDF = sourceDF.reorderColumns(
  Seq("field1", "field3", "field2")
)

The reorderColumns method uses @Rockie Yang's solution under the hood.

If you want to get the column ordering of df1 to equal the column ordering of df2, something like this should work better than hardcoding all the column names:

df1.reorderColumns(df2.columns)
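
If you'd rather not depend on spark-daria, the same thing can be sketched with a plain select (assuming df1 contains every column that df2 has):

val reordered = df1.select(df2.columns.head, df2.columns.tail: _*)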

The spark-daria library also defines a sortColumns transformation to sort columns in ascending or descending order (if you don't want to specify all the columns in a sequence).

import com.github.mrpowers.spark.daria.sql.transformations._

df.transform(sortColumns("asc"))
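
If you only need the ascending sort and don't want the library dependency, a plain-Spark sketch of the same idea:

import org.apache.spark.sql.functions.col

// Sort the column names alphabetically, then select them in that order
val sortedDF = df.select(df.columns.sorted.map(col): _*)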