Clone/Deep-Copy a Spark DataFrame

DataFrames are immutable. That means you don't need to deep-copy them: you can reuse a DataFrame as many times as you like, and every operation creates a new DataFrame while the original stays unmodified.

For example:

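// Assumes a SparkSession named `spark` is in scope, as in spark-shell
import spark.implicits._ // enables toDF and the $"..." column syntax

val df = List(1, 2, 3).toDF("id")

val df1 = df.as("df1") // second DataFrame: df under the alias "df1"
val df2 = df.as("df2") // third DataFrame: df under the alias "df2"

// fourth DataFrame; df, df1 and df2 are all still unmodified
df1.join(df2, $"df1.id" === $"df2.id")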

This may look like a waste of resources, but since the data behind a DataFrame is immutable as well, all four DataFrames can share references to the same underlying objects instead of copying them.
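
If you want to convince yourself that a transformation leaves the original untouched, a minimal sketch along these lines may help. It assumes a local Spark installation; the object name `ImmutableDataFrameDemo`, the helper `columnsBeforeAndAfter`, the `doubled` column and the `local[*]` master are all made up for illustration and are not part of the example above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ImmutableDataFrameDemo {
  // Hypothetical helper: returns the column names of the original DataFrame
  // and of a DataFrame derived from it, both read after the transformation.
  def columnsBeforeAndAfter(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._

    val df = List(1, 2, 3).toDF("id")

    // withColumn returns a new DataFrame; df itself is not changed
    val doubled = df.withColumn("doubled", col("id") * 2)

    (df.columns.toSeq, doubled.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("immutable-df-demo")
      .getOrCreate()

    val (original, derived) = columnsBeforeAndAfter(spark)
    println(original.mkString(", ")) // prints "id"
    println(derived.mkString(", "))  // prints "id, doubled"

    spark.stop()
  }
}
```

The original `df` still has only the `id` column after `withColumn` is called, which is why holding on to a plain reference to a DataFrame is normally enough and no explicit copy is needed.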