How to create an empty DataFrame? Why "ValueError: RDD is empty"?

This will work with Spark version 2.0.0 or later:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

sc = spark.sparkContext
schema = StructType([
    StructField('col1', StringType(), False),
    StructField('col2', IntegerType(), True),
])
spark.createDataFrame(sc.emptyRDD(), schema)
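
Under the same assumptions (a Spark 2.x session bound to the name spark), createDataFrame should also accept a plain empty list together with an explicit schema, which avoids the RDD API entirely; a minimal sketch:

# Sketch: an empty list plus an explicit schema also yields an empty DataFrame.
empty_df = spark.createDataFrame([], schema)
empty_df.printSchema()   # col1 and col2 are present; the frame has zero rows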

spark.range(0).drop("id")

This creates a DataFrame with an "id" column and no rows, then drops the "id" column, leaving you with a truly empty DataFrame (no rows and no columns).
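
If you want to convince yourself that the result really is empty, a quick check (assuming an active spark session):

df = spark.range(0).drop("id")
print(df.columns)   # [] -- no columns remain
print(df.count())   # 0  -- and no rows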


At the time this answer was written, it looks like you need some sort of schema:

from pyspark.sql.types import StructType, StructField, StringType

field = [StructField("field1", StringType(), True)]
schema = StructType(field)

sc = spark.sparkContext
spark.createDataFrame(sc.emptyRDD(), schema)
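
A common use for an empty DataFrame with an explicit schema is as the seed of a loop of unions; a minimal sketch built on the schema above (the two inline row batches are made-up placeholders):

# Sketch: start from the empty DataFrame and union same-schema batches onto it.
result = spark.createDataFrame(sc.emptyRDD(), schema)
for rows in [[("a",)], [("b",), ("c",)]]:   # stand-in batches of rows
    result = result.union(spark.createDataFrame(rows, schema))
result.count()   # 3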

Extending Joe Widen's answer, you can actually create the schema with no fields like so:

schema = StructType([])

So when you create the DataFrame using that as your schema, you'll end up with a DataFrame[].

>>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> empty
DataFrame[]
>>> empty.schema
StructType(List())
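
A zero-column DataFrame is still a regular DataFrame, so the usual emptiness checks behave as you'd expect; continuing the same session:

>>> empty.count()
0
>>> empty.rdd.isEmpty()
True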

In Scala, if you use sqlContext.emptyDataFrame and inspect its schema, you get StructType() back:

scala> val empty = sqlContext.emptyDataFrame
empty: org.apache.spark.sql.DataFrame = []

scala> empty.schema
res2: org.apache.spark.sql.types.StructType = StructType()