Convert timestamp to date in Spark dataframe

For pyspark:

Assume you have a column named 'DateTime' that holds both a date and a time.

Add a 'DateOnly' column to your df as follows:

    from pyspark.sql.functions import date_format
    df.withColumn("DateOnly", date_format('DateTime', "yyyyMMdd")).show()

This shows a new column in the df called DateOnly, with the date in yyyyMMdd form. Note that date_format returns a string, not a date type.
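As a quick cross-check outside Spark, the same formatting can be sketched with Python's standard library. This is only an illustration, not Spark code: the helper name date_only is made up here, and it assumes Spark's "yyyyMMdd" pattern corresponds to strftime's "%Y%m%d". Like date_format, it produces a string rather than a date.

```python
from datetime import datetime

def date_only(value: datetime) -> str:
    """Format a datetime as yyyyMMdd (hypothetical helper, plain Python)."""
    # %Y%m%d is the strftime counterpart of Spark's "yyyyMMdd" pattern.
    return value.strftime("%Y%m%d")

print(date_only(datetime(2017, 8, 1, 11, 30, 0)))  # 20170801
```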

You should be doing the following (assuming func is pyspark.sql.functions and DateType is imported from pyspark.sql.types):

>>> df_test2.withColumn('date_again', func.from_unixtime('timestamp').cast(DateType())).show()
|    date| timestamp|date_again|

and the schema is

>>> df_test2.withColumn('date_again', func.from_unixtime('timestamp').cast(DateType())).printSchema()
 |-- date: string (nullable = true)
 |-- timestamp: string (nullable = true)
 |-- date_again: date (nullable = true)

To convert a Unix timestamp column (called TIMESTMP) in a pyspark dataframe (df) to a Date type:

Below is a two step process (there may be a shorter way):

  • convert from UNIX timestamp to timestamp
  • convert from timestamp to Date

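Initially, df.printSchema() shows: -- TIMESTMP: long (nullable = true)

Use spark.sql to implement the conversion as follows (this assumes df has been registered as a temporary view named dfTbl, e.g. with df.createOrReplaceTempView("dfTbl")):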


dfNew = spark.sql("""
                     SELECT *, cast(TIMESTMP as Timestamp) as newTIMESTMP
                     FROM dfTbl d
                  """)


The printSchema() will then show:

-- newTIMESTMP: timestamp (nullable = true)

Finally, convert the type from timestamp to Date as follows:

from pyspark.sql.types import DateType
dfNew=dfNew.withColumn('actual_date', dfNew['newTIMESTMP'].cast(DateType()))
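
The two steps above can also be sketched in plain Python to check expected values for a few sample rows. This is an analogy rather than Spark code: the function name is hypothetical, and it assumes the values are seconds since the epoch and that UTC is the intended time zone (Spark uses the session time zone instead).

```python
from datetime import date, datetime, timezone

def unix_to_date(seconds: int) -> date:
    """Two-step conversion: Unix seconds -> timestamp -> date (assumes UTC)."""
    # Step 1: Unix timestamp to a timezone-aware timestamp.
    ts = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Step 2: drop the time-of-day part, keeping only the date.
    return ts.date()

print(unix_to_date(1501587000))  # 2017-08-01
```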


func.when(( | ( == '')) , '0')\

doesn't work because it is type-inconsistent: the first clause returns a string while the second clause returns a bigint. As a result, it will always return NULL if the data is NOT NULL and not empty.

It is also redundant: SQL functions are safe against NULL and malformed input, so there is no need for additional checks.

In [1]: spark.sql("SELECT unix_timestamp(NULL, 'yyyyMMdd')").show()
|unix_timestamp(CAST(NULL AS STRING), yyyyMMdd)|
|                                          null|

In [2]: spark.sql("SELECT unix_timestamp('', 'yyyyMMdd')").show()
|unix_timestamp(, yyyyMMdd)|
|                      null|

And you don't need the intermediate step in Spark 2.2 or later:

from pyspark.sql.functions import to_date

to_date("date", "yyyyMMdd")
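
For comparison, the behaviour described above (NULL for missing or malformed input, a date otherwise) can be mimicked in plain Python, for example to build expected values in a unit test. This is a sketch only: parse_yyyymmdd is a hypothetical helper, it uses strptime's "%Y%m%d" in place of Spark's "yyyyMMdd", and Python's parser may not accept or reject exactly the same strings as Spark.

```python
from datetime import date, datetime
from typing import Optional

def parse_yyyymmdd(text: Optional[str]) -> Optional[date]:
    """Parse a yyyyMMdd string, returning None instead of raising."""
    if not text:
        # None and the empty string both map to None, like NULL in Spark.
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Malformed input also yields None rather than an error.
        return None

print(parse_yyyymmdd("20170801"))  # 2017-08-01
print(parse_yyyymmdd(""))          # None
```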