How to CREATE TABLE USING delta with Spark 2.4.4?

tl;dr CREATE TABLE USING delta is not supported by Spark before 3.0.0 and Delta Lake before 0.7.0.


Delta Lake 0.7.0 with Spark 3.0.0 (both just released) supports the CREATE TABLE SQL command.

Be sure to "install" Delta SQL by setting the spark.sql.catalog.spark_catalog configuration property to org.apache.spark.sql.delta.catalog.DeltaCatalog and spark.sql.extensions to io.delta.sql.DeltaSparkSessionExtension.

$ ./bin/spark-shell \
  --packages io.delta:delta-core_2.12:0.7.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

scala> spark.version
res0: String = 3.0.0

scala> sql("CREATE TABLE delta_101 (id LONG) USING delta").show
++
||
++
++

scala> spark.table("delta_101").show
+---+
| id|
+---+
+---+

scala> sql("DESCRIBE EXTENDED delta_101").show(truncate = false)
+----------------------------+---------------------------------------------------------+-------+
|col_name                    |data_type                                                |comment|
+----------------------------+---------------------------------------------------------+-------+
|id                          |bigint                                                   |       |
|                            |                                                         |       |
|# Partitioning              |                                                         |       |
|Not partitioned             |                                                         |       |
|                            |                                                         |       |
|# Detailed Table Information|                                                         |       |
|Name                        |default.delta_101                                        |       |
|Location                    |file:/Users/jacek/dev/oss/spark/spark-warehouse/delta_101|       |
|Provider                    |delta                                                    |       |
|Table Properties            |[]                                                       |       |
+----------------------------+---------------------------------------------------------+-------+
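
For PySpark users, the same settings can be passed through the session builder instead of the command line. This is only a sketch: the names DELTA_CONF and build_delta_session and the default application name are made up for illustration, and spark.jars.packages is used here as the configuration-property counterpart of --packages.

```python
# Settings copied from the command line above.
DELTA_CONF = {
    # Configuration-property counterpart of --packages.
    "spark.jars.packages": "io.delta:delta-core_2.12:0.7.0",
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name="delta-101"):
    """Return a SparkSession with Delta SQL enabled (hypothetical helper).

    Assumes pyspark 3.0.0 is installed and no SparkSession is running yet.
    """
    # Imported lazily so the settings can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in DELTA_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With that in place, spark = build_delta_session() followed by spark.sql("CREATE TABLE delta_101 (id LONG) USING delta") should behave like the Scala session above.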

The OSS version of Delta Lake does not support the SQL CREATE TABLE syntax yet. It will be implemented in a future version built on Spark 3.0.

To create a Delta table, you must write out a DataFrame in Delta format. For example, in Python:

df.write.format("delta").save("/some/data/path")

Here's a link to the create table documentation for Python, Scala, and Java.
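
If you still want to query such a table with SQL on Spark 2.4.x, one workaround is to read it back by path and register a temporary view over it. A minimal sketch, assuming an existing SparkSession and DataFrame; the helper name save_and_register and the view name in the usage note are hypothetical:

```python
def save_and_register(spark, df, path, view_name):
    """Write df as a Delta table at path, then expose it to SQL as a temp view."""
    # Same write call as in the example above.
    df.write.format("delta").save(path)
    # Read the table back by path and register a session-scoped view over it.
    delta_df = spark.read.format("delta").load(path)
    delta_df.createOrReplaceTempView(view_name)
    return delta_df
```

For example, save_and_register(spark, df, "/some/data/path", "my_delta_view") followed by spark.sql("SELECT * FROM my_delta_view"). The view only lives as long as the session, so it is not a replacement for a catalog table.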


An example with PySpark 3.0.0 and Delta Lake 0.7.0:

print(f"LOCATION '{location}'")
spark.sql(f"""
CREATE OR REPLACE TABLE {TABLE_NAME} (
  CD_DEVICE INT,
  FC_LOCAL_TIME TIMESTAMP,
  CD_TYPE_DEVICE STRING,
  CONSUMTION DOUBLE,
  YEAR INT,
  MONTH INT,
  DAY INT)
USING DELTA
PARTITIONED BY (YEAR, MONTH, DAY, FC_LOCAL_TIME)
LOCATION '{location}'
""")

Here "location" is the HDFS directory where the Delta table is saved when Spark runs in cluster mode.
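
Since TABLE_NAME and location are not defined in the snippet, here is a rough sketch of how the statement could be assembled from plain Python values. The helper create_delta_table_sql, the table name, the shortened column list and the HDFS path are all made up for illustration:

```python
def create_delta_table_sql(table_name, columns, partition_by, location):
    """Build a CREATE OR REPLACE TABLE ... USING DELTA statement as a string."""
    column_defs = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE OR REPLACE TABLE {table_name} (\n"
        f"{column_defs}\n"
        f")\n"
        f"USING DELTA\n"
        f"PARTITIONED BY ({', '.join(partition_by)})\n"
        f"LOCATION '{location}'"
    )


# Hypothetical values; replace them with your own table name and directory.
statement = create_delta_table_sql(
    table_name="device_consumption",
    columns=[("CD_DEVICE", "INT"), ("YEAR", "INT")],
    partition_by=["YEAR"],
    location="hdfs:///tmp/delta/device_consumption",
)
print(statement)
# spark.sql(statement)  # run against a session configured for Delta
```

Building the string first makes it easy to print and check the statement before handing it to spark.sql.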