Why does Spark report "java.net.URISyntaxException: Relative path in absolute URI" when working with DataFrames?

This is SPARK-15565, an issue in Spark 2.0 on Windows. The workaround is simple, and a fix appears to be in Spark's codebase already, so it may soon be released as part of 2.0.2 or 2.1.0.

The solution in Spark 2.0.0 is to set spark.sql.warehouse.dir to a properly formed directory URI, for example file:///c:/Spark/spark-2.0.0-bin-hadoop2.7/spark-warehouse. Note the /// (triple slash) after file:.
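
To see why the form of the path matters, here is a minimal sketch that uses only java.net.URI. It assumes, as the exception message suggests, that the error is raised when a path without a leading slash is combined with the file scheme; it does not reproduce Spark's actual code path. The names WarehouseUriCheck and fileUri are hypothetical and not part of Spark.

```scala
import java.net.URI
import scala.util.Try

object WarehouseUriCheck {
  // Builds a file: URI from a raw path with java.net.URI's five-argument
  // constructor (scheme, authority, path, query, fragment).
  // When a scheme is given, java.net.URI requires a non-empty path to start
  // with '/', otherwise it throws URISyntaxException.
  def fileUri(path: String): Try[URI] =
    Try(new URI("file", null, path, null, null))
}
```

With this helper, WarehouseUriCheck.fileUri("c:/tmp/spark-warehouse") is a Failure wrapping a URISyntaxException, while WarehouseUriCheck.fileUri("/c:/tmp/spark-warehouse") succeeds. The triple-slash form file:///c:/tmp/spark-warehouse parses to that same leading-slash path.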

Start spark-shell with the --conf argument as follows:

spark-shell --conf spark.sql.warehouse.dir=file:///c:/tmp/spark-warehouse

Or create a SparkSession in your Spark application using the new fluent builder pattern as follows:

import org.apache.spark.sql.SparkSession
// Point the warehouse at a well-formed file URI when building the session
val spark: SparkSession = SparkSession
  .builder()
  .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse")
  .getOrCreate()

Or create conf/spark-defaults.conf with the following content:

spark.sql.warehouse.dir file:///c:/tmp/spark-warehouse

If you want to fix it in code but not touch the existing code, you can also pass the setting as a system property, so that the Spark initialization code that runs afterwards does not have to change.

// Run this before Spark is initialized; Windows backslashes are turned into forward slashes
System.setProperty(
  "spark.sql.warehouse.dir",
  s"file:///${System.getProperty("user.dir")}/spark-warehouse".replaceAll("\\\\", "/")
)

Note that this uses the current working directory, which can be replaced with "c:/tmp/" or any other location where you'd like the spark-warehouse directory to be.
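
As an alternative to replacing backslashes by hand, the URI can be derived with java.nio.file, which handles path separators on each platform. This is only a sketch of that alternative, not the approach used in the snippet above; the names WarehouseDir, uriFor and install are hypothetical.

```scala
import java.nio.file.Paths

object WarehouseDir {
  // Resolves <baseDir>/spark-warehouse to an absolute path and renders it
  // as a file: URI string using the platform's own path handling.
  def uriFor(baseDir: String): String =
    Paths.get(baseDir, "spark-warehouse").toAbsolutePath.toUri.toString

  // Sets the system property and returns the value that was set.
  // Call this before any Spark initialization code runs.
  def install(baseDir: String = System.getProperty("user.dir")): String = {
    val uri = uriFor(baseDir)
    System.setProperty("spark.sql.warehouse.dir", uri)
    uri
  }
}
```

Calling WarehouseDir.install() uses the current working directory, and WarehouseDir.install("c:/tmp") uses a fixed location. Unlike plain string concatenation, toUri also percent-encodes characters such as spaces in the path.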
