SparkSQL - Read parquet file directly

After creating a DataFrame from a Parquet file, you have to register it as a temp table before you can run SQL queries on it.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val df = sqlContext.read.parquet("src/main/resources/peopleTwo.parquet")


// register the DataFrame as a temp table; after that you can run SQL queries on it
df.registerTempTable("people")

sqlContext.sql("select * from people").collect.foreach(println)
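
On Spark 2.x the same approach is usually written with SparkSession, and createOrReplaceTempView replaces registerTempTable, which has been deprecated since 2.0. The following is only a minimal sketch: the object name TempViewQuery, the helper queryPeople, the "local[*]" master and the app name are assumptions for illustration, and the path is the one from the snippet above.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object TempViewQuery {
  // Hypothetical helper: loads a Parquet file, registers it as the
  // temporary view "people" and returns all of its rows via SQL.
  def queryPeople(spark: SparkSession, path: String): Array[Row] = {
    val df = spark.read.parquet(path)
    df.createOrReplaceTempView("people")
    spark.sql("select * from people").collect()
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" is an assumption for a quick local run; set your own master
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("temp-view-query")
      .getOrCreate()

    queryPeople(spark, "src/main/resources/peopleTwo.parquet").foreach(println)
    spark.stop()
  }
}
```

The view only lives as long as the SparkSession that created it, so it has to be registered again in each new session.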

With plain SQL

JSON, ORC, Parquet, and CSV files can be queried directly with SQL, without first loading them into a DataFrame and registering a table.

// This is Spark 2.x code; you can do the same with sqlContext as well
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder.master("set_the_master").getOrCreate

spark.sql("select col_A, col_B from parquet.`hdfs://my_hdfs_path/my_db.db/my_table`")
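
To try the format.`path` syntax without an HDFS cluster, something like the following sketch might help. It writes a tiny sample data set to a temporary directory and queries the files directly, once as Parquet and once as JSON. The object name DirectFileQuery, the helper namesFrom, the sample rows and the "local[*]" master are all assumptions made for illustration, not part of the original setup.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object DirectFileQuery {
  // Hypothetical helper: runs SQL directly against files using the
  // <format>.`<path>` syntax and returns the "name" column, sorted.
  def namesFrom(spark: SparkSession, format: String, path: String): Array[String] =
    spark.sql(s"select name from $format.`$path` order by name")
      .collect()
      .map(_.getString(0))

  def main(args: Array[String]): Unit = {
    // "local[*]" is an assumption for a quick local run; set your own master
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("direct-file-query")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, written out so the example is self-contained
    val dir = Files.createTempDirectory("people").toString
    val people = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
    people.write.parquet(s"$dir/parquet")
    people.write.json(s"$dir/json")

    // No DataFrame or temp table is created for the queries themselves
    namesFrom(spark, "parquet", s"$dir/parquet").foreach(println)
    namesFrom(spark, "json", s"$dir/json").foreach(println)

    spark.stop()
  }
}
```

The path between the backticks can point to a single file or to a directory of files in that format, such as the table directory in the HDFS example above.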