Showing tables from specific database with Pyspark and Hive

sqlContext.sql("show tables in 3_db").show()

There are two possible ways to achieve this, but they differ a lot in terms of efficiency.


Using SQL

This is the more efficient of the two:

from pyspark.sql import SparkSession

# Reuse (or create) the session and run the Hive SHOW TABLES command
spark_session = SparkSession.builder.getOrCreate()
spark_session.sql("show tables in db_name").show()
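The result of the SQL statement is an ordinary DataFrame, so it can be filtered or collected like any other. A minimal sketch, assuming a db_name database exists and that SHOW TABLES exposes a tableName column (as it does in Spark 2.x and 3.x):

from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()

# SHOW TABLES yields a DataFrame; collect just the table names into a Python list
tables_df = spark_session.sql("show tables in db_name")
table_names = [row.tableName for row in tables_df.collect()]
print(table_names)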

Using catalog.listTables()

The following is less efficient than the previous approach, because it also loads each table's metadata:

from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()
# Returns a list of Table objects for the given database
spark_session.catalog.listTables("db_name")
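That metadata can be useful when more than the table names is needed. A minimal sketch, assuming a db_name database exists; the fields used below (name, tableType, isTemporary) belong to pyspark.sql.catalog.Table:

from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()

# Each element of the returned list carries the table's metadata
for table in spark_session.catalog.listTables("db_name"):
    print(table.name, table.tableType, table.isTemporary)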

The same Catalog call applied to the database from the question:

spark = SparkSession.builder.getOrCreate()
spark.catalog.listTables("3_db")

Just be aware that in PySpark this method returns a plain Python list, while in Scala it returns a Dataset of Table rows.
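Because the PySpark result is just a list, it can be inspected with plain Python, or converted back into a DataFrame if the Scala-style tabular view is preferred. A minimal sketch, assuming the 3_db database from the question exists and contains at least one table:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
tables = spark.catalog.listTables("3_db")

# Plain Python is enough to pull out the names
print([t.name for t in tables])

# Or rebuild a DataFrame from selected fields (needs at least one table for schema inference)
spark.createDataFrame(
    [(t.name, t.tableType, t.isTemporary) for t in tables],
    ["name", "tableType", "isTemporary"],
).show()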