Increase memory available to PySpark at runtime

As far as I know, it isn't possible to change spark.executor.memory at run time. The containers on the data nodes are created before the SparkContext even initializes.


As of 2.0.0 you don't have to use SparkContext, but SparkSession with its conf method, as below:

spark.conf.set("spark.executor.memory", "2g")
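Since spark.executor.memory generally has to be in place before the session starts, here is a minimal sketch of creating a fresh SparkSession with that setting through the builder (the master and app name are illustrative):

from pyspark.sql import SparkSession

# Build a new session with the desired executor memory.
# Note: getOrCreate() returns any already-running session, in which case
# the value set here may not take effect.
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("App Name") \
    .config("spark.executor.memory", "2g") \
    .getOrCreate()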

You could set spark.executor.memory when you start your PySpark shell:

pyspark --num-executors 5 --driver-memory 2g --executor-memory 2g
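If you start PySpark from a plain Python or Jupyter process rather than the pyspark script, the same flags can, as far as I know, be passed through the PYSPARK_SUBMIT_ARGS environment variable before the first context is created. A rough sketch (the app name is illustrative, and the trailing pyspark-shell token is required by PySpark's launcher):

import os

# Hypothetical setup: pass the launch flags through the environment
# before PySpark starts its JVM.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master local[*] --num-executors 5 "
    "--driver-memory 2g --executor-memory 2g pyspark-shell"
)

from pyspark import SparkContext
sc = SparkContext(appName="App Name")  # picks up the flags above when the JVM is launched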

I'm not sure why you chose the answer above when it requires restarting your shell and opening it with a different command! Though that works and is useful, there is an in-line solution, which is what was actually being requested. This is essentially what @zero323 referenced in the comments above, but the link leads to a post describing the implementation in Scala. Below is a working implementation specifically for PySpark.

Note: the SparkContext whose settings you want to modify must not have been started yet; otherwise you will need to close it, modify the setting, and re-open a new context (a sketch of that cycle follows the source link below).

from pyspark import SparkContext
SparkContext.setSystemProperty('spark.executor.memory', '2g')  # must run before the context is created
sc = SparkContext("local", "App Name")

source: https://spark.apache.org/docs/0.8.1/python-programming-guide.html
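If a context is already running (per the note above), a minimal sketch of the close, modify, re-open cycle, assuming the existing context is bound to sc:

from pyspark import SparkContext

sc.stop()  # shut down the running context first
SparkContext.setSystemProperty('spark.executor.memory', '2g')
sc = SparkContext("local", "App Name")  # the new context picks up the updated setting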

P.S. if you need to close the SparkContext, just use:

SparkContext.stop(sc)

and to double-check the settings currently in effect, you can use:

sc._conf.getAll()
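or, to check a single key (illustrative, assuming the context created above):

sc._conf.get('spark.executor.memory')  # should return '2g' with the settings used above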