In Apache Spark, how do you set worker/executor environment variables?

Just stumbled upon something in the Spark documentation:

spark.executorEnv.[EnvironmentVariableName]

Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.

So in your case, I'd set the Spark configuration option spark.executorEnv.com.amazonaws.sdk.disableCertChecking to true and see if that helps.
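For example, when building the configuration programmatically in PySpark, it would look roughly like this (a sketch only; the property name is the one from your question, and whether the AWS SDK actually honours it when set as an environment variable is an assumption):

import pyspark

# Sketch: any spark.executorEnv.<NAME> entry becomes an environment
# variable <NAME> in every executor process.
conf = pyspark.SparkConf()
conf.set('spark.executorEnv.com.amazonaws.sdk.disableCertChecking', 'true')
sc = pyspark.SparkContext.getOrCreate(conf=conf)

The same setting can also be passed on the command line with --conf spark.executorEnv.com.amazonaws.sdk.disableCertChecking=true when using spark-submit.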


Adding more to the existing answer.

import pyspark


def get_spark_context(app_name):
    # Configure the application
    conf = pyspark.SparkConf()
    conf.set('spark.app.name', app_name)

    # Set an environment variable for the executors. This must happen
    # before the SparkContext is created, otherwise it has no effect.
    conf.set('spark.executorEnv.SOME_ENVIRONMENT_VALUE', 'I_AM_PRESENT')

    # Initialise (or reuse) the SparkContext with this configuration
    sc = pyspark.SparkContext.getOrCreate(conf=conf)

    return pyspark.SQLContext(sparkContext=sc)
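Call it once when the application starts, for example (the app name here is just illustrative):

sql_context = get_spark_context('my_spark_app')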

The SOME_ENVIRONMENT_VALUE environment variable will then be available in the executor/worker processes.

In your Spark application, you can access it from code that runs on the executors (for example inside a map function or a UDF) like this:

import os
some_environment_value = os.environ.get('SOME_ENVIRONMENT_VALUE')
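Because the variable is set on the executor processes rather than the driver, the lookup has to run inside a task to see it. A minimal sketch to verify this, reusing the SOME_ENVIRONMENT_VALUE name from above (the two-element RDD is just for illustration):

import os
import pyspark

conf = pyspark.SparkConf()
conf.set('spark.executorEnv.SOME_ENVIRONMENT_VALUE', 'I_AM_PRESENT')
sc = pyspark.SparkContext.getOrCreate(conf=conf)

# The lambda executes on the executors, so it reads the executor environment
values = sc.parallelize(range(2)).map(
    lambda _: os.environ.get('SOME_ENVIRONMENT_VALUE')
).collect()
print(values)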