Cannot load main class from JAR file in Spark Submit

The --py-files flag is for additional Python file dependencies that your program uses. As SparkSubmit.scala shows, spark-submit looks at the so-called "primary argument", meaning the first non-flag argument, to decide whether to run in "submit jarfile" mode or "submit python main" mode.

That's why you see it trying to load your "$entry_function" as a jarfile that doesn't exist: it assumes you're running Python only if the primary argument ends with ".py", and otherwise defaults to assuming you have a .jar file.
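To make that decision concrete, here is a rough Python paraphrase of the check described above. It is only a sketch: the real logic lives in Scala inside SparkSubmit and handles other cases as well, and the function name `submit_mode` is made up for illustration.

```python
def submit_mode(primary_argument):
    """Classify the primary argument the way the answer above describes.

    Simplified assumption: only the ".py" suffix check is modelled here.
    """
    if primary_argument.endswith(".py"):
        return "python"
    # Anything else falls through to the "this must be a JAR" assumption.
    return "jar"


# The entry function name does not end in ".py", so it is treated as a JAR.
print(submit_mode("my_function_in_python"))
# The script path does end in ".py", so it is treated as a Python main file.
print(submit_mode("/home/full/path/to/file/python/my_python_file.py"))
```

With the entry function name as the first non-flag argument, the check lands in the JAR branch, which matches the error in the title.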

Instead of using --py-files, just make /home/full/path/to/file/python/my_python_file.py the primary argument. Then you can either have the Python script take the "entry function" name as a program argument and look it up, or simply call your entry function from the main function inside the Python file itself.
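One way the first option could look, as a minimal sketch: my_python_file.py keeps an explicit table of allowed entry functions and picks one by the name passed on the command line. The body of `my_function_in_python` and the `ENTRY_FUNCTIONS` table are placeholders for illustration, not anything from the original script.

```python
import sys


def my_function_in_python(conf_location):
    """Placeholder entry function; the real job logic would go here."""
    return "my_function_in_python ran with " + conf_location


# Explicit table of callable entry points, so an arbitrary string from the
# command line cannot call anything that is not listed here.
ENTRY_FUNCTIONS = {
    "my_function_in_python": my_function_in_python,
}


def main(argv):
    """Dispatch argv[1] as the entry function name, passing on the remaining args."""
    if len(argv) < 2:
        print("usage: my_python_file.py <entry_function> [args...]", file=sys.stderr)
        return None
    entry = ENTRY_FUNCTIONS.get(argv[1])
    if entry is None:
        raise ValueError("unknown entry function: " + argv[1])
    return entry(*argv[2:])


if __name__ == "__main__":
    main(sys.argv)
```

The script path then goes where the primary argument belongs, followed by the program arguments, along the lines of `spark-submit ... /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation`.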

Alternatively, you can still use --py-files: create a new main .py file that calls your entry function, and pass that main .py file as the primary argument instead.
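A sketch of what such a main file might contain, assuming my_python_file.py is shipped through --py-files and is therefore importable by its module name. The file name main.py and the helper `call_entry` are hypothetical.

```python
import importlib
import sys

# Assumption: my_python_file.py was passed via --py-files, which places it
# on the PYTHONPATH, so it can be imported by module name.
DEPENDENCY_MODULE = "my_python_file"


def call_entry(module_name, function_name, *args):
    """Import module_name and call the function named function_name with args."""
    module = importlib.import_module(module_name)
    entry = getattr(module, function_name)
    if not callable(entry):
        raise TypeError(function_name + " in " + module_name + " is not callable")
    return entry(*args)


def main(argv):
    """Treat argv[1] as the entry function name and forward the remaining args."""
    if len(argv) < 2:
        print("usage: main.py <entry_function> [args...]", file=sys.stderr)
        return None
    return call_entry(DEPENDENCY_MODULE, argv[1], *argv[2:])


if __name__ == "__main__":
    main(sys.argv)
```

If the entry function never changes, a plain `from my_python_file import my_function_in_python` followed by a direct call is simpler; the lookup by name is only useful when the function is chosen at submit time.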


When adding elements to --py-files, separate them with commas and leave no spaces in between. Try this:

confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
        --master yarn-client \
        --num-executors $executors \
        --executor-memory $memory \
        --py-files /home/full/path/to/file/python/my_python_file.py,$entry_function,$confLocation
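If the command is assembled from a script, the comma rule can be enforced in one place. The following is a small sketch, assuming you build the argument list in Python; `py_files_value` and `build_command` are hypothetical helper names, and the file names in the usage lines are placeholders. It also keeps the main .py file as the first non-flag argument, as the other answer explains.

```python
def py_files_value(paths):
    """Join dependency paths with commas and no surrounding spaces."""
    cleaned = [p.strip() for p in paths if p.strip()]
    for p in cleaned:
        if "," in p:
            raise ValueError("path contains a comma: " + p)
    return ",".join(cleaned)


def build_command(main_py, py_files, app_args):
    """Build a spark-submit argument list with main_py as the primary argument."""
    command = ["spark-submit"]
    if py_files:
        command += ["--py-files", py_files_value(py_files)]
    command.append(main_py)        # first non-flag argument
    command += list(app_args)      # passed through to the Python program
    return command


print(py_files_value([" a.py", "b.zip "]))
print(build_command("main.py", ["a.py", "b.zip"], ["my_function_in_python"]))
```

The resulting list can be handed to `subprocess.run` as is, which avoids shell quoting issues with the individual paths.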