Scala Spark connect to remote cluster

There are two things missing:

  • The cluster manager should be set to yarn (setMaster("yarn")) and the deploy-mode to cluster, your current setup is used for Spark standalone. More info here: http://spark.apache.org/docs/latest/configuration.html#application-properties
  • Also, you need to get yarn-site.xml and core-site.xml files from the cluster and put them in HADOOP_CONF_DIR, so that Spark can pick up yarn settings, such as the IP of your master node. More info: https://theckang.github.io/2015/12/31/remote-spark-jobs-on-yarn.html

By the way, this would work if you use spark-submit to submit a job, programatically it's more complex to achieve it and could only use yarn-client mode which is tricky to setup remotely.