Connect sparklyr to remote Spark connection

As of sparklyr version 0.4, connecting from RStudio Desktop to a remote Spark cluster is not supported. Instead, as you mention, the recommended approach is to install RStudio Server within the Spark cluster.

That said, the livy branch of sparklyr is exploring an integration with Livy that would let RStudio Desktop connect to a remote Spark cluster through Livy.
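
Livy support did later ship in sparklyr as a connection method. Here is a minimal sketch, assuming a sparklyr release that includes it and a Livy server reachable from your desktop. The helper name `connect_via_livy` and the URL `http://livy-host:8998` are placeholders of mine, not anything from the question (8998 is Livy's default port), and whether you need credentials depends on how your Livy server is set up.

```r
# Hypothetical helper: open a sparklyr connection through a Livy server.
# Calls are namespaced with sparklyr:: so the function can be defined
# before the package is attached.
connect_via_livy <- function(livy_url, spark_version,
                             username = NULL, password = NULL) {
  # livy_config() carries optional basic-auth credentials for the Livy endpoint
  config <- sparklyr::livy_config(username = username, password = password)

  sparklyr::spark_connect(
    master  = livy_url,       # e.g. "http://livy-host:8998" (placeholder)
    method  = "livy",         # use Livy's REST API instead of a local spark-submit
    version = spark_version,  # Spark version running on the cluster
    config  = config
  )
}

# Example call (needs a running Livy server, so it is left commented out):
# sc <- connect_via_livy("http://livy-host:8998", spark_version = "2.3.1")
```

With this route no local Spark installation is needed on the desktop, since the driver runs on the cluster side behind Livy.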


With a more recent version of sparklyr (0.9.2, for example), it is possible to connect to a remote Spark cluster.

Here is an example that connects to a Spark standalone cluster running version 2.3.1. See Master URLs in the Spark documentation for other master URL schemes.

#install.packages("sparklyr")
library(sparklyr)

# The same Spark version as the cluster must be installed locally,
# on the machine where RStudio (the driver) is running
spark_v <- "2.3.1"
cat("Installing Spark in the directory:", spark_install_dir(), "\n")
spark_install(version = spark_v)

# Point spark_home at the local install and master at the standalone cluster
sc <- spark_connect(spark_home = spark_install_find(version = spark_v)$sparkVersionDir,
                    master = "spark://ip-[MY_PRIVATE_IP]:7077")

sc$master
# "spark://ip-[MY_PRIVATE_IP]:7077"
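
Printing `sc$master` only echoes back the URL you passed in, so it can help to run a small job to confirm that the executors actually respond. This is a rough sketch, assuming `sc` is the connection created above. `check_connection` is a name I made up for illustration, while `sdf_len()`, `sdf_nrow()` and `spark_disconnect()` are sparklyr functions.

```r
# Hypothetical helper: run a tiny Spark job and check the row count.
check_connection <- function(sc, n = 10L) {
  ids <- sparklyr::sdf_len(sc, n)   # Spark DataFrame with a single id column, n rows
  sparklyr::sdf_nrow(ids) == n      # the count is computed by Spark, not locally
}

# Usage (needs the live connection from the example above):
# check_connection(sc)
# sparklyr::spark_disconnect(sc)   # release the cluster resources when done
```

If the call hangs or fails, one thing to check is whether the workers can reach the machine running RStudio, since the driver runs there in this setup.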

I've written a post on this topic.