Get CSV to Spark dataframe

For PySpark, assuming that the first row of the CSV file contains a header:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('chosenName').getOrCreate()
df = spark.read.csv('fileNameWithPath', mode="DROPMALFORMED", inferSchema=True, header=True)

from pyspark import SQLContext

# sc is the SparkContext (predefined in the pyspark shell)
sqlContext = SQLContext(sc)

# Split each line on commas; the parentheses let the chained call span two lines
Employee_rdd = (sc.textFile(r"\..\Employee.csv")
                .map(lambda line: line.split(",")))

Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])
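
One caveat with splitting on commas: a quoted field that itself contains a comma gets cut in two. A minimal sketch of a per-line parser built on Python's standard csv module, which could be passed to .map() in place of the lambda; the helper name parse_csv_line and the sample record are made up for illustration:

```python
import csv


def parse_csv_line(line):
    """Parse one CSV line into a list of fields, honouring double quotes."""
    return next(csv.reader([line]))


sample = '7,"Doe, Jane"'          # hypothetical record: ID, quoted name
naive = sample.split(",")         # the quoted name is split at its comma
parsed = parse_csv_line(sample)   # the quoted name stays in one field
```

This still works one line at a time, so records whose quoted fields span several lines are not handled.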

With more recent versions of Spark this has become a lot easier: DataFrameReader arrived in 1.4 and, as far as I know, its .csv() method in 2.0. The expression spark.read gives you a DataFrameReader instance, with a .csv() method:

df = spark.read.csv("/path/to/your.csv")

Note that you can also indicate that the CSV file has a header by adding the keyword argument header=True to the .csv() call. A handful of other options are available, and are described in the DataFrameReader documentation.
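
To keep those keyword arguments in one place, something like the following might help. It is only a sketch: it assumes an existing SparkSession is passed in as spark, and the helper name read_csv_with_header and the chosen defaults are illustrative rather than part of Spark.

```python
# Keyword arguments accepted by DataFrameReader.csv(), gathered in one dict.
CSV_OPTIONS = {
    "header": True,            # first row holds the column names
    "inferSchema": True,       # extra pass over the data to guess column types
    "sep": ",",                # field delimiter
    "mode": "DROPMALFORMED",   # drop rows that fail to parse
}


def read_csv_with_header(spark, path, **overrides):
    """Hypothetical helper: read `path` with the options above.

    `spark` is assumed to be an existing SparkSession; any keyword
    passed in `overrides` replaces the matching default.
    """
    options = {**CSV_OPTIONS, **overrides}
    return spark.read.csv(path, **options)
```

A call such as read_csv_with_header(spark, "/path/to/your.csv", sep=";") would then read a semicolon-delimited file with the same header handling.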