Cannot read a file from HDFS using Spark

This will work:

val textFile = sc.textFile("hdfs://localhost:9000/user/input.txt")

Here, localhost:9000 comes from the value of the fs.defaultFS property in Hadoop's core-site.xml configuration file.
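
For reference, the relevant block in core-site.xml usually looks something like this (a sketch; the host and port on your machine will differ):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>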


Here is the solution:

sc.textFile("hdfs://nn1home:8020/input/war-and-peace.txt")

How did I find out nn1home:8020?

Just search for the core-site.xml file and look for the XML element fs.defaultFS.
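
If the Hadoop client tools are on your path, you can also query the value directly instead of hunting through the XML (assuming a standard Hadoop installation):

hdfs getconf -confKey fs.defaultFS
# prints something like hdfs://nn1home:8020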


You are not passing a proper URL string. A valid HDFS URL breaks down like this:

  • hdfs:// - the protocol (URI scheme)
  • localhost - the hostname or IP address of the NameNode (may be different for you, e.g. 127.56.78.4)
  • 54310 - the NameNode port number
  • /input/war-and-peace.txt - the complete path to the file you want to load

Finally, the URL should look like this:

hdfs://localhost:54310/input/war-and-peace.txt
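
Putting it together, here is a minimal self-contained sketch (assuming a NameNode at localhost:54310, as above; adjust the host and port to match your fs.defaultFS):

import org.apache.spark.{SparkConf, SparkContext}

object HdfsReadExample {
  def main(args: Array[String]): Unit = {
    // Run locally for the example; on a real cluster the master is usually set by spark-submit.
    val conf = new SparkConf().setAppName("HdfsReadExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Full HDFS URL: scheme (hdfs://), host, port, then the absolute path to the file.
    val textFile = sc.textFile("hdfs://localhost:54310/input/war-and-peace.txt")
    println(s"Line count: ${textFile.count()}")

    sc.stop()
  }
}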

If you want to use sc.textFile("hdfs://..."), you need to give the full (absolute) path; in your example, that would be "hdfs://nn1home:8020/..".

If you want to keep it simple, you can also use sc.textFile("hdfs:/input/war-and-peace.txt")

Note that it's only one / after hdfs: here. When the host and port are omitted, Hadoop resolves them from fs.defaultFS.
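
As a rough sanity check in spark-shell, both forms below should resolve to the same file when fs.defaultFS is hdfs://nn1home:8020 (a sketch under that assumption; the variable names are just for illustration):

// Full URL, with the host and port spelled out explicitly:
val full = sc.textFile("hdfs://nn1home:8020/input/war-and-peace.txt")

// Shorthand, with no host:port; Hadoop falls back to fs.defaultFS:
val short = sc.textFile("hdfs:/input/war-and-peace.txt")

// If the assumption holds, both point at the same file:
println(full.count() == short.count())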