Hadoop: How to unit test FileSystem

If you're using Hadoop 2.0.0 or above, consider using hadoop-minicluster:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>2.5.0</version>
    <scope>test</scope>
</dependency>

With it, you can spin up a temporary HDFS instance on your local machine and run your tests against it. A setUp method may look like this:

// temp directory to hold the mini cluster's name node and data node storage
baseDir = Files.createTempDirectory("test_hdfs").toFile().getAbsoluteFile();
Configuration conf = new Configuration();
conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath());
MiniDFSCluster.Builder builder = new MiniDFSCluster.Builder(conf);
hdfsCluster = builder.build();

// the cluster picks a free port, so build the URI from it
String hdfsURI = "hdfs://localhost:" + hdfsCluster.getNameNodePort() + "/";
DistributedFileSystem fileSystem = hdfsCluster.getFileSystem();

And in a tearDown method you should shut down the mini HDFS cluster and remove the temporary directory:

hdfsCluster.shutdown();
FileUtil.fullyDelete(baseDir);
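
With setUp and tearDown in place, a test can exercise real HDFS operations end to end. A minimal sketch (the path and contents are arbitrary, and it assumes static imports of the JUnit assertions):

@Test
public void writesAndReadsBack() throws IOException {
    Path path = new Path("/test/hello.txt");

    // write a small file into the mini cluster
    try (FSDataOutputStream out = fileSystem.create(path)) {
        out.writeUTF("hello");
    }

    // read it back and verify the round trip
    try (FSDataInputStream in = fileSystem.open(path)) {
        assertEquals("hello", in.readUTF());
    }
}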

Why not use a mocking framework like Mockito or PowerMock to mock your interactions with the FileSystem? Your unit tests should not depend on an actual FileSystem; they should just verify your code's behavior when interacting with it.
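
For example, a sketch of that approach with Mockito, where OutputPreparer is a hypothetical class under test that creates its output directory when it is missing:

@Test
public void createsMissingOutputDir() throws IOException {
    // FileSystem is abstract, so Mockito can mock it directly
    FileSystem fs = Mockito.mock(FileSystem.class);
    Path out = new Path("/data/out");
    Mockito.when(fs.exists(out)).thenReturn(false);

    new OutputPreparer(fs).prepare(out);

    // verify the expected interaction without touching a real FileSystem
    Mockito.verify(fs).mkdirs(out);
}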


Take a look at the hadoop-test jar

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>0.20.205.0</version>
</dependency>

It has classes for setting up a MiniDFSCluster and a MiniMRCluster, so you can test without a full Hadoop installation.
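
A sketch of the idea, assuming the 0.20-era MiniDFSCluster constructor (one data node, freshly formatted filesystem, default rack placement):

// start a single-datanode cluster backed by a freshly formatted filesystem
Configuration conf = new Configuration();
MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
FileSystem fs = cluster.getFileSystem();

// ... run assertions against fs ...

cluster.shutdown();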


One possible way would be to use TemporaryFolder, available in JUnit 4.7 and later.

See: http://www.infoq.com/news/2009/07/junit-4.7-rules or http://weblogs.java.net/blog/johnsmart/archive/2009/09/29/working-temporary-files-junit-47.
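
This only exercises the local filesystem, not HDFS, but a sketch of the idea, assuming your code accepts any org.apache.hadoop.fs.FileSystem:

@Rule
public TemporaryFolder folder = new TemporaryFolder();

@Test
public void worksAgainstLocalFileSystem() throws IOException {
    // Hadoop's local FileSystem implementation can point at the temp folder
    FileSystem fs = FileSystem.getLocal(new Configuration());
    File input = folder.newFile("input.txt");

    assertTrue(fs.exists(new Path(input.getAbsolutePath())));
}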