Accessing a file that is being written

The reason the file will not be accessible until the whole file has been written and closed (option D) is that, to access a file, the client first sends a request to the NameNode to obtain the metadata for the blocks that make up the file. The NameNode records this metadata only after it receives confirmation that all blocks of the file were written successfully.

Therefore, even though the blocks are available, the user can't see the file until the metadata is updated, which is done after all blocks are written.


It seems both D and C are true, as detailed by Chaos and Ashrith respectively. I documented their results at https://martin.atlassian.net/wiki/spaces/lestermartin/blog/2019/03/21/1172373509/are+partially-written+hdfs+files+accessible+not+exactly+but+much+more+yes+than+I+previously+thought while experimenting with a 7.5 GB file.

In a nutshell: yes, the exact file name is NOT present until the write completes... AND... yes, you can actually read the file up to the last completed block if you realize the filename is temporarily suffixed with ._COPYING_.
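As a rough illustration of the naming behaviour, here is a toy Python model (not the HDFS client and not any Hadoop API) of a copy that writes under a temporary ._COPYING_ name and renames to the final name only at the end. The HDFS shell's copy commands, such as `hdfs dfs -put`, follow this pattern. ToyNamespace, ToyPut and the /user/me/big.dat path are hypothetical names chosen for the sketch.

```python
from typing import Dict, List

COPYING_SUFFIX = "._COPYING_"  # suffix on the in-progress copy


class ToyNamespace:
    """Hypothetical in-memory stand-in for the HDFS namespace (not a Hadoop API)."""

    def __init__(self) -> None:
        self.files: Dict[str, int] = {}  # path -> bytes written so far

    def ls(self) -> List[str]:
        return sorted(self.files)


class ToyPut:
    """Toy copy: write to dest._COPYING_, then rename to dest when finished."""

    def __init__(self, ns: ToyNamespace, dest: str) -> None:
        self.ns = ns
        self.dest = dest
        self.tmp = dest + COPYING_SUFFIX
        ns.files[self.tmp] = 0  # the temporary name is visible immediately

    def write(self, nbytes: int) -> None:
        self.ns.files[self.tmp] += nbytes

    def finish(self) -> None:
        # Rename: the final name appears only after the whole copy is done.
        self.ns.files[self.dest] = self.ns.files.pop(self.tmp)


ns = ToyNamespace()
job = ToyPut(ns, "/user/me/big.dat")
job.write(3 * 1024**3)
print(ns.ls())  # ['/user/me/big.dat._COPYING_']
job.finish()
print(ns.ls())  # ['/user/me/big.dat']
```

While the copy is running, a listing shows only the ._COPYING_ name, so looking for the exact final name finds nothing even though data is already there.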


As soon as a file is created, it is visible in the filesystem namespace. Any content written to the file is not guaranteed to be visible, however:

Once more than a block's worth of data has been written, the first block will be visible to new readers. This is true of subsequent blocks, too: it is always the current block being written that is not visible to other readers. (From Hadoop: The Definitive Guide, "Coherency Model" section.)
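To make the quoted rule concrete, here is a toy Python function (an illustration, not a Hadoop API) that computes how many bytes a new reader can see while a file is still open for writing. The name visible_length and the 128 MiB block size are assumptions for the example; the real block size is configurable.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MiB) for this sketch


def visible_length(bytes_written: int, closed: bool, block_size: int = BLOCK_SIZE) -> int:
    """Bytes a *new* reader can see under the coherency model quoted above.

    While the file is open, only completed blocks are visible; the block
    currently being written is hidden. After close, everything is visible.
    Toy model only: it does not simulate DataNodes, replication or flushing.
    """
    if closed:
        return bytes_written
    return (bytes_written // block_size) * block_size


MiB = 1024 * 1024
for written in (100 * MiB, 200 * MiB, 300 * MiB):
    # bytes written so far vs. bytes visible to a new reader, both in MiB
    print(written // MiB, visible_length(written, closed=False) // MiB)
# 100 0
# 200 128
# 300 256
```

A reader therefore always lags the writer by at most one block, which is why the partially written file is readable "up to the last completed block".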

So I would go with option C.

Also, take a look at this related question.

Tags: Hadoop, HDFS