1. HDFS does not allow parallel writes: a file has a single writer, and its blocks are written one after another.
2. HDFS uses a pipeline to write the replicas, so a write does not take three times as long as a traditional file write.
3. HDFS allows parallel reads.
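
To illustrate point 2, here is a rough timing model in Python. It is only a sketch under simplifying assumptions: every packet takes the same time per hop, and acknowledgements, disk I/O and network contention are ignored. The function names and the numbers are made up for illustration and are not part of any HDFS API.

```python
def sequential_copies_time(packets, replicas, hop_time):
    """Time if the client wrote each replica in full, one after another."""
    return packets * replicas * hop_time


def pipelined_time(packets, replicas, hop_time):
    """Time if each node forwards a packet to the next node as soon as it arrives.

    The last packet leaves the client after `packets` steps and then needs
    `replicas - 1` more hops to reach the final node in the chain.
    """
    return (packets + replicas - 1) * hop_time


if __name__ == "__main__":
    packets, replicas, hop_time = 1000, 3, 1  # hypothetical values
    single = packets * hop_time
    print("single copy:    ", single)
    print("three in series:", sequential_copies_time(packets, replicas, hop_time))
    print("pipelined:      ", pipelined_time(packets, replicas, hop_time))
```

In this model the pipelined write costs only `replicas - 1` extra packet hops over a single copy, so for a large file the overhead of replication is small compared with writing the data three times.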

2014-06-17 19:17 GMT+08:00 Vijaya Narayana Reddy Bhoomi Reddy <vijay.bhoomire...@gmail.com>:

> Hi,
>
> I have a basic question regarding file writes and reads in HDFS. Is the
> file write and read process a sequential activity or executed in parallel?
>
> For example, lets assume that there is a File File1 which constitutes of
> three blocks B1, B2 and B3.
>
> 1. Will the write process write B2 only after B1 is complete and B3 only
> after B2 is complete or for a large file with many blocks, can this happen
> in parallel? In all the hadoop documentation, I read this to be a
> sequential operation. Does that mean for a file of 1TB, it takes three
> times more time than a traditional file write? (due to default replication
> factor of 3)
> 2. Is it similar in the case of read as well?
>
> Kindly someone please provide some clarity on this...
>
> Regards
> Vijay
>

--
Best Wishes!

Yours, Zesheng