I'm not sure I see the point of doing so, though.

With the default block size of 128 MB, your only-1-GB file will get cut up into a mere 8 blocks (24 block replicas at the default replication factor of three) spread among some number of your worker nodes. The "multithreaded" part comes in when the various worker nodes read these blocks at once. Your disk I/O is only going to be so fast no matter how many threads on a single machine are trying to read from it; Hadoop gets you beyond that by parallelizing the I/O across multiple entire machines.
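For what it's worth, a client can already approximate this on its own with positioned reads: FSDataInputStream's pread variant, read(position, buffer, offset, length), doesn't move the shared stream's file pointer, so my understanding is that several threads can safely share one open stream. Below is a rough sketch, one thread per block-sized range; the path and thread count are made up for illustration, and a real reader would process each range as it arrives rather than buffering the whole file in memory.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBlockRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/daegyu/input.dat"); // hypothetical path
        long fileLen = fs.getFileStatus(file).getLen();
        long rangeSize = 128L * 1024 * 1024;            // match the block size

        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<Long>> futures = new ArrayList<>();

        try (FSDataInputStream in = fs.open(file)) {
            for (long off = 0; off < fileLen; off += rangeSize) {
                final long start = off;
                final int len = (int) Math.min(rangeSize, fileLen - off);
                futures.add(pool.submit(() -> {
                    // Positioned reads don't touch the stream's file pointer,
                    // so all tasks can share the one open stream.
                    byte[] buf = new byte[len];
                    int done = 0;
                    while (done < len) {
                        int n = in.read(start + done, buf, done, len - done);
                        if (n < 0) break; // unexpected EOF
                        done += n;
                    }
                    return (long) done;   // bytes actually read for this range
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            System.out.println("Read " + total + " of " + fileLen + " bytes");
        } finally {
            pool.shutdown();
        }
    }
}

Whether this is any faster than a plain sequential read depends on where the blocks live: ranges served by different datanodes can genuinely overlap, while threads all hitting the same local disk will just contend with each other.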

That being said, if all you're trafficking in are 1-GB data files, why are you even messing with Hadoop?

On 6/26/19 1:44 PM, Arpit Agarwal wrote:
HDFS reads blocks sequentially. A multi-threaded block reader could be implemented in theory.


On Jun 26, 2019, at 5:05 AM, Daegyu Han <hdg9...@gmail.com> wrote:

Hi all,

Assume HDFS has a 1 GB file, input.dat, and a block size of 128 MB.

Can the user read the input.dat file with multiple threads?

In other words, instead of the blocks being read sequentially, can multiple
blocks be read at the same time?

If not, is it difficult to implement a multi-threaded block read?

Best Regards,
Daegyu

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
