I'm not sure I see the point of doing so, though.

With the default block size of 128 MB, your only-1-GB file will get cut up into a mere 8 blocks (24 block replicas at the default replication factor of three) spread among some number of your worker nodes. The "multithreaded" part comes in when the various worker nodes read these blocks at once. Your disk I/O is only going to be so fast no matter how many threads on a single machine are trying to read from it; Hadoop gets you beyond that by parallelizing the I/O across multiple entire machines.
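For what it's worth, a client can already approximate this on its own with positioned reads: FSDataInputStream's pread variant, read(position, buffer, offset, length), doesn't move the shared stream's file pointer, so my understanding is that several threads can safely share one open stream. Below is a rough sketch, one thread per block-sized range; the path and thread count are made up for illustration, and a real reader would process each range as it arrives rather than buffering the whole file in memory.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBlockRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/daegyu/input.dat"); // hypothetical path
        long fileLen = fs.getFileStatus(file).getLen();
        long rangeSize = 128L * 1024 * 1024;            // match the block size

        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<Long>> futures = new ArrayList<>();

        try (FSDataInputStream in = fs.open(file)) {
            for (long off = 0; off < fileLen; off += rangeSize) {
                final long start = off;
                final int len = (int) Math.min(rangeSize, fileLen - off);
                futures.add(pool.submit(() -> {
                    // Positioned reads don't touch the stream's file pointer,
                    // so all tasks can share the one open stream.
                    byte[] buf = new byte[len];
                    int done = 0;
                    while (done < len) {
                        int n = in.read(start + done, buf, done, len - done);
                        if (n < 0) break; // unexpected EOF
                        done += n;
                    }
                    return (long) done;   // bytes actually read for this range
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            System.out.println("Read " + total + " of " + fileLen + " bytes");
        } finally {
            pool.shutdown();
        }
    }
}

Whether this is any faster than a plain sequential read depends on where the blocks live: ranges served by different datanodes can genuinely overlap, while threads all hitting the same local disk will just contend with each other.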

That being said, if all you're trafficking in are 1-GB data files, why are you even messing with Hadoop?

On 6/26/19 1:44 PM, Arpit Agarwal wrote:
HDFS reads blocks sequentially. A multi-threaded block reader could be implemented in theory.


On Jun 26, 2019, at 5:05 AM, Daegyu Han <hdg9...@gmail.com> wrote:

Hi all,

Assume HDFS has a 1 GB file, input.dat, and a block size of 128 MB.

Can the user read the input.dat file with multiple threads?

In other words, instead of the blocks being read sequentially, can multiple
blocks be read at the same time?

If not, is it difficult to implement a multi-threaded block read?

Best Regards,
Daegyu

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
