I'm not sure I see the point of doing so, though.
With replication set to the default of three, your 1 GB file will
get cut up into a mere 8 blocks (24 block replicas) spread among some
number of your worker nodes. The "multithreaded" part comes in when the
various worker nodes are reading these blocks at once.
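For concreteness, a quick check of the arithmetic above: a 1 GB file at a 128 MB block size splits into 8 blocks, and with the default replication factor of 3 HDFS stores 24 block replicas in total.

```python
# Block arithmetic for the example in this thread: a 1 GB file,
# a 128 MB block size, and the default replication factor of 3.
import math

file_size = 1 * 1024**3      # 1 GB
block_size = 128 * 1024**2   # 128 MB block size
replication = 3              # default HDFS replication factor

num_blocks = math.ceil(file_size / block_size)
num_replicas = num_blocks * replication

print(num_blocks)    # 8
print(num_replicas)  # 24
```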
Correct. The blocks will be read sequentially.
> On Jun 26, 2019, at 10:51 AM, Daegyu Han wrote:
>
> Thank you for your response.
>
> Assuming HDFS blocks (blk1~blk8) for file input.dat are on the local data
> node, does the map task read these blocks sequentially when trying to read
> local blocks?
On Thu, Jun 27, 2019 at 02:45, Arpit Agarwal wrote:
> HDFS reads blocks sequentially. We can implement a multi-threaded block
> reader in theory.
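As a sketch of what such a multi-threaded block reader could look like: a client can issue positional reads for each block's byte range from a thread pool, since positioned reads do not share a file offset (in Hadoop, `FSDataInputStream`'s positioned `read(position, ...)` is safe to call concurrently). The sketch below is illustrative only; it uses a local file and `os.pread` as a stand-in for HDFS, and the file name and block size are made up for the demo.

```python
# Sketch: read a file's "blocks" in parallel via positional reads,
# standing in for a hypothetical multi-threaded HDFS block reader.
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1024  # 128 MB in real HDFS; kept tiny for the demo


def read_block(fd, index):
    # os.pread does not move the shared file offset, so many threads
    # can read different block ranges of the same fd concurrently.
    return index, os.pread(fd, BLOCK_SIZE, index * BLOCK_SIZE)


def read_file_parallel(path, workers=4):
    size = os.path.getsize(path)
    num_blocks = -(-size // BLOCK_SIZE)  # ceiling division
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(lambda i: read_block(fd, i),
                                  range(num_blocks)))
    finally:
        os.close(fd)
    # Reassemble in block order regardless of completion order.
    return b"".join(data for _, data in sorted(parts))


if __name__ == "__main__":
    with open("demo.dat", "wb") as f:  # "demo.dat" is an illustrative name
        f.write(os.urandom(BLOCK_SIZE * 8 + 100))
    data = read_file_parallel("demo.dat")
    print(len(data))  # 8292
```

Note that this only parallelizes the fetch; a map task still consumes its input split in order, which is why the answer above is that reads are sequential today.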
> On Jun 26, 2019, at 5:05 AM, Daegyu Han wrote:
>
> Hi all,
>
> Assuming HDFS has a 1GB file input.dat and a block size of 128MB.
>
> Can the user read the input.dat file with multiple threads?
>