Thanks Tariq, that really helped me understand. I have one more doubt,
though: if reading is not a parallel process, then reading a 100 GB file
with an HDFS block size of 128 MB should take a very long time, but that is
not what happens in practice. My second question: is the write operation
also a sequential process? And does every datanode have its own data
streamer that listens to the data queue to get the packets and create the
pipeline? Could you kindly help me get a clear idea of the HDFS read and
write operations?

Regards
Sidharth

On 08-Apr-2017 12:49 PM, "Mohammad Tariq" <donta...@gmail.com> wrote:

Hi Sidharth,

When you read data from HDFS using a framework like MapReduce, the blocks of
an HDFS file are read in parallel by the multiple mappers created by that
particular program. To be precise, each mapper reads an input split rather
than a block.
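
As a rough illustration of why the parallel readers matter, here is the
arithmetic for the 100 GB file with 128 MB blocks mentioned in this thread.
The per-reader throughput and the number of concurrent mappers are
assumptions picked for illustration, not measurements, and the estimate
assumes ideal scaling with no scheduling or network overhead.

```python
# Figures taken from the question in this thread.
FILE_SIZE_BYTES = 100 * 1024**3    # 100 GB file
BLOCK_SIZE_BYTES = 128 * 1024**2   # 128 MB HDFS block size

# Assumptions for illustration only (hypothetical values).
READ_MB_PER_SEC = 100              # throughput of a single reader
PARALLEL_READERS = 50              # mappers running at the same time


def block_count(file_size, block_size):
    """Number of blocks needed to hold the file (ceiling division)."""
    return -(-file_size // block_size)


def read_seconds(file_size, mb_per_sec, readers=1):
    """Idealized read time: total megabytes divided by aggregate throughput."""
    total_mb = file_size / 1024**2
    return total_mb / (mb_per_sec * readers)


blocks = block_count(FILE_SIZE_BYTES, BLOCK_SIZE_BYTES)
sequential = read_seconds(FILE_SIZE_BYTES, READ_MB_PER_SEC)
parallel = read_seconds(FILE_SIZE_BYTES, READ_MB_PER_SEC, PARALLEL_READERS)

print(f"blocks: {blocks}")
print(f"one reader: {sequential:.0f} s")
print(f"{PARALLEL_READERS} readers: {parallel:.2f} s")
```

The file has 800 blocks. Each reader still streams its own block
sequentially; the speed-up comes only from running many readers at once.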

On the other hand, if you have a standalone Java program, then it is just a
single-threaded process and will read the data sequentially.
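
The difference between the two cases can be sketched with a local file
standing in for an HDFS file. This is only an analogy, not the HDFS client
API: BLOCK_SIZE, read_block, read_sequential and read_parallel are
hypothetical names, and threads stand in for mappers.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1024  # hypothetical tiny "block" so the demo runs instantly


def read_block(path, index):
    """Read one block-sized byte range, as one reader would for its split."""
    with open(path, "rb") as f:
        f.seek(index * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)


def count_blocks(path):
    """Ceiling division of the file size by the block size."""
    return -(-os.path.getsize(path) // BLOCK_SIZE)


def read_sequential(path):
    """A single reader walks the blocks in order (standalone-client case)."""
    return b"".join(read_block(path, i) for i in range(count_blocks(path)))


def read_parallel(path, workers=4):
    """Several readers fetch different blocks at once (mapper-like case)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in block order even if reads overlap.
        parts = pool.map(lambda i: read_block(path, i),
                         range(count_blocks(path)))
        return b"".join(parts)


def demo():
    """Write random data to a temp file and read it back both ways."""
    data = os.urandom(10 * BLOCK_SIZE + 123)  # last block is partial
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return data, read_sequential(path), read_parallel(path)
    finally:
        os.remove(path)


original, seq, par = demo()
print(seq == original, par == original)
```

Both paths return the same bytes. In the parallel case every worker is
still doing a plain sequential read of its own block, which is the point
made above about mappers and input splits.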


On Friday, April 7, 2017, Sidharth Kumar <sidharthkumar2...@gmail.com>
wrote:

> Thanks for your response, but I didn't understand it yet. If you don't
> mind, can you tell me what you mean by "*With Hadoop, the idea is to
> parallelize the readers (one per block for the mapper) with a processing
> framework like MapReduce.*"
>
> Also, how does the concept of parallelizing the readers work with HDFS?
>
> Thanks a lot in advance for your help.
>
>
> Regards
> Sidharth
>
> On 07-Apr-2017 1:04 PM, "Philippe Kernévez" <pkerne...@octo.com> wrote:
>
> Hi Sidharth,
>
> The reads are sequential.
> With Hadoop, the idea is to parallelize the readers (one per block for the
> mapper) with a processing framework like MapReduce.
>
> Regards,
> Philippe
>
>
> On Thu, Apr 6, 2017 at 9:55 PM, Sidharth Kumar <
> sidharthkumar2...@gmail.com> wrote:
>
>> Hi Genies,
>>
>> I have a small doubt about whether the HDFS read operation is a parallel
>> or a sequential process. From my understanding it should be parallel, but
>> the "Anatomy of a File Read" section of "Hadoop: The Definitive Guide"
>> (4th edition) says: "*Data is streamed from the datanode back to the
>> client, which calls read() repeatedly on the stream (step 4). When the end
>> of the block is reached, DFSInputStream will close the connection to the
>> datanode, then find the best datanode for the next block (step 5). This
>> happens transparently to the client, which from its point of view is just
>> reading a continuous stream.*"
>>
>> So could you kindly explain how exactly the read operation happens?
>>
>>
>> Thanks for your help in advance
>>
>> Sidharth
>>
>>
>
>
> --
> Philippe Kernévez
>
> Technical Director (Switzerland),
> pkerne...@octo.com
> +41 79 888 33 32
>
> Find OCTO on OCTO Talk: http://blog.octo.com
> OCTO Technology http://www.octo.ch
>

-- 



Tariq, Mohammad
about.me/mti