Are you asking "why are data reads/writes from/to HDFS blocks via the MapReduce
framework done in a streaming manner?"
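Under the reading suggested in that question, "streaming manner" refers to the access pattern: a block is read front to back in large sequential chunks rather than through random seeks, which favors throughput over latency. The following is a minimal sketch of only that pattern on an ordinary Python binary stream. It is not the HDFS client API, and the 64 KiB chunk size is an arbitrary assumption.

```python
import io
from typing import BinaryIO, Iterator


def stream_chunks(source: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the contents of `source` front to back in fixed-size chunks.

    Nothing seeks backwards, and only one chunk is held in memory at a time:
    a sequential, throughput-oriented read.
    """
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield chunk


# An in-memory stand-in for one block's contents.
data = io.BytesIO(b"x" * 150_000)
sizes = [len(c) for c in stream_chunks(data)]
print(sizes)  # [65536, 65536, 18928]
```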

On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe <> wrote:

> Hi Shashwat,
> This is an excerpt from *Hadoop: The Definitive Guide* by Tom White:
> Hadoop Streaming
> Hadoop provides an API to MapReduce that allows you to write your map and
> reduce functions in languages *other than Java*. Hadoop Streaming uses Unix
> standard streams as the interface between Hadoop and your program, *so you
> can use any language that can read standard input and write to standard
> output to write your MapReduce program*.
> Streaming is naturally suited for text processing (although, as of version
> 0.21.0, it can handle binary streams, too), and when used in text mode, it
> has a line-oriented view of data. Map input data is passed over standard
> input to your map function, which processes it line by line and writes
> lines to standard output. A map output key-value pair is written as a
> single tab-delimited line. Input to the reduce function is in the same
> format (a tab-separated key-value pair) passed over standard input. The
> reduce function reads lines from standard input, which the framework
> guarantees are sorted by key, and writes its results to standard output.
> I don't think this is what I am asking about.
> Thanks.
> -RR
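To make the excerpt concrete, here is a minimal word-count sketch in the Streaming style, assuming Python 3. The file name `wordcount.py` and its `map`/`reduce` argument are hypothetical; only the stdin/stdout contract (tab-separated key-value lines, reducer input sorted by key) comes from the excerpt.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, as a streaming map task would."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key.

    groupby only merges *adjacent* equal keys, so this relies on the input
    being sorted by key, which the framework guarantees for reducers.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def main(argv: List[str]) -> None:
    # Hypothetical CLI: `wordcount.py map` or `wordcount.py reduce`.
    if len(argv) < 2 or argv[1] not in ("map", "reduce"):
        print("usage: wordcount.py map|reduce", file=sys.stderr)
        return
    step = mapper if argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)


if __name__ == "__main__":
    main(sys.argv)
```

A local dry run can approximate the framework with a shell pipeline in which `sort` stands in for the shuffle: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster, the same script would be handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.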
> ------------------------------
> From:
> Date: Wed, 5 Mar 2014 13:47:09 +0530
> Subject: Re: Streaming data access in HDFS: Design Feature
> To:
> CC:
> Streaming means processing data as it comes into HDFS. In Hadoop, Hadoop
> Streaming enables Hadoop to receive data through executables of different
> types.
> I hope you have already read this:
> Warm Regards,
> Shashwat Shriparv
> On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <> wrote:
> Hello All,
> Can anyone please explain what we mean by *Streaming data access in HDFS*.
> Data is usually copied to HDFS, and in HDFS the data is split across
> DataNodes in blocks.
> Say, for example, I have an input file of 10240 MB (10 GB) and a block
> size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across the DataNodes.
> Now the Mappers will read data from these DataNodes keeping the *data
> locality feature* in mind (i.e. blocks local to a DataNode will be read by
> the map tasks running on that DataNode).
> Can you please point out where "Streaming data access in HDFS" comes into
> the picture here?
> Thanks,
> RR
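The block arithmetic and the locality preference in the original question can be sketched as a toy model in Python. `num_blocks` just performs ceiling division. `pick_task`, the block ids, and the host names are hypothetical stand-ins for what Hadoop's scheduler does internally. Real schedulers also weigh rack locality and other factors, and node locality is a preference rather than a guarantee.

```python
from typing import Dict, List, Optional


def num_blocks(file_size_mb: int, block_size_mb: int) -> int:
    """Blocks a file occupies: ceiling division, since the last block
    may be only partially filled."""
    return -(-file_size_mb // block_size_mb)


def pick_task(node: str, pending: Dict[str, List[str]]) -> Optional[str]:
    """Toy scheduler (hypothetical): give `node` a map task whose input
    block has a replica on that node; if none, fall back to any pending
    task, whose block must then be read over the network."""
    for block, hosts in pending.items():
        if node in hosts:
            return block
    return next(iter(pending), None)


print(num_blocks(10240, 64))  # 160
print(num_blocks(10250, 64))  # 161: the last block holds only 10 MB

# Hypothetical block ids mapped to the DataNodes holding their replicas.
pending = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn4", "dn5"],
}
print(pick_task("dn4", pending))  # blk_2 (node-local read)
print(pick_task("dn9", pending))  # blk_1 (no local replica: remote read)
```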

Nitin Pawar
