Hadoop Streaming allows you to create and run MapReduce jobs with any
executable or script as the mapper and/or the reducer. In other words, you
do not need to learn Java programming to write a simple MapReduce program.
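
For example, the streaming docs linked further down in this thread run a
whole MapReduce job with standard Unix utilities as the mapper and reducer,
along these lines (the exact streaming jar name and path depend on your
Hadoop version and install):

===
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc
===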

Streaming data access in HDFS, on the other hand, is entirely different.
When the MapReduce framework reads data from or writes data to HDFS blocks,
it is done via byte streams. Bytes are always appended to the end of a
stream, and byte streams are guaranteed to be stored in the order written.

The following code snippet shows how stream data is written to HDFS. If you
want to understand it in more depth, you can look at the codebase for any
file format, e.g. the SequenceFile format.

Hope this helps a bit.

===
// Create a new file in HDFS and stream data into it. Assumes that
// fileSystem (org.apache.hadoop.fs.FileSystem), path
// (org.apache.hadoop.fs.Path) and source (a local file name) have
// already been set up.
FSDataOutputStream out = fileSystem.create(path);
InputStream in = new BufferedInputStream(new FileInputStream(
    new File(source)));

// Copy in 1 KB chunks; each chunk is appended to the end of the open
// stream, in the order it was read.
byte[] b = new byte[1024];
int numBytes = 0;
while ((numBytes = in.read(b)) > 0) {
    out.write(b, 0, numBytes);
}

// Close all the file descriptors
in.close();
out.close();
fileSystem.close();
===
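
The read path is symmetric: you open a stream on the file and consume the
bytes in the order they were written. A minimal sketch, again assuming
fileSystem and path are already set up (IOUtils is
org.apache.hadoop.io.IOUtils):

===
// Open the file in HDFS and stream its bytes, in write order, to stdout.
FSDataInputStream in = fileSystem.open(path);
IOUtils.copyBytes(in, System.out, 4096, false);
in.close();
===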


On Wed, Mar 5, 2014 at 2:25 PM, Radhe Radhe <radhe.krishna.ra...@live.com> wrote:

> Hi Nitin,
>
> I believe *Hadoop Streaming* is different from *Streaming Data Access* in
> HDFS.
>
> We usually copy the data in HDFS and then the MR application reads the
> data through Map and Reduce tasks.
> I need to be clear about WHAT is done and HOW it is done in *Streaming
> Data Access* in HDFS.
>
> Thanks,
> RR
>
>
> ------------------------------
> Date: Wed, 5 Mar 2014 14:17:24 +0530
>
> Subject: Re: Streaming data access in HDFS: Design Feature
> From: nitinpawar...@gmail.com
> To: user@hadoop.apache.org
>
>
> are you asking "why is data read from / written to HDFS blocks in a
> streaming manner by the MapReduce framework?"
>
>
> On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe
> <radhe.krishna.ra...@live.com> wrote:
>
> Hi Shashwat,
>
> This is an excerpt from Hadoop: The Definitive Guide by Tom White:
> Hadoop Streaming
> Hadoop provides an API to MapReduce that allows you to write your map and
> reduce
> functions in languages *other than Java*. Hadoop Streaming uses Unix
> standard streams
> as the interface between Hadoop and your program, *so you can use any
> language that can read standard input and write to standard output to
> write your MapReduce program*.
> Streaming is naturally suited for text processing (although, as of version
> 0.21.0, it can
> handle binary streams, too), and when used in text mode, it has a
> line-oriented view of
> data. Map input data is passed over standard input to your map function,
> which processes
> it line by line and writes lines to standard output. A map output
> key-value pair
> is written as a single tab-delimited line. Input to the reduce function is
> in the same
> format—a tab-separated key-value pair—passed over standard input. The
> reduce function
> reads lines from standard input, which the framework guarantees are sorted
> by
> key, and writes its results to standard output.
>
> I think this is not what I am asking for.
>
> Thanks.
> -RR
>
> ------------------------------
> From: dwivedishash...@gmail.com
> Date: Wed, 5 Mar 2014 13:47:09 +0530
> Subject: Re: Streaming data access in HDFS: Design Feature
> To: user@hadoop.apache.org
> CC: radhe.krishna.ra...@live.com
>
>
> Streaming means processing the data as it is coming into HDFS, whereas
> Hadoop Streaming enables Hadoop to receive data using executables of
> different types.
>
> I hope you have already read this:
> http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming
>
>
> *Warm Regards _∞_*
> *Shashwat Shriparv*
>
>
>
> On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe
> <radhe.krishna.ra...@live.com> wrote:
>
> Hello All,
>
> Can anyone please explain what is meant by *Streaming data access in HDFS*?
>
> Data is usually copied to HDFS, and in HDFS the data is split across
> DataNodes in blocks.
> Say, for example, I have an input file of 10240 MB (10 GB) in size and a
> block size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across the DataNodes.
> Now the Mappers will read data from these DataNodes keeping the *data
> locality feature* in mind (i.e. blocks local to a DataNode will be read by
> the map tasks running on that DataNode).
>
> Can you please point out where "Streaming data access in HDFS" comes into
> the picture here?
>
> Thanks,
> RR
>
>
>
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar
