I've diagrammed the Hadoop HDFS write path here:

http://jayunit100.blogspot.com/2013/04/the-kv-pair-salmon-run-in-mapreduce-hdfs.html


On Tue, Oct 1, 2013 at 5:24 PM, Ravi Prakash <ravi...@ymail.com> wrote:

> Karim!
>
> Look at DFSOutputStream.java:DataStreamer
>
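> A rough sketch of how a client write reaches it (Hadoop 2.x class names;
> the path and data are just examples):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class WriteSketch {
>     public static void main(String[] args) throws Exception {
>       FileSystem fs = FileSystem.get(new Configuration());
>       // create() returns an FSDataOutputStream that wraps a DFSOutputStream
>       FSDataOutputStream out = fs.create(new Path("/user/karim/input.txt"));
>       // write() is chunked and checksummed by DFSOutputStream, which queues
>       // ~64KB packets; the DataStreamer thread drains that queue and streams
>       // the packets to the first DataNode of the pipeline
>       out.write("hello hdfs\n".getBytes("UTF-8"));
>       // close() flushes the remaining packets and completes the block with the NN
>       out.close();
>       fs.close();
>     }
>   }
>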
> HTH
> Ravi
>
>
>   ------------------------------
>  *From:* Karim Awara <karim.aw...@kaust.edu.sa>
> *To:* user <user@hadoop.apache.org>
> *Sent:* Thursday, September 26, 2013 7:51 AM
> *Subject:* Re: Uploading a file to HDFS
>
>
> Thanks for the reply. When the client caches 64KB of data on its own
> side, do you know which set of major Java classes/files is responsible
> for that action?
>
> --
> Best Regards,
> Karim Ahmed Awara
>
>
> On Thu, Sep 26, 2013 at 2:25 PM, Jitendra Yadav <
> jeetuyadav200...@gmail.com> wrote:
>
> Case 2:
>
> When selecting target DNs for a write, the NN prefers the first DN to be
> the same node the client is sending data from. In some cases the NN will
> skip that DN, e.g. when it has disk space issues or other health problems;
> the rest of the process is the same.
>
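> If you want to check where the replicas of each block actually landed
> after a -put from a DN, one way (just a sketch; the path is an example)
> is to ask for the block locations once the write finishes:
>
>   import java.util.Arrays;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.BlockLocation;
>   import org.apache.hadoop.fs.FileStatus;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class ShowBlockLocations {
>     public static void main(String[] args) throws Exception {
>       FileSystem fs = FileSystem.get(new Configuration());
>       Path p = new Path("/user/karim/input.txt");
>       FileStatus stat = fs.getFileStatus(p);
>       // one BlockLocation per block; getHosts() lists the DNs holding replicas
>       for (BlockLocation loc : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
>         System.out.println("offset " + loc.getOffset() + " -> "
>             + Arrays.toString(loc.getHosts()));
>       }
>       fs.close();
>     }
>   }
>
> When the write was issued from a DN that the NN accepted as the first
> target, that node should show up among the hosts of every block.
>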
> Thanks
> Jitendra
>
>
> On Thu, Sep 26, 2013 at 4:15 PM, Shekhar Sharma <shekhar2...@gmail.com> wrote:
>
> It's not the namenode that does the reading or breaking of the file.
> When you run the command hadoop fs -put <input> <output>, "hadoop" is a
> script that launches the default Hadoop client. When the client contacts
> the namenode for a write, the NN creates a block id, asks 3 DNs to host
> the block (replication factor 3), and sends this information back to the
> client.
>
> The client buffers 64KB of data on its own side, pushes it to the first
> DN, and the data is then forwarded along the pipeline. This repeats until
> 64MB (one block) has been written; if the client wants to write more, it
> contacts the NN again to allocate the next block, and the process
> continues.
>
> Have a look at how writing happens in HDFS.
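>
> For what it's worth, the block size and replication factor are per-file,
> client-side settings that the client sends to the NN when it asks for
> blocks; a small sketch (values and path are just examples):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class CreateWithBlockSize {
>     public static void main(String[] args) throws Exception {
>       FileSystem fs = FileSystem.get(new Configuration());
>       // overwrite=true, 4KB io buffer, replication 3, 64MB block size
>       FSDataOutputStream out = fs.create(new Path("/user/karim/big.txt"),
>           true, 4096, (short) 3, 64L * 1024 * 1024);
>       // writes accumulate into 64MB blocks; the client asks the NN for a
>       // new block each time one fills up
>       out.write(new byte[1024]);
>       out.close();
>       fs.close();
>     }
>   }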
>
>
> Regards,
> Som Shekhar Sharma
> +91-8197243810
>
>
> On Thu, Sep 26, 2013 at 3:41 PM, Karim Awara <karim.aw...@kaust.edu.sa>
> wrote:
> > Hi,
> >
> > I have a couple of questions about the process of uploading a large
> > file (> 10GB) to HDFS.
> >
> > To make sure my understanding is correct, assuming I have a cluster of N
> > machines.
> >
> > What happens in the following:
> >
> >
> > Case 1:
> >                 Assuming I want to upload a file (input.txt) of size K GB
> > that resides on the local disk of machine 1 (which happens to be the
> > namenode only): if I run the command -put input.txt {some hdfs dir}
> > from the namenode (assuming it does not play the datanode role), will
> > the namenode read the first 64MB into a temporary pipe and then transfer
> > it to one of the cluster datanodes once finished? Or does the namenode
> > not read the file at all, but rather ask a certain datanode to read the
> > 64MB window from the file remotely?
> >
> >
> > Case 2:
> >              Assume machine 1 is the namenode, but I run the -put command
> > from machine 3 (which is a datanode). Who will start reading the file?
> >
> >
> >
> > --
> > Best Regards,
> > Karim Ahmed Awara
> >
>
>
>
>
>
>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com
