RE: Replication problem of HDFS

2007-09-13 Thread Dhruba Borthakur
: Wednesday, September 12, 2007 11:12 PM To: hadoop-user@lucene.apache.org Subject: Re: Replication problem of HDFS Thanks for your detailed example and explanation. The problem I ran into is that all split blocks are stored on the same datanode, that is, (A1, A2, A3) are stored on the same datanode in your example. My

Re: Replication problem of HDFS

2007-09-13 Thread Ted Dunning
This is a known behavior (a feature, even). When you write on a datanode, it prefers to put the data on that node because it is local. To avoid this, run the put on a non-datanode. Or do the put with a higher replication and drop the replication after the put. Or use distcp if all of the data n
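The local-write preference described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HDFS code: the function names `place_blocks` and `blocks_per_node`, the node names, and the random choice of non-local nodes are all hypothetical. The only rule taken from the thread is that a writer running on a datanode keeps the first replica of each block locally.

```python
import random
from collections import Counter

def place_blocks(num_blocks, datanodes, writer, replication=1, seed=0):
    """Toy placement model (an assumption, not the real HDFS policy):
    the first replica of every block goes to the writer if the writer
    is itself a datanode, otherwise to a randomly chosen datanode.
    Any remaining replicas go to other, distinct datanodes."""
    rng = random.Random(seed)
    placement = []
    for _ in range(num_blocks):
        first = writer if writer in datanodes else rng.choice(datanodes)
        others = [d for d in datanodes if d != first]
        placement.append([first] + rng.sample(others, replication - 1))
    return placement

def blocks_per_node(placement):
    """Count how many block replicas each datanode holds."""
    return Counter(node for replicas in placement for node in replicas)

nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical 4-datanode cluster

# Put run on a datanode with replication 1: every block stays local.
print(blocks_per_node(place_blocks(16, nodes, writer="dn1")))

# Put run on a machine that is not a datanode: blocks are spread out.
print(blocks_per_node(place_blocks(16, nodes, writer="client")))

# Put with a higher replication: dn1 still holds one replica of each
# block, but the extra replicas land on the other datanodes.
print(blocks_per_node(place_blocks(16, nodes, writer="dn1", replication=3)))
```

In this model the three workarounds from the message map onto the three calls: writing from a non-datanode, or writing with a higher replication, is what gets replicas onto the other nodes.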

Re: Replication problem of HDFS

2007-09-12 Thread ChaoChun Liang
Thanks for your detailed example and explanation. The problem I ran into is that all split blocks are stored on the same datanode, that is, (A1, A2, A3) are stored on the same datanode in your example. My test case is putting (by the "hadoop fs -put" command) a file of about 1GB to HDFS with 4 datanodes, where the n

Re: Replication problem of HDFS

2007-09-10 Thread Ted Dunning
Your question is very hard to understand. The problem may be the names of the different kinds of server. There is one namenode and there are many datanodes. Each file is divided into one or more blocks. By default the block has a maximum size of 64MB. Each block from a file is stored on one o
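As a quick worked example of the block arithmetic above, here is a small sketch. The helper name `block_sizes` is hypothetical; the 64MB figure is the default maximum block size quoted in the message, and the 1GB file is the test case mentioned elsewhere in the thread.

```python
BLOCK_SIZE_MB = 64  # default maximum block size quoted in the thread

def block_sizes(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file is divided into:
    as many full blocks as fit, plus one smaller final block if needed."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])

# A 1GB (1024MB) file is divided into 16 full blocks.
print(len(block_sizes(1024)))

# A 100MB file is divided into one full block and one 36MB block.
print(block_sizes(100))
```

Each of these blocks is then placed on datanodes independently, which is why the number of blocks matters when looking at how a single file ends up distributed.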

RE: Replication problem of HDFS

2007-09-10 Thread ChaoChun Liang
html#Replication+Pipelining > > Is my understanding of the documentation correct? > > > -Original Message- > From: ChaoChun Liang [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 06, 2007 9:23 PM > To: hadoop-user@lucene.apache.org > Subject: RE: Replicati

RE: Replication problem of HDFS

2007-09-10 Thread ChaoChun Liang
we may use the HDFS, otherwise we may consider the local file system for the map/reduce processing. ChaoChun -Original Message- From: ChaoChun Liang Sent: Thursday, September 6, 2007 10:23pm To: hadoop-user@lucene.apache.org Subject: RE: Replication problem of HDFS So, the upload proce

RE: Replication problem of HDFS

2007-09-10 Thread Dhruba Borthakur
Hi ChaoChun, Your explanation sounds right. Thanks, dhruba -Original Message- From: Earney, Billy C. [mailto:[EMAIL PROTECTED] Sent: Monday, September 10, 2007 10:44 AM To: hadoop-user@lucene.apache.org Subject: RE: Replication problem of HDFS ChaoChun, I'm new to hadoop, b

RE: Replication problem of HDFS

2007-09-10 Thread Earney, Billy C.
EMAIL PROTECTED] Sent: Thursday, September 06, 2007 9:23 PM To: hadoop-user@lucene.apache.org Subject: RE: Replication problem of HDFS So, the upload process (from the local file system to HDFS) will store all blocks (split from the dataset, say M split blocks) on a single node (depending on which client yo

RE: Replication problem of HDFS

2007-09-07 Thread Stu Hood
>The client is only used to transfer files to/from Hadoop: it doesn't do any >long term storage. Thanks, Stu -Original Message- From: ChaoChun Liang Sent: Thursday, September 6, 2007 10:23pm To: hadoop-user@lucene.apache.org Subject: RE: Replication problem of HDFS So, the

RE: Replication problem of HDFS

2007-09-06 Thread ChaoChun Liang
So, the upload process (from the local file system to HDFS) will store all blocks (split from the dataset, say M split blocks) on a single node (depending on which client you put from), not on all datanodes. And "replication" means replicating to N clients (if replication=N), and each client owns a compl

RE: Replication problem of HDFS

2007-09-05 Thread Stu Hood
nks, Stu -Original Message- From: ChaoChun Liang Sent: Wednesday, September 5, 2007 9:26pm To: hadoop-user@lucene.apache.org Subject: RE: Replication problem of HDFS Yes, you are right. The namenode and datanode are on the same machine, and I upload data into HDFS from that same machine in my en

RE: Replication problem of HDFS

2007-09-05 Thread ChaoChun Liang
Yes, you are right. The namenode and datanode are on the same machine, and I upload data into HDFS from that same machine in my environment. I supposed HDFS would distribute these blocks to all the other datanodes (according to the HDFS reference), but it actually does not. >>In this case, the only replica of th

RE: Replication problem of HDFS

2007-09-05 Thread Dhruba Borthakur
Hi ChaoChun, I do not fully understand your problem. I am guessing that you are running a Datanode on the same machine as the Namenode. I am also guessing that you are using the Namenode machine as a client to upload a file into HDFS. In this case, the only replica of the file will reside on the D