RE: More Replication on dfs

2009-04-09 Thread Puri, Aseem
Hi, I also tried the command $ bin/hadoop balancer, but I still have the same problem. Aseem -Original Message- From: Puri, Aseem [mailto:aseem.p...@honeywell.com] Sent: Friday, April 10, 2009 11:18 AM To: core-user@hadoop.apache.org Subject: RE: More Replication on dfs Hi Alex,
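
One way to check what replication the existing blocks actually carry is fsck, or a plain listing; a minimal sketch, assuming the files live under /user/aseem:

  $ bin/hadoop fsck / -files -blocks    # prints each file with its block count and replication
  $ bin/hadoop fs -ls /user/aseem       # the second column of the listing is the replication factor

Note that the balancer only evens out disk usage across DataNodes; it does not change the replication factor of files that were already written.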

RE: More Replication on dfs

2009-04-09 Thread Puri, Aseem
Hi Alex, Thanks for sharing your knowledge. So far I have three machines, and I want to check the behavior of Hadoop, so I want the replication factor to be 2. I started my Hadoop server with a replication factor of 3. After that I uploaded 3 files to run the word count program. But as all my f

Re: Can we somehow read from the HDFS without converting it to local?

2009-04-09 Thread Ravi Phulari
Try using hadoop fs -cat. See http://hadoop.apache.org/core/docs/r0.19.1/hdfs_shell.html#cat -- Ravi On 4/9/09 8:56 PM, "Sid123" wrote: I need to reuse the O/P of my DFS file without copying to local. Is there a way? -- View this message in context: http://www.nabble.com/C
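
A small sketch of consuming HDFS data straight through a pipe, without copying to local first (the path is a placeholder):

  $ bin/hadoop fs -cat /user/sid/output/part-00000 | grep ERROR | head

Anything that reads stdin can sit on the right-hand side of the pipe, so the data streams through without ever being written as a local file.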

Re: More Replication on dfs

2009-04-09 Thread Mithila Nagendra
To add to the question, how does one decide the optimal replication factor for a cluster? For instance, what would be the appropriate replication factor for a cluster consisting of 5 nodes? Mithila On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard wrote: > Did you load any files when repl

Re: More Replication on dfs

2009-04-09 Thread Alex Loddengaard
Did you load any files when replication was set to 3? If so, you'll have to rebalance: Note that most people run HDFS with a replication factor
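
A sketch of lowering the replication of files that were written while dfs.replication was still 3 (the path is only an example; point it at the actual data):

  $ bin/hadoop fs -setrep -w 2 -R /user/aseem   # re-replicate everything under the path at factor 2
  $ bin/hadoop balancer                         # then spread blocks evenly across the DataNodes

dfs.replication is applied by the client at file-creation time, so changing it in hadoop-site.xml only affects files written afterwards; setrep is what adjusts files that already exist.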

More Replication on dfs

2009-04-09 Thread Puri, Aseem
Hi, I am a new Hadoop user. I have a small cluster with 3 DataNodes. In hadoop-site.xml the value of the dfs.replication property is 2, but it is still replicating data on 3 machines. Please tell me why this is happening. Regards, Aseem Puri

Re: HDFS read/write speeds, and read optimization

2009-04-09 Thread Brian Bockelman
On Apr 9, 2009, at 5:45 PM, Stas Oskin wrote: Hi. I have 2 questions about HDFS performance: 1) How fast are the read and write operations over the network, in Mbps? Depends. What hardware? How much hardware? Is the cluster under load? What does your I/O load look like? As
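
For concrete numbers on a particular cluster, one option is the TestDFSIO benchmark that ships in the Hadoop test jar; a rough sketch (the jar name and sizes here are illustrative):

  $ bin/hadoop jar hadoop-0.19.1-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000   # write 10 files of 1000 MB each
  $ bin/hadoop jar hadoop-0.19.1-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000    # read them back
  $ bin/hadoop jar hadoop-0.19.1-test.jar TestDFSIO -clean                              # remove the benchmark data

It reports throughput and average I/O rate in MB/sec, and the results will vary with disks, network, and whatever else the cluster is doing at the time.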

Re: HDFS as a logfile ??

2009-04-09 Thread Brian Bockelman
Also, Chukwa (a project already in Hadoop contrib) is designed to do something similar with Hadoop directly: http://wiki.apache.org/hadoop/Chukwa I think some of the examples even mention Apache logs. Haven't used it personally, but it looks nice. Brian On Apr 9, 2009, at 11:14 PM, Alex

Re: HDFS as a logfile ??

2009-04-09 Thread Alex Loddengaard
This is a great idea and a common application, Ricky. Scribe is probably useful for you as well: < http://images.google.com/imgres?imgurl=http://farm3.static.flickr.com/2211/2197670659_b42810b8ba.jpg&imgrefurl=http://www.flickr.com/photos/niallkenne

Re: HDFS read/write speeds, and read optimization

2009-04-09 Thread Alex Loddengaard
Answers in-line. Alex On Thu, Apr 9, 2009 at 3:45 PM, Stas Oskin wrote: > Hi. > > I have 2 questions about HDFS performance: > > 1) How fast are the read and write operations over the network, in Mbps? Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown: < http://

HDFS as a logfile ??

2009-04-09 Thread Ricky Ho
I want to analyze the traffic patterns and statistics of a distributed application. I am thinking of having the application write the events as log entries into HDFS, and then later I can use a Map/Reduce task to do the analysis in parallel. Is this a good approach? In this case, does HDFS sup
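
Since append support was not generally usable in HDFS around 0.18/0.19, a common pattern is to log locally, rotate, and push closed log files into date-partitioned HDFS directories that Map/Reduce jobs can later take as input; a sketch with placeholder paths:

  $ bin/hadoop fs -mkdir /logs/myapp/2009-04-09
  $ bin/hadoop fs -put /var/log/myapp/events.2009-04-09.log /logs/myapp/2009-04-09/
  $ bin/hadoop fs -ls /logs/myapp/2009-04-09    # a job can then use /logs/myapp/2009-04-09 as its input path

Chukwa and Scribe, mentioned in the replies on this thread, automate essentially this collection step.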

Can we somehow read from the HDFS without converting it to local?

2009-04-09 Thread Sid123
I need to reuse the O/P of my DFS file without copying to local. Is there a way? -- View this message in context: http://www.nabble.com/Can-we-somehow-read-from-the-HDFS-without-converting-it-to-local--tp22982760p22982760.html Sent from the Hadoop core-user mailing list archive at Nabble.com.

HDFS read/write speeds, and read optimization

2009-04-09 Thread Stas Oskin
Hi. I have 2 questions about HDFS performance: 1) How fast are the read and write operations over the network, in Mbps? 2) If the chunk server is located on the same host as the client, is there any optimization in read operations? For example, Kosmos FS describes the following functionality:

Re: [Interesting] One reducer randomly hangs on getting 0 mapper output

2009-04-09 Thread Steve Gao
I am using 0.17.0. I think the problem is basically that the reducer falls into an infinite loop trying to get mapper output when the mapper is somehow not available/dead. Doesn't Hadoop have a solution? --- On Thu, 4/9/09, Steve Gao wrote: From: Steve Gao Subject: [Interesting] One reducer randomly han

[Interesting] One reducer randomly hangs on getting 0 mapper output

2009-04-09 Thread Steve Gao
I have Hadoop jobs where the last reducer randomly hangs on getting 0 mapper output. By randomly I mean the job sometimes works correctly, and sometimes its last reducer keeps reading map output but always gets 0 data. It can hang for up to 100 hours getting 0 data until I kill it. After I k

Re: reduce task specific jvm arg

2009-04-09 Thread Philip Zeyliger
There doesn't seem to be. The command line for the JVM is computed in org.apache.hadoop.mapred.TaskRunner#run(). On Thu, Apr 9, 2009 at 10:30 AM, Jun Rao wrote: > Hi, > > Is there a way to set jvm parameters only for reduce tasks in Hadoop? > Thanks, > > Jun > IBM Almaden Research Center > K55/
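
As a workaround in that version, mapred.child.java.opts applies to the JVMs of both map and reduce tasks; a sketch of setting it per job from the command line, assuming the job's driver uses ToolRunner/GenericOptionsParser (jar and class names are placeholders):

  $ bin/hadoop jar myjob.jar com.example.MyJob -D mapred.child.java.opts=-Xmx1024m input output

Map-only and reduce-only variants of this property only appeared in later Hadoop releases.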

reduce task specific jvm arg

2009-04-09 Thread Jun Rao
Hi, Is there a way to set JVM parameters only for reduce tasks in Hadoop? Thanks, Jun IBM Almaden Research Center K55/B1, 650 Harry Road, San Jose, CA 95120-6099 jun...@almaden.ibm.com

Re: Issue distcp'ing from 0.19.2 to 0.18.3

2009-04-09 Thread Bryan Duxbury
Ah, never mind. It turns out that I just shouldn't rely on command history so much. I accidentally pointed hftp:// at the actual namenode port, not the namenode HTTP port. It appears to be starting a regular copy again. -Bryan On Apr 8, 2009, at 11:57 PM, Todd Lipcon wrote: Hey Bryan,
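
For reference, a sketch of the corrected form (paths are placeholders, and the 50070 shown is only the default dfs.http.address port; the real value comes from the source cluster's config):

  $ bin/hadoop distcp hftp://ds-nn1:50070/path/to/src hdfs://ds-nn2:7276/path/to/dest

hftp:// reads through the namenode's HTTP interface, which is what allows copying between different HDFS versions, while the hdfs:// destination uses the normal namenode RPC port.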

RE: Issue distcp'ing from 0.19.2 to 0.18.3

2009-04-09 Thread Koji Noguchi
Bryan, hftp://ds-nn1:7276 hdfs://ds-nn2:7276 Are you using the same port number for hftp and hdfs? Looking at the stack trace, it seems like it failed before starting a distcp job. Koji -Original Message- From: Bryan Duxbury [mailto:br...@rapleaf.com] Sent: Wednesday, April 08, 2009 1

[ANNOUNCE] Katta 0.5 released

2009-04-09 Thread Stefan Groschupf
(...apologies for the cross-posting...) Release 0.5 of Katta is now available. Katta - Lucene in the cloud. http://katta.sourceforge.net This release fixes bugs from 0.4, including one that sorted results incorrectly under load. 0.5 also upgrades ZooKeeper to version 3.1 and Lucene to version