.18.1 jobtracker deadlock

2008-12-17 Thread Sagar Naik
Hi, Found one Java-level deadlock: "SocketListener0-7": waiting to lock monitor 0x0845e1fc (object 0x54f95838, a org.apache.hadoop.mapred.JobTracker), which is held by "IPC Server handler 0 on 54311". "IPC Server handler 0 on 54311": waiting to lock monitor 0x4d671064
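A report of this shape describes a lock-order cycle: each thread holds one monitor and waits for a monitor held by the other. The following is a minimal sketch, not taken from the JobTracker code, that provokes the same kind of cycle with two stand-in locks and asks the JVM to report it through ThreadMXBean.findDeadlockedThreads(). The class name DeadlockSketch, the thread names, and the lock objects are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockSketch {

    /**
     * Starts a daemon thread that locks `first`, waits until its peer also
     * holds a lock, and then blocks trying to lock `second`.
     */
    private static Thread start(String name, Object first, Object second,
                                CountDownLatch bothHold) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHold.countDown();
                try {
                    bothHold.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the peer thread holds `second`.
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        t.start();
        return t;
    }

    /**
     * Provokes a two-thread lock-order deadlock and returns how many of the
     * two threads the JVM reports as deadlocked (0 if none is seen in time).
     */
    public static int provokeAndCount() {
        Object lockA = new Object(); // hypothetical stand-in monitors
        Object lockB = new Object();
        CountDownLatch bothHold = new CountDownLatch(2);
        long id1 = start("sketch-thread-1", lockA, lockB, bothHold).getId();
        long id2 = start("sketch-thread-2", lockB, lockA, bothHold).getId();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int ours = 0;
        try {
            for (int attempt = 0; attempt < 100 && ours < 2; attempt++) {
                ours = 0;
                long[] ids = mx.findDeadlockedThreads(); // null when no deadlock
                if (ids != null) {
                    for (long id : ids) {
                        if (id == id1 || id == id2) {
                            ours++;
                        }
                    }
                }
                if (ours < 2) {
                    Thread.sleep(50);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ours;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked sketch threads: " + provokeAndCount());
    }
}
```

Taking the two locks in the same order in both threads removes the cycle. Whether that applies to the JobTracker case depends on the rest of the thread dump, which is not shown here.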

HOD questions

2008-12-17 Thread Craig Macdonald
Hello, We have two HOD questions: (1) For our current Torque PBS setup, the number of nodes requested by HOD (-l nodes=X) corresponds to the number of CPUs allocated; however, these nodes can be spread across various partially used or empty nodes. Unfortunately, HOD does not appear to honour the

OOME only with large datasets

2008-12-17 Thread Philip
I've been trying to troubleshoot an OOME we've been having. When we run the job over a dataset that is about 700GB (~9000 files) or larger, we get an OOME on the map jobs. However, if we run the job over a smaller set of the data, then everything works out fine. So my question is: What changes in

Re: Does datanode acts as readonly in case of DiskFull ?

2008-12-17 Thread Raghu Angadi
Sagar Naik wrote: Hi, I would like to know what happens in case of DiskFull on a datanode. Does the datanode act as a block server only? Yes. I think so. Does it reject any more block creation requests, or does the Namenode not list it for new blocks? Yes. NN will not allocate it any more

Datanode handling of single disk failure

2008-12-17 Thread Brian Bockelman
Hello all, I'd like to take the datanode's capability to handle multiple directories to a somewhat-extreme, and get feedback on how well this might work. We have a few large RAID servers (12 to 48 disks) which we'd like to transition to Hadoop. I'd like to mount each of the disks

Re: [video] visualization of the hadoop code history

2008-12-17 Thread Jeff Hammerbacher
Very cool stuff, but I don't see a reference anywhere to the author of the visualization, which seems like poor form for a marketing video. I apologize if I missed a reference somewhere. Michael Ogawa at UC Davis wrote the code to generate that visualization and open sourced it at

Re: [video] visualization of the hadoop code history

2008-12-17 Thread Stefan Groschupf
Very cool stuff, but I don't see a reference anywhere to the author of the visualization, which seems like poor form for a marketing video. I apologize if I missed a reference somewhere. Jeff, you missed it! It is the first text screen at the end of the video. It is actually a cool open

Re: [video] visualization of the hadoop code history

2008-12-17 Thread Stefan Groschupf
Owen O'Malley wrote: It is interesting, but it would be more interesting to track the authors of the patch rather than the committer. The two are rarely the same. Indeed. There was a period of over a year where I wrote hardly anything but committed almost everything. So I am vastly

Re: [video] visualization of the hadoop code history

2008-12-17 Thread Jeff Hammerbacher
Ha, that's what I get for my short attention span. Rad stuff, sorry for missing the recognition of the code_swarm utility. On Wed, Dec 17, 2008 at 12:38 PM, Stefan Groschupf s...@101tec.com wrote: Very cool stuff, but I don't see a reference anywhere to the author of the visualization, which

java.nio.channels.ClosedSelectorException

2008-12-17 Thread Brian Cho
Hi, I've set up a Hadoop cluster but have a problem where multiple datanodes and tasks stop responding. I first ran into the problem using 0.19.0, but I also see the problem at 0.18.2. Java version is 1.6.0_11. Looking at the logs, the first sign of trouble seems to be either

Warning on turning on ipv6 on your Hadoop clusters

2008-12-17 Thread Runping Qi
If you have turned on IPv6 on your Hadoop cluster, it may cause a severe performance hit! When I ran the gridmix2 benchmark on a newly constructed cluster, it took 30% more time than the baseline time that was obtained on a similar cluster. I noticed that some task processes on some machines

DiskUsage ('du -sk') probably hangs Datanode

2008-12-17 Thread Sagar Naik
I see createBlockException and Abandoning block quite often. When I check the datanodes, they are running. I can browse the file system from that datanode:50075. However, I also notice that a du is forked off from the DN. This 'du' runs anywhere from 6 mins to 30 mins. During this time no logs are

Re: DiskUsage ('du -sk') probably hangs Datanode

2008-12-17 Thread Brian Bockelman
Hey Sagar, If the 'du' is in the D state, then that probably means bad things for your hardware. I recommend looking in dmesg and /var/log/messages for anything interesting, as well as performing a hard-drive diagnostic test (may be as simple as a SMART test) to see if there's an issue.

Re: Output.collect uses toString for custom key class. Is it possible to change this?

2008-12-17 Thread Aaron Kimball
NullWritable has a get() method that returns the singleton instance of the NullWritable. - Aaron On Tue, Dec 16, 2008 at 9:30 AM, David Coe david@chalklabs.net wrote: Owen O'Malley wrote: On Dec 16, 2008, at 9:14 AM, David Coe wrote: Does the SequenceFileOutputFormat work with

Re: DiskUsage ('du -sk') probably hangs Datanode

2008-12-17 Thread Sagar Naik
Brian Bockelman wrote: Hey Sagar, If the 'du' is in the D state, then that probably means bad things for your hardware. I recommend looking in dmesg and /var/log/messages for anything interesting, as well as performing a hard-drive diagnostic test (may be as simple as a SMART test) to see if

Copy data between HDFS instances...

2008-12-17 Thread C G
Hi All: I am setting up 2 grids, each with its own HDFS. The grids are unaware of each other but exist on the same network. I'd like to copy data from one HDFS to the other. Is there a way to do this simply, or do I need to cobble together scripts to copy from HDFS on one side and pipe to a

Re: Copy data between HDFS instances...

2008-12-17 Thread lohit
Try hadoop distcp. More info here: http://hadoop.apache.org/core/docs/current/distcp.html The documentation is for the current release, but running hadoop distcp should print out a help message. Thanks, Lohit - Original Message From: C G parallel...@yahoo.com To: core-user@hadoop.apache.org

Re: HOD questions

2008-12-17 Thread Hemanth Yamijala
Craig, Hello, We have two HOD questions: (1) For our current Torque PBS setup, the number of nodes requested by HOD (-l nodes=X) corresponds to the number of CPUs allocated; however, these nodes can be spread across various partially used or empty nodes. Unfortunately, HOD does not appear to

Re: Output.collect uses toString for custom key class. Is it possible to change this?

2008-12-17 Thread Owen O'Malley
On Dec 16, 2008, at 9:30 AM, David Coe wrote: Since the SequenceFileOutputFormat doesn't like nulls, how would I use NullWritable? Obviously output.collect(key, null) isn't working. If I change it to output.collect(key, new IntWritable()) I get the result I want (plus an int that I