Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Edward Capriolo
I will give this a go. I have actually gone into JMX and manually triggered GC; no memory is returned, so I assumed something was leaking. On Fri, Dec 21, 2012 at 11:59 PM, Adam Faris afa...@linkedin.com wrote: I know this will sound odd, but try reducing your heap size. We had an issue like
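
For context, "manually triggered GC" via JMX amounts to invoking the gc() operation on the java.lang:type=Memory MBean, the same thing jconsole's "Perform GC" button does. A minimal sketch, assuming the NameNode exposes a remote JMX connector; the host hnn217 and port 8004 here are placeholders:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class TriggerNameNodeGc {
        public static void main(String[] args) throws Exception {
            // Use whatever com.sun.management.jmxremote.port the NameNode
            // was started with; host and port below are placeholders.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://hnn217:8004/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
                // Same no-arg operation jconsole invokes for "Perform GC".
                mbsc.invoke(new ObjectName("java.lang:type=Memory"),
                            "gc", null, null);
            } finally {
                jmxc.close();
            }
        }
    }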

Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Michael Segel
Hey, silly question... How long have you had 27 million files? I mean, can you correlate the number of files to the spate of OOMs? Even without problems... I'd say it would be a good idea to upgrade due to the probability of a lot of code fixes... If you're running anything pre 1.x, going to

Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Joep Rottinghuis
Do your OOMs correlate with the secondary checkpointing? Joep Sent from my iPhone On Dec 22, 2012, at 7:42 AM, Michael Segel michael_se...@hotmail.com wrote: Hey, silly question... How long have you had 27 million files? I mean, can you correlate the number of files to the spate of OOMs?

Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Edward Capriolo
Newer 1.6 releases are getting close to 1.7, so I am not going to fear a number and fight the future. I have been at around 27 million files for a while, and have been as high as 30 million; I do not think that is related. I do not think it is related to checkpoints either, but I am considering raising/lowering the

Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Edward Capriolo
Blocks is ~26,000,000; files is a bit higher, ~27,000,000. Currently running:

[root@hnn217 ~]# java -version
java version 1.7.0_09

Was running 1.6.0_23.

export JVM_OPTIONS=-XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8

Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Edward Capriolo
Ok, so here is the latest: http://imagebin.org/240392 I took a jmap on startup and one an hour after: http://pastebin.com/xEkWid4f I think the biggest deal is [B, which may not be very helpful.

num   #instances   #bytes   class name
--
1:

Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Suresh Srinivas
This looks to me to be because of the larger default young generation size in newer Java releases - see http://docs.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html#heap_size. Looking at your GC logs, I can see around 6G of space being used for the young generation (though I do not see logs related to
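
To check what the young generation was actually sized to, one can dump the JVM's memory pools via the standard management API; a minimal sketch (with ParNew/CMS the young gen appears as the "Par Eden Space" and "Par Survivor Space" pools; names vary by collector):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    public class ShowYoungGen {
        public static void main(String[] args) {
            // Print the configured max of each pool; -1 means "undefined".
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                System.out.printf("%-22s max=%,d bytes%n",
                        pool.getName(), pool.getUsage().getMax());
            }
        }
    }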

Re: Merging files

2012-12-22 Thread Harsh J
Yes, via the simple act of opening a target stream and writing all source streams into it. Or to save code time, an identity job with a single reducer (you may not get control over ordering this way). On Sat, Dec 22, 2012 at 12:10 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is it possible to
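
A minimal sketch of the first suggestion using the FileSystem API; the paths /in and /out/merged are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class MergeFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/out/merged"));
            try {
                // Append each source file to the single target stream.
                for (FileStatus stat : fs.listStatus(new Path("/in"))) {
                    FSDataInputStream in = fs.open(stat.getPath());
                    try {
                        IOUtils.copyBytes(in, out, conf, false); // keep target open
                    } finally {
                        in.close();
                    }
                }
            } finally {
                out.close();
            }
        }
    }

FileUtil.copyMerge and the hadoop fs -getmerge shell command wrap essentially this same loop.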

Re: Hadoop example command.

2012-12-22 Thread Ramachandran Vilayannur
Hi Rishi, Thanks for looking into this... I am just trying to set things up to begin playing around, and I got the command from http://hadoop.apache.org/docs/r0.18.3/quickstart.html. When I run it, the command simply returns silently, no messages on the console... If you could point me to how to

distributed cache

2012-12-22 Thread Lin Ma
Hi guys, I want to confirm that when a mapper or reducer on a task node accesses a distributed cache file, the file resides on disk, not in memory. I just want to make sure a distributed cache file is not fully loaded into memory, where it would compete for memory with the mapper/reducer tasks. Is that

Re: distributed cache

2012-12-22 Thread Kai Voigt
Hi, On 22.12.2012 at 13:03, Lin Ma lin...@gmail.com wrote: I want to confirm that when a mapper or reducer on a task node accesses a distributed cache file, the file resides on disk, not in memory. I just want to make sure a distributed cache file is not fully loaded into memory, which
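
For the question itself, the old-API DistributedCache calls make the on-disk behavior visible; a minimal sketch (the cache path /cache/lookup.dat is a placeholder):

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheExample {
        // Driver side: register an HDFS file with the distributed cache.
        static void register(JobConf conf) throws Exception {
            DistributedCache.addCacheFile(new URI("/cache/lookup.dat"), conf);
        }

        // Task side (e.g. in configure()): these are local-disk paths under
        // mapred.local.dir; the framework itself never loads the file into
        // the task heap. Only what your own code reads in ends up in memory.
        static Path[] localCopies(JobConf conf) throws IOException {
            return DistributedCache.getLocalCacheFiles(conf);
        }
    }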

Re: distributed cache

2012-12-22 Thread Lin Ma
Hi Kai, Smart answer! :-) - Your assumption is that one distributed cache replica can only serve one download session for a tasktracker node (this is why you get concurrency n/r). The question is, why can't one distributed cache replica serve multiple concurrent download sessions?

reducer tasks start time issue

2012-12-22 Thread Lin Ma
Hi guys, suppose a Hadoop job has both mappers and reducers. My question is: is it true that reducer tasks cannot begin until all mapper tasks complete? If so, why is it designed this way? Thanks in advance, Lin

Re: Child processes on datanodes/task trackers

2012-12-22 Thread Harsh J
This is an interesting stack trace. The JVM had called exit, but there are threads stuck. Unsure if your JVM version (6u22) has anything to do with it. Are you using the JVM reuse feature? On Sat, Dec 22, 2012 at 3:46 PM, Sedighe Tabatabaei tabatabaei...@gmail.com wrote: Hello, I wanted to know
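
The JVM reuse feature referred to here is controlled by mapred.job.reuse.jvm.num.tasks; a minimal sketch of ruling it out by forcing a fresh JVM per task:

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseOff {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // 1 = one task per JVM (reuse off); -1 = unlimited reuse.
            conf.setNumTasksToExecutePerJvm(1);
            // Equivalent property name: mapred.job.reuse.jvm.num.tasks
        }
    }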

Re: How to troubleshoot OutOfMemoryError

2012-12-22 Thread Stephen Fritz
Troubleshooting OOMs in the map/reduce tasks can be tricky; see page 118 of Hadoop

Re: How to troubleshoot OutOfMemoryError

2012-12-22 Thread Manoj Babu
David, I faced the same issue due to too much logging filling the tasktracker log folder. Cheers! Manoj. On Sat, Dec 22, 2012 at 9:10 PM, Stephen Fritz steph...@cloudera.com wrote: Troubleshooting OOMs in the map/reduce tasks can be tricky; see page 118 of Hadoop

Re: What should I do with a 48-node cluster

2012-12-22 Thread Michael Segel
Uhm... not exactly. Power consumption is only part of it. ;-) Power consumption by itself is not enough to establish probable cause. If that were the case, there would be a lot of raids around the Xmas holidays ... ;-) Now don't ask me how I know this ;-P Having run a rack out of my

Re: reducer tasks start time issue

2012-12-22 Thread Harsh J
A reduce can't process the complete data set until it has fetched all partitions, and any map may produce a partition for any reducer. Hence, we generally wait until all maps have terminated and their partition outputs are ready and copied over to the reduces before we begin to group and process the
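
Note that reducers can still be launched before all maps finish, so the copy phase overlaps the tail of the map phase; reduce() itself just cannot run until every partition has arrived. The launch point is governed by mapred.reduce.slowstart.completed.maps (default 0.05 in 1.x); a minimal sketch:

    import org.apache.hadoop.mapred.JobConf;

    public class SlowstartExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Launch reducers (which begin shuffling finished map output)
            // only after 80% of maps are done, instead of the default 5%.
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
        }
    }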

Re: Merging files

2012-12-22 Thread Ted Dunning
The technical term for this is copying. You may have heard of it. It is a subject of such long technical standing that many do not consider it worthy of detailed documentation. Distcp effects a similar process and can be modified to combine the input files into a single file.
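
A sketch of driving distcp programmatically, assuming the 1.x-era org.apache.hadoop.tools.DistCp class; the paths /src and /dst are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.tools.DistCp;
    import org.apache.hadoop.util.ToolRunner;

    public class RunDistCp {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Equivalent to `hadoop distcp /src /dst` on the command line.
            int rc = ToolRunner.run(new DistCp(conf),
                                    new String[] { "/src", "/dst" });
            System.exit(rc);
        }
    }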

Re: Alerting

2012-12-22 Thread Mohit Anchlia
The best I can find so far is hadoop job -list. On Sat, Dec 22, 2012 at 12:30 PM, Mohit Anchlia mohitanch...@gmail.com wrote: What's the best way to trigger an alert when jobs run for too long or have many failures? Is there a hadoop command that can be used to perform this activity?

Re: Alerting

2012-12-22 Thread Mohammad Tariq
The MR web UI? Although we can't trigger anything, it provides all the info related to the jobs. I mean, it would be easier to just go there and have a look at everything rather than opening the shell and typing the command. I'm a bit lazy ;) Best Regards, Tariq +91-9741563634

Re: Merging files

2012-12-22 Thread Ted Dunning
A pig script should work quite well. I also note that the file paths have maprfs in them. This implies that you are using MapR and could simply use the normal Linux command cat to concatenate the files if you mount them using NFS (depending on volume, of course). For small amounts of data,

Re: Alerting

2012-12-22 Thread Ted Dunning
You can write a script to parse the Hadoop job list and send an alert. The trick of putting a retry into your workflow system is a nice one. If your program won't allow multiple copies to run at the same time, and you re-invoke the program every, say, hour, then 5 retries implies that the
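
A minimal sketch of such a script in Java, using JobClient as the programmatic equivalent of parsing hadoop job -list; the two-hour threshold and the alert action are placeholders:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;

    public class LongJobAlert {
        public static void main(String[] args) throws Exception {
            long maxMillis = 2L * 60 * 60 * 1000; // alert threshold: 2 hours
            JobClient client = new JobClient(new JobConf());
            // jobsToComplete() lists unfinished jobs, like `hadoop job -list`.
            for (JobStatus status : client.jobsToComplete()) {
                long age = System.currentTimeMillis() - status.getStartTime();
                if (age > maxMillis) {
                    // Replace with mail/pager integration as needed.
                    System.err.println("ALERT: " + status.getJobID()
                            + " has been running for " + (age / 60000) + " min");
                }
            }
        }
    }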

Re: Alerting

2012-12-22 Thread Ted Dunning
Also, I think that Oozie allows for timeouts in job submission. That might answer your need. On Sat, Dec 22, 2012 at 2:08 PM, Ted Dunning tdunn...@maprtech.com wrote: You can write a script to parse the Hadoop job list and send an alert. The trick of putting a retry into your workflow

Re: What should I do with a 48-node cluster

2012-12-22 Thread Mark Kerzner
Jay, my nature is Houston, TX; they will argue about who can cool whom :) I am thinking of $500/month rack hosting. Mark On Sat, Dec 22, 2012 at 2:12 PM, Jay ss...@yahoo.com wrote: You can let nature cool it. I have mine behind the garage

Re: What should I do with a 48-node cluster

2012-12-22 Thread Edward Capriolo
You do not absolutely need more RAM; you do not know your workload yet. A standard Hadoop machine has 8 disks, 16 GB RAM, and 8 cores. In the old days, you would dedicate map slots and reduce slots: 3 map and 1 reduce in your case. Give each of them 256 MB RAM for the child JVM opts. So you needed more RAM in
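
For illustration, the slot math above expressed as the usual 1.x properties (these normally live in the tasktracker's mapred-site.xml and are shown here via the conf API only for clarity; the values are the poster's example, not a recommendation):

    import org.apache.hadoop.mapred.JobConf;

    public class SlotSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // 3 map slots and 1 reduce slot per tasktracker...
            conf.setInt("mapred.tasktracker.map.tasks.maximum", 3);
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1);
            // ...and 256 MB of heap for each child task JVM.
            conf.set("mapred.child.java.opts", "-Xmx256m");
        }
    }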

Re: Merging files

2012-12-22 Thread Mohit Anchlia
Thanks for the info. I was trying not to use NFS because my data might be 10-20GB in size for every merge I perform. I'll use pig instead. In distcp I checked and none of the directories are duplicates. Looking at the logs, it looks like it's failing because all those directories have