ReduceTask ShuffleRamManager : Java Heap memory error

2012-12-04 Thread Olivier Varene - echo
Hi all, first, many thanks for the quality of the work you are doing. I am facing a bug with memory management at shuffle time; I regularly get Map output copy failure : java.lang.OutOfMemoryError: Java heap space at

RE: [Bulk] Re: Failed To Start SecondaryNameNode in Secure Mode

2012-12-04 Thread David Parks
I'm curious about profiling. I see some documentation about it (1.0.3 on AWS), but the references to JobConf seem to be for the old API, and I've got everything running on the new API. I've got a job that processes about 30GB of compressed CSVs, and it's taking over a day with 3

Re: understanding performance

2012-12-04 Thread Mahesh Balija
Hi Peter, can you also track details such as which nodes your mappers/reducers run on in each execution? Because your data may have been replicated across different nodes, each time your job runs the JobTracker might schedule your tasks on different nodes in the

Re: Changing hadoop configuration without restarting service

2012-12-04 Thread Hemanth Yamijala
Generally true for the framework config files, but some of the supplementary features can be refreshed without a restart, e.g. scheduler configuration and host files (for included/excluded nodes) ... On Tue, Dec 4, 2012 at 5:33 AM, Cristian Cira cmc0...@tigermail.auburn.edu wrote: No. You will

Re: Socket timeout for BlockReaderLocal

2012-12-04 Thread panfei
I noticed that you are using JDK 1.7; personally I prefer 1.6.x. If your firewall is OK, you can check your RPC service to see whether it is also OK, and test it with telnet 10.130.110.80 50020. I suggested Hive because HQL (SQL-like) is familiar to most people, and the learning curve is smooth;
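The telnet test above only checks that something is listening on the DataNode IPC port (50020 is the Hadoop 1.x default for dfs.datanode.ipc.address). Where telnet is not installed, a minimal Python equivalent might look like this sketch; the function name port_is_open is hypothetical, and a successful connect says nothing about whether Hadoop RPC itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout.

    This is roughly what `telnet host port` tells you: whether the port is
    reachable and something is listening. It does not speak Hadoop RPC.
    """
    try:
        # create_connection resolves the host and connects; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For the host in the message, port_is_open("10.130.110.80", 50020) returning False would point at a firewall rule, a wrong address, or a DataNode that is not listening on that port.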

Task Tracker Not Starting in Hadoop-1.0.3 Help Please

2012-12-04 Thread Dibyendu Karmakar
Hi, I am new to Hadoop and am trying to configure it in fully distributed mode. But after running bin/start-all.sh, bin/start-mapred.sh, or bin/hadoop-daemon.sh start tasktracker, the TaskTracker goes down immediately without any error in the log file. Help please... I'm using Hadoop version 1.0.3.

Re: Task Tracker Not Starting in Hadoop-1.0.3 Help Please

2012-12-04 Thread Dibyendu Karmakar
The entire content of the TaskTracker log is pasted here, and attached as well. Thank you for the reply. 2012-12-04 15:20:45,942 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG: / STARTUP_MSG: Starting TaskTracker STARTUP_MSG: host

Map Reduce jobs taking a long time at the end

2012-12-04 Thread Jay Whittaker
Hey, we are running MapReduce jobs against a 12-machine HBase cluster, and for a long time they took approx. 30 minutes to return a result against ~95 million rows. Without any major changes to the data or any upgrade of HBase/Hadoop, they now seem to take about 4 hours, and the logs are full of

Re: Socket timeout for BlockReaderLocal

2012-12-04 Thread Robert Molina
Hi Haitao, to help isolate the problem, what happens if you run a different job? Also, if you view the NameNode web UI, or the web UI of the specific DataNode having the issue, are there any indicators of it being down? Regards, Robert On Tue, Dec 4, 2012 at 12:49 AM, panfei cnwe...@gmail.com wrote: I noticed that

RE: Unable to process jobs - Status stuck on Initialize

2012-12-04 Thread Cristian Cira
What does jps return on your NameNode and on one of the DataNodes? Cristian Cira Graduate Research Assistant Parallel Architecture and System Laboratory (PASL) Shelby Center 2105 Auburn University, AL 36849 From: Robert Molina [rmol...@hortonworks.com]
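jps (shipped with the JDK) prints one line per running JVM, a process ID followed by the main class name, so its output on each node shows which Hadoop daemons are actually up. A hypothetical helper that compares that output with the daemons you expect might look like this; the expected sets assume a typical Hadoop 1.x layout (NameNode and JobTracker on the master, DataNode and TaskTracker on workers), and the sample output is made up for illustration.

```python
def running_daemons(jps_output):
    """Collect main-class names from `jps` output ("<pid> <ClassName>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Skip blank lines and anything that does not start with a numeric PID.
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in the jps output."""
    return set(expected) - running_daemons(jps_output)


# Hypothetical output captured on a worker node where the TaskTracker died.
worker_output = """4100 DataNode
4222 Jps
"""
print(missing_daemons(worker_output, {"DataNode", "TaskTracker"}))
```

An empty result means every expected daemon has a live JVM; it does not prove the daemon is healthy, only that its process exists.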

RE: Question on Key Grouping

2012-12-04 Thread David Parks
The first thing to be wary of is your use of the combiner. The combiner *might* be run, it *might not* be run, and it *might be run multiple times*. The combiner is only for reducing the amount of data going to the reducer, and it will only be run *if and when* it's deemed likely to be useful by
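Because the combiner may run zero, one, or several times, the final result must not depend on how often it runs: the combine step has to be associative and commutative, and its output must be valid input to both the combiner and the reducer. The plain-Python simulation below is not Hadoop's API (in the new API the class is registered with Job.setCombinerClass); it applies a combiner over chunks of one key's values a varying number of times. A sum survives any number of passes, a naive mean does not, and carrying (sum, count) pairs fixes the mean. All function names here are illustrative.

```python
from statistics import mean


def combine_pass(values, combiner, chunk_size):
    """Apply the combiner to consecutive chunks, as if each chunk were a separate spill."""
    return [combiner(values[i:i + chunk_size]) for i in range(0, len(values), chunk_size)]


def simulate(values, reducer, combiner, chunk_sizes):
    """Run one combine pass per entry in chunk_sizes (an empty list means the
    combiner never ran), then hand whatever is left to the reducer."""
    for size in chunk_sizes:
        values = combine_pass(values, combiner, size)
    return reducer(values)


def sum_pairs(pairs):
    """Safe-mean combiner: merge (sum, count) pairs into a single pair."""
    return (sum(s for s, _ in pairs), sum(c for _, c in pairs))


def ratio(pairs):
    """Safe-mean reducer: total sum divided by total count."""
    total, count = sum_pairs(pairs)
    return total / count


values = [1, 2, 3, 4, 5]
print(simulate(values, sum, sum, []))        # combiner never ran
print(simulate(values, sum, sum, [2, 2]))    # combiner ran twice, same answer
print(simulate(values, mean, mean, [2]))     # mean of unequal chunks: wrong
print(simulate([(v, 1) for v in values], ratio, sum_pairs, [2, 2]))  # correct mean
```

The design point is that the map side emits (value, 1) pairs, so the combiner only ever adds sums and counts, which is safe to repeat; the division happens once, in the reducer.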