java.io.IOException: Task process exit with nonzero status of 1

2012-05-11 Thread Mohit Kundra
Hi, I am a new user to Hadoop. I have installed Hadoop 0.19.1 on a single Windows machine. Its http://localhost:50030/jobtracker.jsp and http://localhost:50070/dfshealth.jsp pages are working fine, but when I execute bin/hadoop jar hadoop-0.19.1-examples.jar pi 5 100 it shows the following: $ bi…

Re: java.io.IOException: Task process exit with nonzero status of 1

2012-05-11 Thread JunYong Li
Are there errors in the task output file? On jobtracker.jsp, click the Jobid link -> tasks link -> Taskid link -> Task logs link. 2012/5/11 Mohit Kundra: > …

Re: java.io.IOException: Task process exit with nonzero status of 1

2012-05-11 Thread Prashant Kommireddi
You might be running out of disk space. Check for that on your cluster nodes. -Prashant On Fri, May 11, 2012 at 12:21 AM, JunYong Li wrote: > …

Re: java.io.IOException: Task process exit with nonzero status of 1

2012-05-11 Thread Harsh J
Mohit, Why are you using Hadoop-0.19, a version released many years ago? Please download the latest stable available at http://hadoop.apache.org/common/releases.html#Download instead. On Fri, May 11, 2012 at 12:26 PM, Mohit Kundra wrote: > …

Re: Monitoring Hadoop Cluster

2012-05-11 Thread Lance Norskog
Zabbix does monitoring, archiving and graphing, and alerts. It has a JMX bean monitor system. If Hadoop has these, or you can add them easily, you have a great monitor. Also, check out 'Starfish'. It's a little old, but I got it running and it was really cool. On Thu, May 10, 2012 at 11:24 PM, Ma…

Re: Monitoring Hadoop Cluster

2012-05-11 Thread Stu Teasdale
I've helped out linking Hadoop to Munin using JMX querying in the past; there's a writeup at: http://www.cs.huji.ac.il/wikis/MediaWiki/lawa/index.php/Munin_for_Hadoop Stu On Fri, May 11, 2012 at 02:15:16AM -0700, Lance Norskog wrote: > …
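
For anyone going the JMX route, a minimal sketch of what exposing the daemons' JMX beans can look like in a 1.x-style hadoop-env.sh (ports are illustrative, the JMX_COMMON helper variable is my own shorthand, and disabling authentication/SSL is only acceptable for a test setup):

    # hadoop-env.sh (sketch): expose JMX so Munin/Zabbix can poll the daemons.
    # Ports are illustrative; no auth/SSL is only acceptable for testing.
    JMX_COMMON="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
    export HADOOP_NAMENODE_OPTS="$JMX_COMMON -Dcom.sun.management.jmxremote.port=8004 $HADOOP_NAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="$JMX_COMMON -Dcom.sun.management.jmxremote.port=8006 $HADOOP_DATANODE_OPTS"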

Re: High load on datanode startup

2012-05-11 Thread Darrell Taylor
On Thu, May 10, 2012 at 5:58 PM, Raj Vishwanathan wrote:
> Darrell, are the new dn, nn and mapred directories on the same physical disk? Nothing on NFS, correct?
Yes, that's correct.
> Could you be having some hardware issue? Any clue in /var/log/messages or dmesg?
Hardware is goo…

Re: High load on datanode startup

2012-05-11 Thread Todd Lipcon
On Fri, May 11, 2012 at 2:29 AM, Darrell Taylor wrote: > What I saw on the machine was thousands of recursive processes in ps of the form 'bash /usr/bin/hbase classpath...'. Stopping everything didn't clean the processes up, so I had to kill them manually with some grep/xargs foo. Once this…

Re: High load on datanode startup

2012-05-11 Thread Harsh J
Doesn't look like the $HBASE_HOME/bin/hbase script runs "$HADOOP_HOME/bin/hadoop classpath" directly. Its classpath builder seems to add $HADOOP_HOME items manually via listing, etc. Perhaps if hbase-env.sh has a HBASE_CLASSPATH that imports `hadoop classpath`, and the hadoop-env.sh has a `hbase classpath`…
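
To make the loop concrete, a hypothetical illustration of the mutual import Harsh describes (these env-file lines are invented for the example, not taken from the thread):

    # hadoop-env.sh (hypothetical)
    export HADOOP_CLASSPATH="$(hbase classpath):$HADOOP_CLASSPATH"

    # hbase-env.sh (hypothetical)
    export HBASE_CLASSPATH="$(hadoop classpath):$HBASE_CLASSPATH"

With both lines in place, each `hadoop` invocation sources hadoop-env.sh and runs `hbase classpath`, which sources hbase-env.sh and runs `hadoop classpath` again, so the shells keep forking until they are killed, which would match the thousands of recursive bash processes Darrell saw.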

Re: freeze a mapreduce job

2012-05-11 Thread Harsh J
I do not know about the per-host slot control (that is most likely not supported, or not yet anyway - and perhaps feels wrong to do), but the rest of the needs are doable if you use schedulers and queues/pools. If you use FairScheduler (FS), ensure that this job always goes to a special pool and…
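
One way to set that up, sketched against the MRv1 FairScheduler allocation file (the pool name and caps here are made up for the example; the job would be pointed at the pool via mapred.fairscheduler.pool):

    <?xml version="1.0"?>
    <!-- Allocation file (sketch): a pool the job can be "frozen" in by capping
         its task share; raise the caps again to let it resume. -->
    <allocations>
      <pool name="freezer">
        <maxMaps>0</maxMaps>
        <maxReduces>0</maxReduces>
      </pool>
    </allocations>

Note this only stops new tasks from being scheduled for the pool; as Michael points out below, tasks already occupying slots are not released by it.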

Re: freeze a mapreduce job

2012-05-11 Thread Michael Segel
Just a quick note... If your task is currently occupying a slot, the only way to release the slot is to kill the specific task. If you are using FS, you can move the task to another queue and/or you can lower the job's priority, which will cause new tasks to spawn more slowly than those of other jobs, so you…

Re: freeze a mapreduce job

2012-05-11 Thread Rita
Thanks. I think I will investigate the Capacity Scheduler. On Fri, May 11, 2012 at 7:26 AM, Michael Segel wrote: > …

How to maintain record boundaries

2012-05-11 Thread Shreya.Pal
Hi, When we store data into HDFS, it gets broken into small pieces and distributed across the cluster based on the block size for the file. While processing the data using an MR program I want a particular record as a whole, without it being split across nodes, but the data has already been split and st…

Re: How to maintain record boundaries

2012-05-11 Thread Harsh J
Shreya, This has been asked several times before, and the way it is handled by TextInputFormat (for one example) is explained at http://wiki.apache.org/hadoop/HadoopMapReduce in the Map section. If you are writing a custom reader, feel free to follow the same steps - you basically need to seek ov…

Re: java.io.IOException: Task process exit with nonzero status of 1

2012-05-11 Thread samir das mohapatra
Hi Mohit, 1) Hadoop is more portable with Linux, Ubuntu, or any non-DOS file system, but you are running Hadoop on Windows; that could be the problem because Hadoop will generate some partial output files for temporary use. 2) Another thing is that you are running Hadoop version 0.19; I think i…

transferring between HDFS which reside in different subnet

2012-05-11 Thread Arindam Choudhury
Hi, I have a question for the Hadoop experts: I have two HDFS clusters, in different subnets. HDFS1: 192.168.*.* HDFS2: 10.10.*.* The namenode of HDFS2 has two NICs, one connected to 192.168.*.* and another to 10.10.*.*. So, is it possible to transfer data from HDFS1 to HDFS2 and vice versa? Regards, A…

Re: transferring between HDFS which reside in different subnet

2012-05-11 Thread Shi Yu
If you can cross-access HDFS from both name nodes, then it should be transferable using the distcp command. Shi On 5/11/2012 8:45 AM, Arindam Choudhury wrote: > …
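
For reference, a sketch of the command form (hostnames, ports and paths are placeholders), run from a node that can reach both namenodes:

    # copy /data/src on HDFS1 to /data/dest on HDFS2; nn1/nn2, ports and paths are placeholders
    hadoop distcp hdfs://nn1:8020/data/src hdfs://nn2:8020/data/dest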

Re: transferring between HDFS which reside in different subnet

2012-05-11 Thread Arindam Choudhury
I cannot cross-access HDFS. Though HDFS2 has two NICs, the HDFS is running on the other subnet. On Fri, May 11, 2012 at 3:57 PM, Shi Yu wrote: > …

Re: freeze a mapreduce job

2012-05-11 Thread Shi Yu
Is there any risk in suppressing a job for too long in FS? I guess there are some parameters to control the waiting time of a job (such as a timeout, etc.); for example, if a job is kept idle for more than 24 hours, is there a configuration deciding whether to kill or keep that job? Shi On 5/11/2012 6:52 AM, Ri…

Re: How to maintain record boundaries

2012-05-11 Thread Shi Yu
Here is some quick code for you (based on Tom's book). You can override the TextInputFormat isSplitable method to avoid splitting, which is pretty important and useful when processing sequence data. //Old API public class NonSplittableTextInputFormat extends TextInputFormat { @Overrid…
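
A minimal reconstruction of that pattern for the old API (a sketch in the spirit of Shi's snippet, not the literal code from the book):

    // Old (mapred) API: never split input files, so each file is read
    // start-to-finish by a single mapper and records are never cut at a split.
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class NonSplittableTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }

With the new (mapreduce) API the equivalent override is protected boolean isSplitable(JobContext context, Path file) on org.apache.hadoop.mapreduce.lib.input.TextInputFormat.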

Re: transferring between HDFS which reside in different subnet

2012-05-11 Thread Shi Yu
It seems in your case HDFS2 can access HDFS1, so you should be able to transfer HDFS1 data to HDFS2. If you want to cross-transfer, you don't need to run distcp on cluster nodes; if any client node (not necessarily a namenode, datanode, secondary namenode, etc.) can access both HDFSs, then r…

Re: transferring between HDFS which reside in different subnet

2012-05-11 Thread Rajesh Sai T
Looks like both are private subnets, so you have to route via a public default gateway. Try adding a route using the route command if you're on Linux (Windows, I have no idea). Just a thought, I haven't tried it though. Thanks, Rajesh. Typed from mobile, please bear with typos. On May 11, 2012 10:03 AM, "Arind…
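
If it helps, a hypothetical example of the kind of route addition meant here (the gateway address and netmask are invented; the real values depend on the network):

    # hypothetical: on a 192.168.x.x host, send traffic for the 10.10.0.0/16
    # subnet through a gateway that this host can already reach
    route add -net 10.10.0.0 netmask 255.255.0.0 gw 192.168.0.1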

Re: transferring between HDFS which reside in different subnet

2012-05-11 Thread Arindam Choudhury
So, hadoop dfs -cp hdfs:// hdfs://... this will work. On Fri, May 11, 2012 at 4:14 PM, Rajesh Sai T wrote: > …

Re: freeze a mapreduce job

2012-05-11 Thread Michael Segel
I haven't seen any. Haven't really had to test that... On May 11, 2012, at 9:03 AM, Shi Yu wrote: > …

Re: freeze a mapreduce job

2012-05-11 Thread Harsh J
I am not aware of a job-level timeout or idle monitor. On Fri, May 11, 2012 at 7:33 PM, Shi Yu wrote: > …

Re: freeze a mapreduce job

2012-05-11 Thread Robert Evans
There is an idle timeout for map/reduce tasks. If a task makes no progress for 10 minutes (the default), the AM will kill it on 2.0 and the JT will kill it on 1.0. But I don't know of anything associated with a job, other than in 0.23, if the AM does not heartbeat back in for too long, I believe that t…
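
The per-task timeout Robert mentions is configurable; a sketch for a 1.x mapred-site.xml (the value shown is the 10-minute default, in milliseconds):

    <!-- mapred-site.xml (sketch): how long a task may report no progress
         before the framework fails it; 600000 ms = 10 minutes (the default) -->
    <property>
      <name>mapred.task.timeout</name>
      <value>600000</value>
    </property>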

Question on MapReduce

2012-05-11 Thread Satheesh Kumar
Hi, I am a newbie on Hadoop and have a quick question on optimal compute vs. storage resources for MapReduce. If I have a multiprocessor node with 4 processors, will Hadoop schedule a higher number of map or reduce tasks on that system than on a uniprocessor system? In other words, does Hadoop dete…

Re: DatanodeRegistration, socketTImeOutException

2012-05-11 Thread sulabh choudhury
I have set dfs.datanode.max.xcievers=4096 and have swapping turned off. Regionserver heap = 24 GB, Datanode heap = 1 GB. On Fri, May 11, 2012 at 9:55 AM, sulabh choudhury wrote: > I have spent a lot of time trying to find a solution to this issue, but have had no luck. I think this is because of…

RE: Question on MapReduce

2012-05-11 Thread Leo Leung
Nope, you must tune the config on that specific super node to have more M/R slots (this is for 1.0.x). This does not mean the JobTracker will be eager to stuff that super node with all the M/R jobs at hand. It still goes through the scheduler; the Capacity Scheduler is most likely what you have. (…
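
For completeness, a sketch of the relevant 1.x mapred-site.xml entries on the bigger node (the slot counts are illustrative, not a recommendation):

    <!-- mapred-site.xml (sketch): per-TaskTracker slot counts on the multi-core node -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>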

Re: Question on MapReduce

2012-05-11 Thread Satheesh Kumar
Thanks, Leo. What is the config of a typical data node in a Hadoop cluster - cores, storage capacity, and connectivity (SATA?)? How many tasktrackers are scheduled per core in general? Is there a best-practices guide somewhere? Thanks, Satheesh On Fri, May 11, 2012 at 10:48 AM, Leo Leung wrote: > …

RE: Question on MapReduce

2012-05-11 Thread Leo Leung
This may be dated material; Cloudera and HDP folks, please correct with updates :) http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/ http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/ http://hortonworks.com/blog/best-practice…

Re: How to maintain record boundaries

2012-05-11 Thread Ankur C. Goel
Record reader implementations are typically written to honor record boundaries. This means that while reading a split's data they will continue reading if the end of the split has been reached BUT the end of the record is yet to be encountered. -@nkur On 5/11/12 5:15 AM, "shreya@cognizant.com" wrote: > …

Resource underutilization / final reduce tasks only use half of cluster (tasktracker map/reduce slots)

2012-05-11 Thread Jeremy Davis
I see mapred.tasktracker.reduce.tasks.maximum and mapred.tasktracker.map.tasks.maximum, but I'm wondering if there isn't another tuning parameter I need to look at. I can tune the task tracker so that when I have many jobs running, with many simultaneous maps and reduces, I utilize 95% of CPU a…

Moving files from JBoss server to HDFS

2012-05-11 Thread financeturd financeturd
Hello, We have a large number of custom-generated files (not just web logs) that we need to move from our JBoss servers to HDFS. Our first implementation ran a cron job every 5 minutes to move our files from the "output" directory to HDFS. Is this recommended? We are being told by our IT tea…