RE: difference between mapper and map runnable

2009-08-28 Thread Amogh Vasekar
Hi, Mapper is used to process the K,V pair passed to it. MapRunnable is an interface that, when implemented, is responsible for generating a conforming K,V pair and passing it to the Mapper. Cheers! Amogh -Original Message- From: Rakhi Khatwani [mailto:rkhatw...@gmail.com] Sent: Thursday, August

Re: How running hadoop without command line?

2009-08-28 Thread radar.sxl
Thanks, Vladimir Klimontovich. But there is another problem: conf.addResource(new Path("./hadoop-default.xml")); conf.addResource(new Path("./hadoop-site.xml")); //conf.setJar("WordCount.jar"); 09/08/28 16:13:13 WARN fs.FileSystem: 192.168.1.130:9000 is a deprecated filesystem name. Use
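
For context, a minimal sketch of the programmatic setup being discussed, assuming the config files sit in the working directory; the host and port are taken from the warning above, and the hdfs:// scheme is what that deprecation warning asks for:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class ProgrammaticSetup {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Load the cluster configuration explicitly instead of relying
            // on the hadoop CLI wrapper to put it on the classpath.
            conf.addResource(new Path("./hadoop-default.xml"));
            conf.addResource(new Path("./hadoop-site.xml"));
            // The WARN above means the filesystem name lacks a scheme;
            // spelling out hdfs:// silences it.
            conf.set("fs.default.name", "hdfs://192.168.1.130:9000");

            JobConf jobConf = new JobConf(conf);
            // With no `hadoop jar` command involved, the job jar must be
            // named explicitly.
            jobConf.setJar("WordCount.jar");
        }
    }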

Re: Cloudera Video - Hadoop build on eclipse

2009-08-28 Thread bharath vissapragada
That Maven repo link seems to be fine because I could access it successfully from the browser. Please help. Thanks in advance. On Fri, Aug 28, 2009 at 4:46 PM, bharath vissapragada bharathvissapragada1...@gmail.com wrote: Hi all, I am trying to build Hadoop on Eclipse with the help of

Re: understand merge phase performance

2009-08-28 Thread Jothi Padmanabhan
Could you attach the complete reducer logs for the two runs? I am guessing that there could have been more map outputs shuffled to disk in Job1. How many maps did the jobs have? And what were their map output sizes? Cheers, Jothi On 8/28/09 1:21 AM, Rares Vernica rvern...@gmail.com wrote:

Re: Cloudera Video - Hadoop build on eclipse

2009-08-28 Thread ashish pareek
Hello Bharath, Earlier even I faced the same problem. I think you are accessing the internet through a proxy, so try using a direct broadband connection. Hope this will solve your problem. Ashish Pareek On Fri, Aug 28, 2009 at 4:46 PM, bharath vissapragada

Re: Cloudera Video - Hadoop build on eclipse

2009-08-28 Thread bharath vissapragada
I saw your mail to this list, but no one replied to it. Yes, I am behind my institute proxy. Thanks for your reply; I'll try using it from my home broadband connection. :) On Fri, Aug 28, 2009 at 7:11 PM, ashish pareek pareek...@gmail.com wrote: Hello Bharath, Earlier even

Re: understand merge phase performance

2009-08-28 Thread Rares Vernica
On Fri, Aug 28, 2009 at 4:46 AM, Jothi Padmanabhan joth...@yahoo-inc.com wrote: Could you attach the complete reducer logs for the two runs? Attached. How many maps did the jobs have? Both jobs had 4 maps. And what were their map output sizes? The sizes in bytes are in the first email.

Re: understand merge phase performance

2009-08-28 Thread Rares Vernica
On Fri, Aug 28, 2009 at 10:13 AM, Rares Vernica rvern...@gmail.com wrote: On Fri, Aug 28, 2009 at 4:46 AM, Jothi Padmanabhan joth...@yahoo-inc.com wrote: Could you attach the complete reducer logs for the two runs? Attached. Forgot the attachments... Here they are. Cheers! Rares Vernica

[Help] Why java.util.zip.ZipOutputStream need to use /tmp?

2009-08-28 Thread Steve Gao
Would someone give us a hint? Thanks. Why does java.util.zip.ZipOutputStream need to use /tmp? The Hadoop version is 0.18.3. Recently we got an out-of-space issue. It's from java.util.zip.ZipOutputStream. We found that /tmp was full, and after cleaning /tmp the problem was solved. However, why Hadoop

performance counters vaidya diagnostics help

2009-08-28 Thread Vasilis Liaskovitis
Hi, a) Is there a wiki page or other documentation explaining the exact meaning of the job / filesystem / mapreduce counters reported after every job run? 09/08/27 15:04:10 INFO mapred.JobClient: Job complete: job_200908271428_0002 09/08/27 15:04:10 INFO mapred.JobClient: Counters: 19 09/08/27
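
As an aside, the counters that JobClient prints at the end of a run can also be inspected programmatically; a minimal sketch against the old (org.apache.hadoop.mapred) API of that era, with the class name purely illustrative:

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class CounterDump {
        // Runs the job, then prints every counter group and counter,
        // mirroring the JobClient output quoted above.
        public static void dump(JobConf conf) throws Exception {
            RunningJob job = JobClient.runJob(conf);
            Counters counters = job.getCounters();
            for (Counters.Group group : counters) {
                for (Counters.Counter counter : group) {
                    System.out.println(group.getDisplayName() + "\t"
                            + counter.getDisplayName() + "=" + counter.getCounter());
                }
            }
        }
    }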

Re: [Help] Why java.util.zip.ZipOutputStream need to use /tmp?

2009-08-28 Thread Steve Gao
Thanks a lot, Brian. It seems to be a design flaw in Hadoop that it cannot manage (or pass in) the temp directory of java.util.zip. Can we create a JIRA ticket for this? --- On Fri, 8/28/09, Brian Bockelman bbock...@cse.unl.edu wrote: From: Brian Bockelman bbock...@cse.unl.edu Subject: Re: [Help] Why

Re: [Help] Why java.util.zip.ZipOutputStream need to use /tmp?

2009-08-28 Thread Brian Bockelman
Actually, poking the code, it seems that the streaming package does set this value: String tmp = jobConf_.get("stream.tmpdir"); //, /tmp/${user.name}/ Try setting stream.tmpdir to a different directory, maybe? Brian On Aug 28, 2009, at 1:31 PM, Steve Gao wrote: Thanks a lot, Brian. It
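
Following that suggestion, a hedged sketch of redirecting the streaming temp directory; the /scratch path is only illustrative, and on 0.18-era streaming the same setting can be passed on the command line with -jobconf stream.tmpdir=...:

    import org.apache.hadoop.mapred.JobConf;

    public class StreamTmpDirExample {
        public static JobConf withScratchTmpDir() {
            JobConf conf = new JobConf();
            // Point streaming's jar-packaging step at a partition with more
            // room than /tmp; the path here is an example, not a default.
            conf.set("stream.tmpdir", "/scratch/hadoop-tmp");
            return conf;
        }
    }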

Re: difference between mapper and map runnable

2009-08-28 Thread Arun C Murthy
On Aug 27, 2009, at 5:25 AM, Rakhi Khatwani wrote: Hi, What's the difference between a mapper and a map runnable, and their usage? MapRunnable has more control: it has the iterator over the input keys/values... The new Map-Reduce API (context objects, available in hadoop-0.20 onwards)
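
For concreteness, a minimal sketch of the contract being described, mirroring what the default MapRunner does in the old API; the class name is illustrative:

    import java.io.IOException;

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapRunnable;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.ReflectionUtils;

    // A Mapper is handed one record per map() call; a MapRunnable drives
    // the RecordReader itself, so it controls the whole iteration and can
    // batch records, stop early, or wrap error handling around the loop.
    public class SimpleMapRunner<K1, V1, K2, V2>
            implements MapRunnable<K1, V1, K2, V2> {

        private Mapper<K1, V1, K2, V2> mapper;

        @SuppressWarnings("unchecked")
        public void configure(JobConf job) {
            mapper = (Mapper<K1, V1, K2, V2>)
                    ReflectionUtils.newInstance(job.getMapperClass(), job);
        }

        public void run(RecordReader<K1, V1> input,
                        OutputCollector<K2, V2> output,
                        Reporter reporter) throws IOException {
            try {
                K1 key = input.createKey();
                V1 value = input.createValue();
                // This is the iterator Arun mentions: the runner, not the
                // framework, walks the input split and feeds the mapper.
                while (input.next(key, value)) {
                    mapper.map(key, value, output, reporter);
                }
            } finally {
                mapper.close();
            }
        }
    }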

Re: [Help] Why java.util.zip.ZipOutputStream need to use /tmp?

2009-08-28 Thread Steve Gao
Thanks, Brian. Would you tell me the filename of the code snippet? --- On Fri, 8/28/09, Brian Bockelman bbock...@cse.unl.edu wrote: From: Brian Bockelman bbock...@cse.unl.edu Subject: Re: [Help] Why java.util.zip.ZipOutputStream need to use /tmp? To: common-user@hadoop.apache.org Date:

Re: [Help] Why java.util.zip.ZipOutputStream need to use /tmp?

2009-08-28 Thread Brian Bockelman
I saw this in: org.apache.hadoop.streaming.StreamJob.packageJobJar Brian On Aug 28, 2009, at 2:04 PM, Steve Gao wrote: Thanks, Brian. Would you tell me the filename of the code snippet? --- On Fri, 8/28/09, Brian Bockelman bbock...@cse.unl.edu wrote: From: Brian Bockelman

Re: [Help] Why java.util.zip.ZipOutputStream need to use /tmp?

2009-08-28 Thread James Cipar
I would agree with removing it from the default build for now. I only used thrift because that's what we were using for all of the RPC at the time. I'd rather that we just settle on one RPC to rule them all, and I will change the code accordingly. On Aug 28, 2009, at 3:04 PM, Steve Gao

Re: [Help] Why java.util.zip.ZipOutputStream need to use /tmp?

2009-08-28 Thread James Cipar
Sorry about that last one; I replied to the wrong message. On Aug 28, 2009, at 3:04 PM, Steve Gao wrote: Thanks, Brian. Would you tell me the filename of the code snippet? --- On Fri, 8/28/09, Brian Bockelman bbock...@cse.unl.edu wrote: From: Brian Bockelman bbock...@cse.unl.edu

Re: Who are the major contributors to Hive and/or Hbase?

2009-08-28 Thread Gaurav Sharma
Hope this helps: http://hadoop.apache.org/hive/credits.html http://hadoop.apache.org/hbase/credits.html On Fri, Aug 28, 2009 at 1:26 PM, Gopal Gandhi gopal.gandhi2...@yahoo.com wrote: Maybe I should change the title? --- On Fri, 8/28/09, Gopal Gandhi gopal.gandhi2...@yahoo.com wrote:

Re: Testing Hadoop job

2009-08-28 Thread Aaron Kimball
Hi Nikhil, MRUnit now supports the 0.20 API as of https://issues.apache.org/jira/browse/MAPREDUCE-800. There are no plans to involve partitioners in MRUnit; it is for mappers and reducers only, and not for full jobs involving input/output formats, partitioners, etc. Use the LocalJobRunner for
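
For anyone landing here, a hedged sketch of such a mapper-only test with MRUnit's 0.20-API driver; the mapper and expected records are illustrative, and the method would normally be run as a JUnit test:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;

    public class UpperCaseMapperTest {

        // A trivial 0.20-API mapper to exercise; purely illustrative.
        static class UpperCaseMapper
                extends Mapper<LongWritable, Text, LongWritable, Text> {
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                context.write(key, new Text(value.toString().toUpperCase()));
            }
        }

        // Feed one record in, assert one record out; no cluster, no
        // partitioner, no input/output formats -- exactly the scope
        // Aaron describes above.
        public void testUpperCaseMapper() throws java.io.IOException {
            new MapDriver<LongWritable, Text, LongWritable, Text>()
                    .withMapper(new UpperCaseMapper())
                    .withInput(new LongWritable(1), new Text("hello"))
                    .withOutput(new LongWritable(1), new Text("HELLO"))
                    .runTest();
        }
    }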

Re: Where does System.out.println() go?

2009-08-28 Thread indoos
Hi, sysout for Map/Reduce should be visible in the task tracker UI on port 50030, against the individual map/reduce tasks of the executed job. This UI anyway uses the individual logs created for each attempt in the logs/userlogs/attempt folders. Regards, Sanjay Mark Kerzner-2 wrote: Hi, when I run

add custom timestamps to the Job log

2009-08-28 Thread Rares Vernica
Hello, The job log has some very important timestamps that show the start/end time of various stages of a task attempt. I was wondering if it is possible to add custom timestamps to it. For example, for a reduce task attempt, the job log will contain something like: ReduceAttempt

Re: cost model for MR programs

2009-08-28 Thread indoos
Hi, my suggestion would be that we should not compel ourselves to compare databases with Hadoop. However, here is something probably not even close to what you may require, but it might be helpful: 1. Number of nodes - these are the parameters to look for: - average time taken by a single