Re: Mapreduce Job's user

2012-02-16 Thread Harsh J
Joey has it right if you are indeed using a security-enabled release, and the configuration for the same is documented at http://hadoop.apache.org/common/docs/r1.0.0/Secure_Impersonation.html On Fri, Feb 17, 2012 at 1:13 AM, Joey Echeverria wrote: > Are you using one of the security enabled releases of Hadoop (0.20.20x, 1.0.x, 0.23.x, CDH3)?
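
For reference, the proxy-user knobs that page documents go in the superuser's core-site.xml and look roughly like this ("super" is a placeholder for the impersonating account; the host and group values are examples, not prescriptions):

    <property>
      <name>hadoop.proxyuser.super.hosts</name>
      <value>client-host.example.com</value> <!-- hosts super may impersonate from -->
    </property>
    <property>
      <name>hadoop.proxyuser.super.groups</name>
      <value>users</value> <!-- groups whose members super may impersonate -->
    </property>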

Re: Mapreduce Job's user

2012-02-16 Thread Vamshi Krishna
Hi Jose, According to my knowledge, if you want to use options with -D at the command line when running jobs, you cannot use any options other than the ones listed in the link http://hadoop.apache.org/common/docs/current/mapred-default.html . Therefore you cannot use "user.name". The first
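
As background: -D key=value pairs on the command line are only picked up if the job class runs through ToolRunner/GenericOptionsParser. A minimal sketch (the class name and the property read back are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        Configuration conf = getConf();  // -D pairs parsed by ToolRunner land here
        System.out.println(conf.get("mapred.reduce.tasks"));
        return 0;
      }
      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyJob(), args));
      }
    }

Invoked as, for example: hadoop jar myjob.jar MyJob -D mapred.reduce.tasks=10 in out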

reduce output compression of Terasort

2012-02-16 Thread Juwei Shi
Hi, I am benchmarking the cluster using the Terasort package of Hadoop 0.20.2. I enabled compression for both map output (*mapred.compress.map.output*) and reduce output (*mapred.output.compress*). I checked the parameters in job.xml; both are true. I can see that the compression for Map output works
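
For anyone comparing notes, a sketch of setting both compression pairs on a 0.20-era JobConf (the codec choice is just an example; the boolean flags alone may not be enough if the output format in use ignores them):

    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.DefaultCodec;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    conf.setBoolean("mapred.compress.map.output", true);  // intermediate (map) output
    conf.setBoolean("mapred.output.compress", true);      // final (reduce) output
    conf.setClass("mapred.map.output.compression.codec", DefaultCodec.class, CompressionCodec.class);
    conf.setClass("mapred.output.compression.codec", DefaultCodec.class, CompressionCodec.class);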

Re: num of reducer

2012-02-16 Thread Thamizhannal Paramasivam
Thank you so much to Joey & Bejoy for your suggestions. The job's input path has 1300-1400 text files, each of 100-200MB. I thought TextInputFormat spawns a single mapper per file, while MultiFileInputFormat spawns fewer mappers (fewer than the 1300-1400 files), each processing many input files. Which input f
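
For a rough sense of scale: assuming the default 64 MB block size, 1300 files of ~150 MB each is about 1300 x ceil(150/64) = 3900 blocks, so TextInputFormat would schedule on the order of 3900 map tasks, versus at most 1300-1400 (and typically far fewer) with MultiFileInputFormat.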

Re: Optimized Hadoop

2012-02-16 Thread Schubert Zhang
1) It should be sort-avoidance. 2) Work pool (like Tenzing). Sorry, the adaptive heartbeat code is not in this github code; we are discussing it. On Fri, Feb 17, 2012 at 11:00 AM, Anty wrote: > Hi Todd, > > Yes, the rewritten shuffle is in fact a backport of the shuffle from MR2. > We mainly ad

Re: Partitioners - How to know if they are working

2012-02-16 Thread David Rosenstrauch
On 02/16/2012 12:49 PM, ext-fabio.alme...@nokia.com wrote: Hello All, I wrote my own partitioner and I would like to see if it's working. By printing the return value of getPartition I could see that the partitions were different, but were they really working? To answer that, I got the keys that every reducer task processed, and that was what I expected.

Re: Optimized Hadoop

2012-02-16 Thread Anty
Hi Todd, yes, the rewritten shuffle is in fact a backport of the shuffle from MR2. We mainly add the following two features: 1) shuffle avoidance 2) work pool On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon wrote: > Hey Schubert, > > Looking at the code on github, it looks like your rewritten shuf

Re: Mapreduce Job's user

2012-02-16 Thread Joey Echeverria
Are you using one of the security enabled releases of Hadoop (0.20.20x, 1.0.x, 0.23.x, CDH3)? Assuming you are, you'll need to modify your code to impersonate the user with something like this: UserGroupInformation.createRemoteUser("cuser").doAs(new Privi
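
Spelled out, the pattern looks roughly like this ("cuser" is a placeholder, and jobConf is assumed to be a final JobConf configured elsewhere):

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.security.UserGroupInformation;

    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("cuser");
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        JobClient.runJob(jobConf);  // submitted as "cuser" rather than the logged-in user
        return null;
      }
    });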

Re: Optimized Hadoop

2012-02-16 Thread Todd Lipcon
Hey Schubert, Looking at the code on github, it looks like your rewritten shuffle is in fact just a backport of the shuffle from MR2. I didn't look closely - are there any distinguishing factors? Also, the OOB heartbeat and adaptive heartbeat code seems to be the same as what's in 1.0? -Todd On

Re: Partitioners - How to know if they are working

2012-02-16 Thread Harsh J
Hi Fabio, There are test cases in the MapReduce project releases that test setting a custom partitioner and ensure it works as intended. But if you still wish to assure yourself, you should be able to add a LOG statement to your custom Partitioner class's initialization methods, that may ind
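
A sketch of the idea, logging from getPartition itself for brevity (the key/value types and hash scheme are illustrative, not Fabio's actual partitioner):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class LoggingPartitioner extends Partitioner<Text, IntWritable> {
      private static final Log LOG = LogFactory.getLog(LoggingPartitioner.class);
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
        int partition = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        LOG.info("key " + key + " -> partition " + partition);  // appears in the map task logs
        return partition;
      }
    }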

Mapreduce Job's user

2012-02-16 Thread Jose Luis Soler
Hi All, Is there some way to force the owner (user name) of a job sent to a Hadoop cluster? I'm trying to use the following code when configuring the job: JobConf job = new JobConf(); job.setUser("desiredUserName"); but it seems to have no effect, as the job owner is sent as the user I'm logged in as

Partitioners - How to know if they are working

2012-02-16 Thread ext-fabio.almeida
Hello All, I wrote my own partitioner and I would like to see if it's working. By printing the return value of getPartition I could see that the partitions were different, but were they really working? To answer that, I got the keys that every reducer task processed, and that was what I expected.

Re: Optimized Hadoop

2012-02-16 Thread Schubert Zhang
Here is the presentation describing our work: http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a Welcome to give your advice. It's just a little step, and we will continue to do more improvements. Thanks for your help. On Thu, Feb 16, 2012 at 11:01 PM, Anty wr

Fwd: Error after successful job completion

2012-02-16 Thread Gabriel Rosendorf
Hi all, Just finished running a job using Hadoop 0.20.203.0 and Pig 0.9.1, pulling data out of a single Cassandra 1.0.7 column family. It completed successfully, but I'm seeing this exception on a lot of the completed tasks in the task list: java.lang.RuntimeException: Error while running comm

Re: num of reducer

2012-02-16 Thread Joey Echeverria
Is your data size 100-200MB *total*? If so, then this is the expected behavior for MultiFileInputFormat. As Bejoy says, you can switch to TextInputFormat to get one mapper per block (min one mapper per file). -Joey On Thu, Feb 16, 2012 at 11:03 AM, Thamizhannal Paramasivam <thamizhanna...@gmail

Re: num of reducer

2012-02-16 Thread bejoy . hadoop
Hi Tamizh, If your input comprises text files, then changing the input format to TextInputFormat can get things right: one mapper for each HDFS block. Regards Bejoy K S From handheld, please excuse typos. -Original Message- From: Thamizhannal Paramasivam Date: Thu, 16 Feb 20
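
On the 0.19/0.20 API that switch is one line in the driver (MyJob is a placeholder class):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    JobConf conf = new JobConf(MyJob.class);
    conf.setInputFormat(TextInputFormat.class);  // one map task per HDFS block, at least one per file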

Re: num of reducer

2012-02-16 Thread Thamizhannal Paramasivam
Here is the input format for the mapper. Input Format: MultiFileInputFormat MapperOutputKey: Text MapperOutputValue: CustomWritable I am not in a position to upgrade from hadoop-0.19.2 for some reason. I have checked the number of mappers on the JobTracker. Thanks, Thamizh On Thu, Feb 16, 2012 at 6

Re: Changing default task JVM classpath

2012-02-16 Thread John Armstrong
On 02/16/2012 10:15 AM, Harsh J wrote: That is how HBase does it: HBaseConfiguration at driver loads up HBase *.xml file configs from the driver classpath (or user set() entries, either way), and then submits that as part of job.xml. These configs should be all you need. It should be, and yet I'm ru

Re: Changing default task JVM classpath

2012-02-16 Thread Harsh J
You should load the config elements into the job configuration XML (Job.getConfiguration() or JobConf) during submission - loading from each machine will introduce problems you don't need and can avoid. That is how HBase does it: HBaseConfiguration at the driver loads up HBase *.xml file configs
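
The same pattern for this case might look like the sketch below (the accumulo-site.xml path is an assumption; any resource added here travels with the job in job.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    conf.addResource(new Path("/etc/accumulo/conf/accumulo-site.xml"));  // loaded at the driver
    Job job = new Job(conf, "my job");  // the submitted config is serialized into job.xml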

Optimized Hadoop

2012-02-16 Thread Anty
Hi Guys, We just delivered an optimized Hadoop. If you are interested, please refer to https://github.com/hanborq/hadoop -- Best Regards Anty Rao

Changing default task JVM classpath

2012-02-16 Thread John Armstrong
Hi, everybody. I'm having some difficulties, which I've traced to not having the Accumulo libraries and configuration available in my task JVMs. The most elegant solution -- especially since I will not always have control over the Accumulo configuration files -- would be to make them available t
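
For the library half of the problem, the usual routes are -libjars on the command line (with ToolRunner) or the distributed cache; a sketch of the latter (the HDFS jar path is hypothetical):

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    // jar must already be in HDFS; it is added to every task JVM's classpath
    DistributedCache.addFileToClassPath(new Path("/libs/accumulo-core.jar"), conf);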

Re: num of reducer

2012-02-16 Thread Joey Echeverria
Hi Tamil, I'd recommend upgrading to a newer release, as 0.19.2 is very old. As for your question, most input formats should set the number of mappers correctly. What input format are you using? Where did you see the number of tasks assigned to the job? -Joey On Thu, Feb 16, 2012 at 1:40 AM, Tham