Joey has it right if you are indeed using a security-enabled release,
and the configuration for the same is documented at
http://hadoop.apache.org/common/docs/r1.0.0/Secure_Impersonation.html
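For completeness, the client-side doAs() call in Joey's note has a server-side
half: a proxyuser entry in core-site.xml on the cluster, per that doc. A
minimal sketch, assuming the impersonating superuser is named "super" (the
user name, host and groups here are placeholders):

  <property>
    <name>hadoop.proxyuser.super.hosts</name>
    <value>client1.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.super.groups</name>
    <value>group1,group2</value>
  </property>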
On Fri, Feb 17, 2012 at 1:13 AM, Joey Echeverria wrote:
> Are you using one of the security enabled releases of Hadoop
> (0.20.20x, 1.0.x, 0.23.x, CDH3)? …
Hi Jose,
To my knowledge, when passing options with -D on the command line while
running jobs, you cannot use any options other than the
ones listed at
http://hadoop.apache.org/common/docs/current/mapred-default.html .
Therefore you cannot use "user.name". The first …
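As an aside, -D options are only picked up at all when the driver runs
through GenericOptionsParser, typically via ToolRunner. A minimal sketch of
such a driver; the class name MyDriver is just a placeholder:

  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
      // getConf() already contains any -D key=value pairs parsed
      // by GenericOptionsParser before run() is called.
      JobConf job = new JobConf(getConf(), MyDriver.class);
      // ... set input/output paths and formats here ...
      JobClient.runJob(job);
      return 0;
    }
    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new MyDriver(), args));
    }
  }

Invoked as, e.g.: hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=4 in out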
Hi,
I am benchmarking the cluster using the TeraSort package of Hadoop 0.20.2.
I enabled compression for both map output (*mapred.compress.map.output*)
and reduce output (*mapred.output.compress*). I checked the parameters in
job.xml; both are true. I can see that the compression for map output
works …
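For reference, the equivalent programmatic setup on a 0.20.x JobConf looks
roughly like this (a sketch, not the actual TeraSort driver code; the codec
choices are just examples):

  import org.apache.hadoop.io.compress.DefaultCodec;
  import org.apache.hadoop.io.compress.GzipCodec;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf();
  // mapred.compress.map.output: compress intermediate map output
  conf.setCompressMapOutput(true);
  conf.setMapOutputCompressorClass(DefaultCodec.class);
  // mapred.output.compress: compress the final (reduce) output
  FileOutputFormat.setCompressOutput(conf, true);
  FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);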
Thank you so much to Joey & Bejoy for your suggestions.
The job's input path has 1300-1400 text files, each 100-200MB.
I thought TextInputFormat spawns a single mapper per file, while
MultiFileInputFormat spawns fewer mappers (<1300-1400), each processing
many input files.
Which input f…
1) It should be sort-avoidance.
2) Work pool (like Tenzing).
Sorry, the adaptive heartbeat code is not in this GitHub repo; we are
discussing it.
On Fri, Feb 17, 2012 at 11:00 AM, Anty wrote:
> Hi Todd,
>
> Yes, the rewritten shuffle is actually a backport of the shuffle from MR2.
> We mainly ad…
On 02/16/2012 12:49 PM, ext-fabio.alme...@nokia.com wrote:
Hello All,
I wrote my own partitioner and I would like to see if it's working.
By printing the return value of the getPartition method I could see that the
partitions were different, but were they really working? To answer that I got
the keys that every reducer task processed, and that was what I expected.
Hi Todd,
Yes, the rewritten shuffle is actually a backport of the shuffle from MR2.
We mainly added the following two features:
1) shuffle avoidance
2) work pool
On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon wrote:
> Hey Schubert,
>
> Looking at the code on github, it looks like your rewritten shuffle is
> in fact just a backport of the shuffle from MR2. …
Are you using one of the security enabled releases of Hadoop
(0.20.20x, 1.0.x, 0.23.x, CDH3)? Assuming you are, you'll need to modify
your code to do something like the following to impersonate a user:

  UserGroupInformation.createRemoteUser("cuser").doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
      // ... submit the job here; it runs as "cuser" ...
      return null;
    }
  });
Hey Schubert,
Looking at the code on github, it looks like your rewritten shuffle is
in fact just a backport of the shuffle from MR2. I didn't look closely
- are there any distinguishing factors?
Also, the OOB heartbeat and adaptive heartbeat code seems to be the
same as what's in 1.0?
-Todd
On …
Hi Fabio,
There are test cases in the MapReduce project releases that test
setting a custom partitioner and ensure it works as intended.
But if you still wish to assure yourself, you should be able to add
a LOG statement to your custom Partitioner class's initialization
methods; that may ind…
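A minimal sketch of that idea against the old (0.20) mapred API; the class
name and key/value types here are placeholders:

  import org.apache.commons.logging.Log;
  import org.apache.commons.logging.LogFactory;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.Partitioner;

  public class LoggingPartitioner implements Partitioner<Text, IntWritable> {
    private static final Log LOG = LogFactory.getLog(LoggingPartitioner.class);

    public void configure(JobConf job) {
      // If this line shows up in the task logs, your class is being used.
      LOG.info("LoggingPartitioner initialized");
    }

    public int getPartition(Text key, IntWritable value, int numPartitions) {
      // Same arithmetic as the default HashPartitioner, plus a log line.
      int partition = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      LOG.info("key=" + key + " -> partition=" + partition);
      return partition;
    }
  }

Logging every key is obviously only sensible for small test runs.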
Hi All,
Is there some way to force the owner (user name) of a Job sent to a
Hadoop cluster?
I'm trying to use the following code when configuring the job:
JobConf job = new JobConf();
job.setUser("desiredUserName");
but it seems to have no effect, as the job owner is sent as the user I'm
logged in as. …
Hello All,
I wrote my own partitioner and I would like to see if it's working.
By printing the return value of the getPartition method I could see that the
partitions were different, but were they really working? To answer that I got
the keys that every reducer task processed, and that was what I expected.
Here is the presentation describing our job:
http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
You are welcome to give your advice.
It's just a little step, and we will continue to make more improvements;
thanks for your help.
On Thu, Feb 16, 2012 at 11:01 PM, Anty wrote: …
Hi all,
Just finished running a job using Hadoop 0.20.203.0 and Pig 0.9.1, pulling data
out of a single Cassandra 1.0.7 column family. It completed successfully, but
I'm seeing this exception on a lot of the completed tasks in the task list:
java.lang.RuntimeException: Error while running comm…
Is your data size 100-200MB *total*?
If so, then this is the expected behavior for MultiFileInputFormat. As
Bejoy says, you can switch to TextInputFormat to get one mapper per block
(min one mapper per file).
-Joey
On Thu, Feb 16, 2012 at 11:03 AM, Thamizhannal Paramasivam <
thamizhanna...@gmail
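Concretely, that switch is a one-liner on the old API (a sketch; the driver
class name and input path are placeholders):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.TextInputFormat;

  JobConf conf = new JobConf(MyJob.class);
  conf.setInputFormat(TextInputFormat.class);  // one mapper per block, min one per file
  FileInputFormat.setInputPaths(conf, new Path("/data/input"));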
Hi Tamizh
If your input comprises text files, then changing the input format
to TextInputFormat can get things right: one mapper for each HDFS block.
Regards
Bejoy K S
From handheld, Please excuse typos.
-Original Message-
From: Thamizhannal Paramasivam
Date: Thu, 16 Feb 2012 …
Here are the input format details for the mapper:
Input Format: MultiFileInputFormat
MapperOutputKey: Text
MapperOutputValue: CustomWritable
I am not in a position to upgrade from hadoop-0.19.2, for some reason.
I checked the number of mappers on the JobTracker.
Thanks,
Thamizh
On Thu, Feb 16, 2012 at 6
On 02/16/2012 10:15 AM, Harsh J wrote:
That is how HBase does it: HBaseConfiguration at the driver loads up HBase
*.xml file configs from the driver classpath (or user set() entries, either
way), and then submits that as part of job.xml. These configs should
be all you need.
It should be, and yet I'm ru…
You should load the config elements into the job configuration XML
(Job.getConfiguration() or JobConf) during submission; loading from
each machine will introduce problems you don't need and can avoid.
That is how HBase does it: HBaseConfiguration at the driver loads up HBase
*.xml file configs …
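In plain Configuration terms the pattern is roughly this (a sketch; the
resource file name is just an example):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  Configuration conf = new Configuration();
  // Must be findable on the *driver's* classpath; its key/value pairs
  // get folded into job.xml at submission, so tasks see them too.
  conf.addResource("myapp-site.xml");
  Job job = new Job(conf, "example");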
Hi guys,
We have just delivered an optimized Hadoop; if you are interested, please
refer to https://github.com/hanborq/hadoop
--
Best Regards
Anty Rao
Hi, everybody.
I'm having some difficulties, which I've traced to not having the
Accumulo libraries and configuration available in my task JVMs. The
most elegant solution -- especially since I will not always have control
over the Accumulo configuration files -- would be to make them available
t…
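One common way to do that without touching every node's installation is the
distributed cache; a sketch, assuming the jars have already been copied to
HDFS (the paths and jar names are illustrative):

  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf();
  // Each jar is pulled to every task node and added to the task classpath.
  DistributedCache.addFileToClassPath(new Path("/deps/accumulo-core.jar"), conf);
  DistributedCache.addFileToClassPath(new Path("/deps/libthrift.jar"), conf);

The same effect is available from the command line via -libjars, provided the
driver goes through ToolRunner.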
Hi Tamil,
I'd recommend upgrading to a newer release, as 0.19.2 is very old. As for
your question, most input formats should set the number of mappers correctly.
What input format are you using? Where did you see the number of tasks it
assigned to the job?
-Joey
On Thu, Feb 16, 2012 at 1:40 AM, Tham