On 07/20/2012 09:20 AM, Dave Shine wrote:
I believe this is referred to as a “key skew problem”, which I know is
heavily dependent on the actual data being processed. Can anyone point
me to any blog posts, white papers, etc. that might give me some options
on how to deal with this issue?
I don
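One widely used mitigation for key skew, for what it's worth, is salting the hot keys so they fan out over several reducers. A minimal sketch, not from this thread; the class name, NUM_SALTS, and the tab-delimited key extraction are all illustrative:

import java.io.IOException;
import java.util.Random;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Spreads a hot key over NUM_SALTS reducers by appending a random suffix;
// a second pass (or the reducer itself) strips the suffix and re-aggregates.
public class SaltingMapper extends Mapper<LongWritable, Text, Text, Text> {
  private static final int NUM_SALTS = 16;
  private final Random random = new Random();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String key = line.toString().split("\t", 2)[0];  // application-specific
    context.write(new Text(key + "#" + random.nextInt(NUM_SALTS)), line);
  }
}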
On 04/04/2012 05:00 PM, Kevin Savage wrote:
However, what we have is one big file of design data that needs to go to all
the maps and many big files of climate data that need to go to one map each.
I've not been able to work out if there is a good way of doing this in Hadoop.
It sounds like "
On 02/16/2012 10:15 AM, Harsh J wrote:
That is how HBase does it: HBaseConfiguration at driver loads up HBase
*xml file configs from driver classpath (or user set() entries, either
way), and then submits that as part of job.xml. These configs should
be all you need.
It should be, and yet I'm ru
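For reference, a minimal sketch of the pattern Harsh describes, with an Accumulo-flavored file name standing in as an assumption: resources and set() entries placed on the driver's Configuration are folded into job.xml at submission and come back out of context.getConfiguration() in every task.

import org.apache.hadoop.conf.Configuration;

public class ConfigShipDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource("accumulo-site.xml");  // must be on the DRIVER's classpath
    conf.set("my.custom.entry", "value");   // user set() entries ship the same way
    System.out.println(conf.get("my.custom.entry"));
    // In a task: context.getConfiguration().get("my.custom.entry")
  }
}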
Hi, everybody.
I'm having some difficulties, which I've traced to not having the
Accumulo libraries and configuration available in my task JVMs. The
most elegant solution -- especially since I will not always have control
over the Accumulo configuration files -- would be to make them available t
On Wed, 14 Dec 2011 11:04:37 -0500, David Rosenstrauch wrote:
> I ran into the same (known) issue. (See:
> https://issues.apache.org/jira/browse/MAPREDUCE-1700)
>
> Doesn't look like there's a solution yet.
Thanks; good to know that I'm actually doing the best I can by writing
everything to be
Hi, there.
I've run into an odd situation, and I'm wondering if there's a way around
it; I'm trying to use Jackson for some JSON serialization in my program,
and I wrote/unit-tested it to work with Jackson 1.9. Then, in integration
testing, I started to see some weird version incompatibilities an
On Fri, 16 Sep 2011 08:26:35 -0500, harry lippy wrote:
> The keys are file offsets into the input file. My question: how did the
> 'are presented to the map function as key-value pairs' happen? I've run the
> example on the input file using the java Mapper, Reducer, and the code that
> runs
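For what it's worth, with TextInputFormat the pairing happens inside the framework's LineRecordReader: it hands map() the byte offset of each line's first character as the key and the line itself as the value. A minimal mapper showing the types involved:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class OffsetEchoMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    context.write(offset, line);  // offset was supplied by LineRecordReader
  }
}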
On Thu, 15 Sep 2011 12:43:57 -0500, Arko Provo Mukherjee wrote:
> Is there a way to pass some data from the driver class to the Mapper
> class without going through the HDFS?
I generally use the Configuration object embedded in the Job for that. My
Tool implements Configurable so I create my job
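A fragment of what that run() typically looks like; the property name myapp.threshold is illustrative. The one gotcha: set() must happen before the Job is constructed, because Job takes a copy of the Configuration.

// Inside a class that extends Configured and implements Tool
// (imports: org.apache.hadoop.conf.Configuration, org.apache.hadoop.mapreduce.Job):
public int run(String[] args) throws Exception {
  Configuration conf = getConf();
  conf.set("myapp.threshold", "42");   // set BEFORE constructing the Job;
  Job job = new Job(conf, "demo");     // Job copies the Configuration here
  // ... mapper/reducer/input/output setup elided ...
  // In the mapper: context.getConfiguration().get("myapp.threshold")
  return job.waitForCompletion(true) ? 0 : 1;
}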
On Thu, 18 Aug 2011 13:44:22 -0700, vipul sharma wrote:
> I think the error is due to using the combiner, since the combiner outputs
> data in Text and the Reducer is expecting IntArrayWritable. If I remove the
> combiner, everything works. What am I doing wrong, and how can I get the
> combiner to work? Any hel
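The general rule: a combiner is a Reducer whose output key/value types must equal the mapper's output types, because its output feeds the real reducer. A sketch, with a stand-in reconstruction of IntArrayWritable (an assumption; the poster's class isn't shown):

import java.io.IOException;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A stand-in for the poster's custom type (the usual ArrayWritable subclass):
class IntArrayWritable extends ArrayWritable {
  public IntArrayWritable() { super(IntWritable.class); }
}

// This pass-through only exists to make the type constraint concrete;
// a real combiner would merge the arrays before writing.
public class MyCombiner
    extends Reducer<Text, IntArrayWritable, Text, IntArrayWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntArrayWritable> values, Context ctx)
      throws IOException, InterruptedException {
    for (IntArrayWritable v : values) {
      ctx.write(key, v);  // stays IntArrayWritable; emitting Text here breaks the job
    }
  }
}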
On Wed, 27 Jul 2011 10:58:17 -0400, David Rosenstrauch wrote:
> There is another, easier approach: if your app inherits from the Tool
> class / runs via ToolRunner, then your app can inherit the -libjars
> command line functionality itself.
This is true; the problem with this approach is that
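For context, a minimal skeleton of that approach; ToolRunner's GenericOptionsParser is what consumes -libjars (and -D, -files) before run() is called:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "my-job");  // getConf() already reflects -libjars
    // ... job setup elided ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}

// Invocation (illustrative): hadoop jar myapp.jar MyDriver -libjars a.jar,b.jar <args>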
So I think I've figured out how to fix my problem with putting files on
the distributed classpath by digging through the code Hadoop uses to
process -libjars.
If I say
DistributedCache.addFileToClassPath(hdfsFile, conf);
then hdfsFile is added to the distributed cache, but doesn't show up on the c
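For anyone following along, a sketch of that call in context; the path is illustrative, and the jar must already be in HDFS. In 0.20.x this both caches the file and appends it to mapred.job.classpath.files, which the task runner expands into the child JVM's classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class ClasspathSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path hdfsFile = new Path("/libs/mylib.jar");  // illustrative HDFS path
    DistributedCache.addFileToClassPath(hdfsFile, conf);
  }
}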
On Tue, 26 Jul 2011 12:35:48 -0700, Shrijeet Paliwal wrote:
> See if this (very old) reply from Mikhail helps.
> http://search-hadoop.com/m/QFVD1kEmQT
> Here is the patch he is referring to.
> http://m1.archiveorange.com/m/att/RNVYm/ArchiveOrange_8dEcdJI4bXFkKHBnsll8YzTc8u8a.patch
> repl
I'm back to trying to add libraries to the classpath instead of handing
around a fat JAR. This time I've served up my directory full of JARs on
NFS, which each node in my cluster has mounted at /mnt/hadoop-libs. Now my
question is how to add that (local) directory to the classpath of the
mapper a
On Tue, 19 Jul 2011 17:02:32 -0700, Choonho Son wrote:
> is it possible to use job.setOutputKeyClass(MapWritable.class)?
As others have said, MapWritable doesn't implement Comparable, so it can't
be used as a key. The ArrayWritable of Texts is one idea, but I'd suggest
instead implementing your OWN Wri
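A sketch of what such a custom key might look like; the fields are illustrative. The essentials: write()/readFields() must mirror each other, compareTo() drives the shuffle sort, and hashCode() drives the default HashPartitioner.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class PairKey implements WritableComparable<PairKey> {
  private String first = "";
  private long second;

  public void set(String first, long second) {
    this.first = first;
    this.second = second;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(first);
    out.writeLong(second);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    first = in.readUTF();
    second = in.readLong();
  }

  @Override
  public int compareTo(PairKey o) {
    int c = first.compareTo(o.first);
    if (c != 0) return c;
    return second < o.second ? -1 : (second == o.second ? 0 : 1);
  }

  @Override
  public int hashCode() { return first.hashCode() * 31 + (int) second; }

  @Override
  public boolean equals(Object o) {
    return o instanceof PairKey && compareTo((PairKey) o) == 0;
  }
}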
On Wed, 22 Jun 2011 15:16:02 -0700, Steve Lewis wrote:
> Assume I have two data sources A and B
> Assume I have an input format and can generate key values for both A and B
> I want an algorithm which will generate the cross product of all values in
> A having the key K and all values in B havin
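One common shape for this is a reduce-side join: each mapper tags its values with their source (the "A|" and "B|" prefixes here are purely illustrative), both sources shuffle on key K, and the reducer forms the cross product. A sketch that buffers both sides, so it assumes each key's value sets fit in memory:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CrossProductReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    List<String> aSide = new ArrayList<String>();
    List<String> bSide = new ArrayList<String>();
    for (Text v : values) {               // copy: Hadoop reuses the Text object
      String s = v.toString();
      (s.startsWith("A|") ? aSide : bSide).add(s.substring(2));
    }
    for (String a : aSide)
      for (String b : bSide)
        context.write(key, new Text(a + "\t" + b));
  }
}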
On Wed, 22 Jun 2011 00:15:56 +0200, Gabor Makrai wrote:
> Fortunately, DistributedCache solved my problem! I put a jar file into
> HDFS which contains the necessary classes for the job, and I used this:
> DistributedCache.addFileToClassPath(new Path("/myjar/myjar.jar"), conf);
Can I ask which ver
On Tue, 21 Jun 2011 06:37:50 -0700, Alex Kozlov wrote:
> However, the job's tasks are executed in a separate JVM and some
> of the parameters, like max heap from mapred.child.java.opts, are set
> during the job execution. In this case the parameter is coming from the
> client side where the who
One of my colleagues and I have a little confusion between us as to
exactly when mapred-site.xml is read. The pages on hadoop.apache.org don't
seem to specify it very clearly.
One position is that mapred-site.xml is read by the daemon processes at
startup, and so changing a parameter in mapred-si
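As a data point for the discussion: job-scoped parameters are read from the client's *-site.xml files at submission time and travel with the job in job.xml, so changing those needs no daemon restart; daemon-scoped parameters (slot counts and the like) are only read when the daemon starts. A sketch of a per-job override on the client side:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PerJobOverride {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapred.child.java.opts", "-Xmx512m");  // per-job; ships in job.xml
    Job job = new Job(conf, "override-demo");
    // ... job setup elided ...
  }
}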
On Wed, 8 Jun 2011 15:09:41 +0100, Virajith Jalaparti wrote:
> I was looking at the syslog generated by my job run and it looks like the
> reducers start before the mappers complete. I figured this was the case
> because even when the Map had <100% completion, the reduce completion %
> was greater
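Worth noting that the reduce completion % in 0.20.x spans three phases (copy/shuffle, sort, reduce), so non-zero reduce progress while maps are below 100% usually just means the shuffle has begun copying finished map output. When reducers launch at all is tunable; a sketch of the 0.20-era knob:

import org.apache.hadoop.conf.Configuration;

public class SlowstartDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hold reducers back until 80% of maps finish (the default is 0.05):
    conf.set("mapred.reduce.slowstart.completed.maps", "0.80");
  }
}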
On Wed, 1 Jun 2011 12:48:51 -0700, Alejandro Abdelnur wrote:
> Do you have all JARs used by your classes in Needed.jar in the DC
> classpath as well?
needed.jar contains the class Needed, which my mappers need. If the class
Needed calls for another class AlsoNeeded in another jar, wouldn't I ge
On Tue, 31 May 2011 15:09:28 -0400, John Armstrong wrote:
> On Tue, 31 May 2011 12:02:28 -0700, Alejandro Abdelnur wrote:
>> What exactly is it that does not work?
In the hopes that more information can help, I've dug into the local
filesystems on each of my four nodes and retr
On Tue, 31 May 2011 12:02:28 -0700, Alejandro Abdelnur wrote:
> What exactly is it that does not work?
Oozie launches a wrapper MapReduce job to run a Java job J1. Oozie's
/lib/ directory is provided to the classpath of J1 as expected. This part
works.
The Java job J1 configures and launches a Ma
On Mon, 30 May 2011 09:43:14 -0700, Alejandro Abdelnur wrote:
> If you still want to start your MR job from your Java action, then your
> Java action should do all the setup the MapReduceMain class does before
> starting the MR job (this will ensure delegation tokens and distributed
> cache is a
On Fri, 27 May 2011 15:47:23 -0700, Alejandro Abdelnur wrote:
> John,
>
> If you are using Oozie, dropping all the JARs your MR jobs need in the
> Oozie WF lib/ directory should suffice. Oozie will make sure all those
> JARs are in the distributed cache.
That doesn't seem to work. I have this
On Fri, 27 May 2011 13:52:04 +0200, Laurent Hatier wrote:
> I'm a newbie with Hadoop/MapReduce. I have a problem with Hadoop: I set
> some variables in the run function, but when the Map runs, it can't get
> the value of these variables...
> If anyone knows the solution :)
> If anyone knows the solution :)
By the "run function" do y
On Thu, 26 May 2011 23:17:43 +0530, vishnu krishnan wrote:
> thanks,
>
> if I am not using map/reduce here, and just directly send that data to
> the db, what will be the problems?
Look, I hate to be That Guy, especially on my first day on the list, but
would you mind moving to your
Hi, everybody.
I'm running into some difficulties getting needed libraries to map/reduce
tasks using the distributed cache.
I'm using Hadoop 0.20.2, which from what I can tell is a hard requirement
of the client, so more current versions are not really viable options.
The code I've inherited is