As I understand it, the JobConf getters/setters are best for data static
to the entire job.
What's the recommended way to pass a variable to the mappers/reducers
that might be different for each InputSplit?
For example, let's say I'm using Hadoop's grep example to extract
information from a collection …
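For per-split (rather than per-job) information, one option in the old mapred API is to ask the Reporter which InputSplit the current task is processing. The sketch below is untested and the class name is made up; for file input the split can be cast to FileSplit to recover the source path:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SplitAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // The reporter knows which split this task is working on; for
    // file-based input formats it is a FileSplit, which carries the path.
    FileSplit split = (FileSplit) reporter.getInputSplit();
    String sourceFile = split.getPath().getName();
    out.collect(new Text(sourceFile), value);
  }
}
```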
This sounds like a bug.
The memory requirements for hadoop itself shouldn't change with the split
size. At the very least, it should adapt correctly to whatever the memory
limits are.
Can you build a version of your program that works from random data so that
you can file a bug? If you contact …
My mapper in this case is the identity mapper, and the reducer gets
about 10 values per key and makes a collect decision based on the data
in the values.
The reducer is very close to a no-op, and uses very little memory beyond
the values themselves.
I believe the problem is in the amount of buffering …
Hi,
I ran into a similar problem too, and had to keep the split size smaller to
work around it.
-Rui
----- Original Message -----
From: Ted Dunning <[EMAIL PROTECTED]>
To: hadoop-user@lucene.apache.org
Sent: Tuesday, December 25, 2007 1:56:16 PM
Subject: Re: question on Hadoop configuration for non
What are your mappers doing that they run out of memory? Or is it your
reducers?
Often, you can write this sort of program so that you don't have higher
memory requirements for larger splits.
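One way to keep memory flat regardless of split size, as suggested above, is to consume the reducer's value iterator one element at a time rather than buffering values into a collection. A rough sketch against the old mapred API (summing IntWritables is just a stand-in for the real per-key logic):

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class StreamingSumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> out, Reporter reporter)
      throws IOException {
    // Consume the iterator one value at a time; nothing is materialized
    // into a list, so memory use per key stays constant no matter how
    // large the split is.
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    out.collect(key, new IntWritable(sum));
  }
}
```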
On 12/25/07 1:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> We have tried reducing the number of
Ahhh. My previous comments assumed that "long-lived" meant jobs that run
for days and days and days (essentially forever).
15 minute jobs with a finite work-list is actually a pretty good match for
map-reduce as implemented by Hadoop.
On 12/25/07 10:04 AM, "Kirk True" <[EMAIL PROTECTED]> wrote:
We have two flavors of jobs we run through hadoop, the first flavor is a
simple merge sort, where there is very little happening in the mapper or
the reducer.
The second flavor is very compute-intensive.
In the first type, each map task consumes its (default-sized) 64MB
input split in a …
Hmm, our long-running Hadoop tasks are CPU-intensive and are also the only
workload we really care about, so I wasn't really thinking about that use case.
How about running multiple Hadoop cluster instances overlapped on the same set
of boxes? You could then schedule your long-running low-CPU task
Hi all,
Thanks for all the replies thus far...
Joydeep Sen Sarma <[EMAIL PROTECTED]> wrote:
in many cases - long running tasks
are of low cpu util. i have trouble imagining how these can mix well with cpu
intensive short/batch tasks. afaik - hadoop's job scheduling is not resource
usage aware.
You can get and set variables in the JobConf. The map task's
configure() method takes a JobConf as a parameter, and you can keep the
reference as an instance variable.
Ted
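Concretely, the pattern Ted describes looks roughly like the sketch below (untested; the configuration key is the one Hadoop's bundled grep example uses for its RegexMapper, but treat it as illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class GrepLikeMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private String pattern;  // kept as an instance variable, per Ted's note

  // Hadoop calls configure() once per task, before any map() calls,
  // passing in the job's JobConf.
  public void configure(JobConf job) {
    pattern = job.get("mapred.mapper.regex", ".*");
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    if (value.toString().matches(pattern)) {
      out.collect(value, new LongWritable(1));
    }
  }
}
```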
helena21 wrote:
Hi everybody,
please explain the steps for passing user parameters to the mapper class.
thanks.
We set the values in the JobConf object, in the driver, using the set
methods defined on the JobConf class.
In the mapper class we access the conf parameters, via the get methods
defined on the JobConf class.
The JobConf configuration object is made available to the mapper class
in the configure() method.
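The driver side of that pattern might look like this (a hypothetical sketch; the key names are made up, and the input/output/mapper setup is elided):

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MyDriver.class);
    conf.setJobName("parameterized-job");

    // Made-up key names: any string key works; a private prefix avoids
    // colliding with Hadoop's own configuration properties.
    conf.set("myjob.pattern", "ERROR");
    conf.setInt("myjob.window", 10);

    // ... set input/output paths, mapper/reducer classes, etc. ...

    JobClient.runJob(conf);
  }
}
```

The mapper then reads the same keys back via `job.get("myjob.pattern")` inside its configure() method.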
Hi everybody,
please explain me the steps to pass user parameters for the mapper class.
thanks.
--
View this message in context:
http://www.nabble.com/how-to-pass-user-parameter-for-the-mapper-tp14496141p14496141.html
Sent from the Hadoop Users mailing list archive at Nabble.com.
Since it is complaining about logging, check whether the path log4j is trying
to access exists and the hadoop user has permission to access it.
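Something like the following quick check works (the path is only an example; substitute the value of hadoop.log.dir from your log4j.properties or hadoop-env.sh):

```shell
# Check that the log directory exists and is writable by the current user.
LOG_DIR="${HADOOP_LOG_DIR:-/tmp/hadoop-logs}"
mkdir -p "$LOG_DIR" 2>/dev/null
if [ -d "$LOG_DIR" ] && [ -w "$LOG_DIR" ]; then
  echo "log dir OK: $LOG_DIR"
else
  echo "log dir missing or not writable: $LOG_DIR" >&2
fi
```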
>
> i can't solve it now
>
> jibjoice wrote:
>>
>> i followed this link "http://wiki.apache.org/nutch/NutchHadoopTutorial" so
>> i think it's not about the …
in many cases - long running tasks are of low cpu util. i have trouble
imagining how these can mix well with cpu intensive short/batch tasks. afaik -
hadoop's job scheduling is not resource usage aware. long background tasks
would consume per-machine task slots that would block out other tasks f