Re: how to pass user parameter for the mapper

2007-12-25 Thread Norbert Burger
As I understand, the JobConf getters/setters are best for data static to the entire job. What's the recommended way to pass a variable to the mappers/reducers that might be different for each InputSplit? For example, let's say I'm using Hadoop's grep example to extract information from a collecti…
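One possibility for per-split values (a hedged sketch, not something this thread confirms): for file-based splits, 0.15-era Hadoop set properties such as `map.input.file` into each task's JobConf, so `configure()` can branch on which split the task is processing. The `params.<file>` naming scheme below is made up for illustration; the driver would have to populate it with `job.set(...)`:

```java
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SplitAwareMapper extends MapReduceBase implements Mapper {

  private String perSplitParam;

  // configure() runs once per task; for FileSplit-based jobs, 0.15-era
  // Hadoop records the split's source file under "map.input.file".
  public void configure(JobConf job) {
    String inputFile = job.get("map.input.file");
    // "params.<file>" is a made-up naming scheme: the driver must have
    // called job.set("params." + file, ...) for each expected input file.
    perSplitParam = job.get("params." + inputFile, "default");
  }

  public void map(WritableComparable key, Writable value,
      OutputCollector output, Reporter reporter) throws IOException {
    // ... use perSplitParam to vary behavior based on this split's file ...
    output.collect(key, value);
  }
}
```

This only distinguishes splits at file granularity; two splits of the same file would see the same value.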

Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1

2007-12-25 Thread Ted Dunning
This sounds like a bug. The memory requirements for Hadoop itself shouldn't change with the split size. At the very least, it should adapt correctly to whatever the memory limits are. Can you build a version of your program that works from random data so that you can file a bug? If you contac…

Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1

2007-12-25 Thread Jason Venner
My mapper in this case is the identity mapper, and the reducer gets about 10 values per key and makes a collect decision based on the data in those values. The reducer is very close to a no-op and uses little memory beyond the values themselves. I believe the problem is in the amount of buff…

Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1

2007-12-25 Thread Rui Shi
Hi, I ran into a similar problem, and had to keep the split size smaller to work around it. -Rui - Original Message From: Ted Dunning <[EMAIL PROTECTED]> To: hadoop-user@lucene.apache.org Sent: Tuesday, December 25, 2007 1:56:16 PM Subject: Re: question on Hadoop configuration for non…

Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1

2007-12-25 Thread Ted Dunning
What are your mappers doing that they run out of memory? Or is it your reducers? Often, you can write this sort of program so that you don't have higher memory requirements for larger splits. On 12/25/07 1:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote: > We have tried reducing the number of…

Re: Appropriate use of Hadoop for non-map/reduce tasks?

2007-12-25 Thread Ted Dunning
Ahhh... My previous comments assumed that "long-lived" meant jobs that run for days and days and days (essentially forever). 15-minute jobs with a finite work list are actually a pretty good match for map-reduce as implemented by Hadoop. On 12/25/07 10:04 AM, "Kirk True" <[EMAIL PROTECTED]> wro…

question on Hadoop configuration for non cpu intensive jobs - 0.15.1

2007-12-25 Thread Jason Venner
We have two flavors of jobs we run through Hadoop. The first flavor is a simple merge sort, where very little happens in the mapper or the reducer; the second flavor is very compute intensive. In the first type, each of our map tasks consumes its (default sized) 64 MB input split in a…
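For the merge-sort-style jobs, the knobs most likely to matter (a guess on my part, not something the thread confirms) are the map-side sort buffer and the per-task child JVM heap. In 0.15-era Hadoop these were `io.sort.mb` and `mapred.child.java.opts` in `hadoop-site.xml`; the values below are purely illustrative:

```xml
<configuration>
  <!-- Illustrative values only; tune to your machines. -->
  <property>
    <name>io.sort.mb</name>
    <!-- map-side sort buffer, in MB -->
    <value>100</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- heap for each map/reduce child JVM -->
    <value>-Xmx512m</value>
  </property>
</configuration>
```

Note the interaction: the sort buffer must fit inside the child heap, and raising split size without raising either can push identity-style jobs over the limit.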

Re: Appropriate use of Hadoop for non-map/reduce tasks?

2007-12-25 Thread Chad Walters
Hmm, our long-running Hadoop tasks are CPU-intensive and are also the only workload we really care about, so I wasn't really thinking about that use case. How about running multiple Hadoop cluster instances overlapped on the same set of boxes? You could then schedule your long-running low-CPU task…

RE: Appropriate use of Hadoop for non-map/reduce tasks?

2007-12-25 Thread Kirk True
Hi all, Thanks for all the replies thus far... Joydeep Sen Sarma <[EMAIL PROTECTED]> wrote: in many cases - long running tasks are of low cpu util. i have trouble imagining how these can mix well with cpu intensive short/batch tasks. afaik - hadoop's job scheduling is not resource usage aware.

Re: how to pass user parameter for the mapper

2007-12-25 Thread Ted Dziuba
You can get and set variables in the JobConf. The map task's configure() method takes a JobConf as a parameter, and you can keep the reference as an instance variable. Ted helena21 wrote: Hi everybody, please explain me the steps to pass user parameters for the mapper class. thanks.

Re: how to pass user parameter for the mapper

2007-12-25 Thread Jason Venner
We set the values in the JobConf object, in the driver, using the set methods defined on the JobConf class. In the mapper class we access the conf parameters via the get methods defined on the JobConf class. The JobConf configuration object is made available to the mapper class in the method…
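The set-in-driver / get-in-configure() pattern described above can be sketched as follows against the 0.15-era API; the key `grep.pattern` and the class names are illustrative, not taken from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ParamDemo {

  public static class ParamMapper extends MapReduceBase implements Mapper {
    private String pattern;

    // Hadoop calls configure() once per task with the job's JobConf,
    // so anything the driver set() is visible here.
    public void configure(JobConf job) {
      pattern = job.get("grep.pattern", "default-pattern");
    }

    public void map(WritableComparable key, Writable value,
        OutputCollector output, Reporter reporter) throws IOException {
      if (value.toString().contains(pattern)) {
        output.collect(key, value);
      }
    }
  }

  // Driver: set the parameter before submitting the job.
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(ParamDemo.class);
    conf.setJobName("param-demo");
    conf.set("grep.pattern", args.length > 0 ? args[0] : "needle");
    conf.setMapperClass(ParamMapper.class);
    // input/output paths and formats omitted for brevity
    JobClient.runJob(conf);
  }
}
```

As Norbert notes elsewhere in the thread, this only carries values that are static for the whole job: every task sees the same JobConf contents.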

how to pass user parameter for the mapper

2007-12-25 Thread helena21
Hi everybody, please explain to me the steps to pass user parameters to the mapper class. Thanks. -- View this message in context: http://www.nabble.com/how-to-pass-user-parameter-for-the-mapper-tp14496141p14496141.html Sent from the Hadoop Users mailing list archive at Nabble.com.

Re: Nutch crawl problem

2007-12-25 Thread pvvpr
Since it is complaining about logging, check whether the path log4j is trying to access is valid and the hadoop user has permissions to access it. > > i can't solve it now > > jibjoice wrote: >> >> i follow this link "http://wiki.apache.org/nutch/NutchHadoopTutorial" so >> i >> think it's not about th…

RE: Appropriate use of Hadoop for non-map/reduce tasks?

2007-12-25 Thread Joydeep Sen Sarma
in many cases - long running tasks are of low cpu util. i have trouble imagining how these can mix well with cpu intensive short/batch tasks. afaik - hadoop's job scheduling is not resource usage aware. long background tasks would consume per-machine task slots that would block out other tasks f…