Re: REST web service on top of Hadoop

2010-07-29 Thread S. Venkatesh
HDFS Proxy in contrib provides an HTTP interface over HDFS. It's not very RESTful, but we are working on a new version which will have a REST API. AFAIK, Oozie will provide a REST API for launching MR jobs. Venkatesh On Wed, Jul 28, 2010 at 7:31 PM, eluharani zineellabidine eluhar...@gmail.com wrote:

Parameters that can be set per job

2010-07-29 Thread Devajyoti Sarkar
Hi, Is there a list of configuration parameters that can be set per job? Specifically, can one set: - mapred.tasktracker.map.tasks.maximum - mapred.tasktracker.reduce.tasks.maximum - mapred.map.multithreadedrunner.threads - mapred.child.java.opts - mapred.task.timeout Also, I am trying to
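
A minimal sketch (not from the thread) of setting some of these keys per job on a JobConf; class name and values are illustrative. Note that the mapred.tasktracker.*.maximum keys are generally read by each TaskTracker daemon at startup, so per-job overrides of those typically have no effect.

    import org.apache.hadoop.mapred.JobConf;

    public class PerJobParams {
        public static JobConf configure(JobConf conf) {
            conf.set("mapred.child.java.opts", "-Xmx512m");            // JVM options for task child processes
            conf.setLong("mapred.task.timeout", 1200000L);             // task timeout in milliseconds
            conf.setInt("mapred.map.multithreadedrunner.threads", 4);  // used by MultithreadedMapRunner
            return conf;
        }
    }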

Re: REST web service on top of Hadoop

2010-07-29 Thread Alejandro Abdelnur
In Oozie we are working on MR/Pig job submission over HTTP. On Thu, Jul 29, 2010 at 5:09 PM, Steve Loughran ste...@apache.org wrote: S. Venkatesh wrote: HDFS Proxy in contrib provides an HTTP interface over HDFS. It's not very RESTful, but we are working on a new version which will have a REST API.

Re: what affects number of reducers launched by hadoop?

2010-07-29 Thread Abhinay Mehta
Which configuration key controls the maximum number of tasks per node? On 28 July 2010 20:40, Joe Stein charmal...@allthingshadoop.com wrote: mapred.tasktracker.reduce.tasks.maximum is how many you want as a ceiling per node; you need to configure *mapred.reduce.tasks* to be more than one
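
An illustrative sketch of requesting more than one reducer for a job via mapred.reduce.tasks (equivalently JobConf.setNumReduceTasks); the class name, job name, and reducer count are made up for the example.

    import org.apache.hadoop.mapred.JobConf;

    public class ReducerCount {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ReducerCount.class);
            conf.setJobName("multi-reducer-example");
            conf.setNumReduceTasks(4);   // same effect as conf.setInt("mapred.reduce.tasks", 4)
            // mapred.tasktracker.reduce.tasks.maximum belongs in mapred-site.xml:
            // it is the per-node ceiling, not a per-job request.
        }
    }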

Re: what affects number of reducers launched by hadoop?

2010-07-29 Thread Vitaliy Semochkin
mapred.tasktracker.reduce.tasks.maximum PS I found this document of default values very useful: http://hadoop.apache.org/common/docs/r0.18.3/hadoop-default.html However, I failed to find its newer version for 0.20.2. Regards, Vitaliy S On Thu, Jul 29, 2010 at 2:31 PM, Abhinay Mehta

Re: what affects number of reducers launched by hadoop?

2010-07-29 Thread Joe Stein
there is no single setting, but the max tasks would be the sum of what you set for map and reduce tasks per node (so if you set 7 for map and 6 for reduce then you will not have more than 13 tasks running on the node as a result of the 2 settings). http://hadoop.apache.org/common/docs/current/cluster_setup.html

how io.file.buffer.size works?

2010-07-29 Thread elton sky
I think my question was ignored, so I'm just posting it again: I am a bit confused about how this attribute is used. My understanding is that it is related to file read/write. And I can see that, in LineReader.java, it's used as the default buffer size for each line; in BlockReader.newBlockReader(), it's used as
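
A rough sketch of the common pattern: read io.file.buffer.size from the configuration and pass it to FileSystem.open() as the stream buffer size. The path and fallback value here are illustrative; this is not the LineReader or BlockReader code itself.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BufferSizeExample {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            int bufferSize = conf.getInt("io.file.buffer.size", 4096); // 4096 is the shipped default
            FileSystem fs = FileSystem.get(conf);
            FSDataInputStream in = fs.open(new Path("/tmp/example.txt"), bufferSize);
            in.close();
        }
    }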

reuse cached files

2010-07-29 Thread Gang Luo
Hi all, if I use the distributed cache to send some files to all the nodes in one MR job, can I reuse these cached files locally in my next job, or will Hadoop re-send these files again? Thanks, -Gang
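
A hypothetical sketch of adding the same file to the distributed cache in two consecutive jobs; the HDFS path, symlink fragment, and class name are made up. Whether the file is physically re-fetched on each node between the two jobs is exactly what the thread asks about.

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheReuse {
        public static void main(String[] args) throws Exception {
            URI lookup = new URI("/user/gang/lookup.dat#lookup");

            JobConf job1 = new JobConf(CacheReuse.class);
            DistributedCache.addCacheFile(lookup, job1);
            // ... configure and submit job1 ...

            JobConf job2 = new JobConf(CacheReuse.class);
            DistributedCache.addCacheFile(lookup, job2);  // same URI added again in the next job
            // ... configure and submit job2 ...
        }
    }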

How to build multiple inverted indexes?

2010-07-29 Thread ailinykh
Hello, everybody! I have a bunch of records. Each record has a key and two fields A, B: R(k, A, B). I want to build two inverted indexes, one per field. As output I expect two files: IndexA = (A1 - [k1,k2,k3..]), (A2 - [k1,k2,k4...]) ... IndexB = (B1 - [k1,k2,k3..]), (B2 - [k1,k2,k4...]) ... Hadoop

Re: error:Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzopCodec

2010-07-29 Thread Ted Yu
Yes. On Thu, Jul 29, 2010 at 7:57 AM, Alex Luya alexander.l...@gmail.com wrote: Hi, Run: ps -aef | grep -i tasktracker I got this: - alex 2425 1 0 22:34 ? 00:00:05

Re: what affects number of reducers launched by hadoop?

2010-07-29 Thread Raj V
Vitaliy, Here are the default values and parameters for 0.20.2: http://hadoop.apache.org/common/docs/r0.20.2/core-default.html http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html The default values in the XML

mapred.userlog.retain.hours

2010-07-29 Thread vishalsant
I have changed mapred-site.xml on my namenode and the datanodes to include: <property> <name>mapred.userlog.retain.hours</name> <value>2</value> </property> And yet my job XML retains 24. Am I doing anything wrong?

Re: Parameters that can be set per job

2010-07-29 Thread Harsh J
On Thu, Jul 29, 2010 at 2:25 PM, Devajyoti Sarkar dsar...@q-kk.com wrote: Hi, Is there a list of configuration parameters that can be set per job? Specifically, can one set: - mapred.tasktracker.map.tasks.maximum - mapred.tasktracker.reduce.tasks.maximum -

[ANNOUNCE] Next HUG meetup: Noida/NCR- India - 31st July 2010 : Reminder

2010-07-29 Thread Sanjay Sharma
Hi All, We are planning to hold the next Hadoop India User Group meet up on 31st July 2010 in Noida, India. The registration and event details are available at - http://hugindia-absolutezeroforum.eventbrite.com/ We currently have the following talks lined up- -

Re: How to build multiple inverted indexes?

2010-07-29 Thread Rahul Jain
Hadoop does not prevent you from writing a key/value pair multiple times in the same map iteration, if that is your roadblock. You can call collector.collect() multiple times with the same or distinct key/value pairs within a single map iteration. -Rahul On Thu, Jul 29, 2010 at 8:10 AM,
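
A sketch of Rahul's point: one map() call may emit several pairs, so one pass can feed both indexes. The record layout (key TAB fieldA TAB fieldB) and the "A:"/"B:" key tags used to separate the two indexes are assumptions for illustration, not from the original post.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class TwoIndexMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
            String[] parts = line.toString().split("\t");    // parts[0]=k, parts[1]=A, parts[2]=B
            Text recordKey = new Text(parts[0]);
            out.collect(new Text("A:" + parts[1]), recordKey);  // contributes to IndexA
            out.collect(new Text("B:" + parts[2]), recordKey);  // contributes to IndexB
        }
    }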

Re: mapred.userlog.retain.hours

2010-07-29 Thread Ted Yu
Have you restarted your cluster? You can actually specify this parameter in JobConf. See the usage: TaskLog.cleanup(job.getInt("mapred.userlog.retain.hours", 24)); ./src/mapred/org/apache/hadoop/mapred/Child.java On Thu, Jul 29, 2010 at 10:30 AM, vishalsant vishal.santo...@gmail.com wrote:
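
Following Ted's pointer, a minimal sketch of setting the retention on the job configuration itself (the Child task reads it via job.getInt with a default of 24); the class and method names are illustrative.

    import org.apache.hadoop.mapred.JobConf;

    public class LogRetention {
        public static JobConf withShortLogRetention(JobConf conf) {
            conf.setInt("mapred.userlog.retain.hours", 2);  // keep task user logs for 2 hours
            return conf;
        }
    }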

Re: mapred.userlog.retain.hours

2010-07-29 Thread Edward Capriolo
On Thu, Jul 29, 2010 at 1:30 PM, vishalsant vishal.santo...@gmail.com wrote: I have changed mapred-site.xml on my namenode and the datanodes to include: <property> <name>mapred.userlog.retain.hours</name> <value>2</value> </property> And yet my job XML retains 24. Am I doing anything

Re: reuse cached files

2010-07-29 Thread Hemanth Yamijala
Hi, if I use the distributed cache to send some files to all the nodes in one MR job, can I reuse these cached files locally in my next job, or will Hadoop re-send these files again? Cache files are reused across jobs. From trunk onwards, they will be restricted to being reused across jobs of the

Re: Parameters that can be set per job

2010-07-29 Thread Hemanth Yamijala
Hi, Is there a list of configuration parameters that can be set per job? I'm almost certain there's no list that documents per-job settable parameters that well. From 0.21 onwards, I think the convention adopted is to name all job-related or task-related parameters to include 'job' or 'map' or