Re: Hadoop-Pig setup question

2012-03-09 Thread Atul Thapliyal
Hello Hadoop users, the issue described here is resolved now. Thanks. On Thu, Mar 8, 2012 at 10:51 PM, Atul Thapliyal aatul.thapli...@gmail.com wrote: Hi Hadoop users, I am a new member, so please let me know if this is not the correct format for asking questions. I am trying to set up a small Hadoop

Re: Hadoop-Pig setup question

2012-03-09 Thread Subir S
What was the solution? On Fri, Mar 9, 2012 at 2:03 PM, Atul Thapliyal aatul.thapli...@gmail.com wrote: Hello Hadoop users, the issue described here is resolved now. Thanks. On Thu, Mar 8, 2012 at 10:51 PM, Atul Thapliyal aatul.thapli...@gmail.com wrote: Hi Hadoop users, I am a new member

Re: state of HOD

2012-03-09 Thread Edward Capriolo
It has been in a quasi-defunct state for a while now. It seems like hadoop.next and YARN help achieve an effect similar to HOD. Plus it has this new hotness factor. On Fri, Mar 9, 2012 at 2:41 AM, Stijn De Weirdt stijn.dewei...@ugent.be wrote: (my apologies for those who have received this

Re: Hadoop-Pig setup question

2012-03-09 Thread Atul Thapliyal
Hi Subir, for me it was an SSH configuration issue. Steps taken:
1. Executed start-mapred.sh.
2. Got an ssh_exchange_identification error ("closed by remote host").
3. Tried connecting to the same machine using ssh (ssh localhost). The secondary node was not able to ssh to localhost.
4. Configured this and

Hive 0.7.1 or 0.8.1 with CDH3?

2012-03-09 Thread Keith Wiley
CDH3 seems to formally provide Hive 0.7.1, but Hive is up to 0.8.1. I currently use a CDH3 Hadoop installation. I was curious whether I should strictly stick with CDH3 for related products and go with Hive 0.7.1, or try to use 0.8.1 instead. What major advantages does the more recent

Re: does hadoop always respect setNumReduceTasks?

2012-03-09 Thread Bejoy Ks
Hi Jayne, adding on to Lance's comments (answers to your other queries):
> I am wondering if Hadoop always respects Job.setNumReduceTasks(int)?
Yes, unless mapred.reduce.tasks is marked final in mapred-site.xml (which you normally never do).
> I noticed that most, if not all, of my jobs have only 1 reducer,
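Bejoy's point about `final` can be illustrated with a simplified model of layered configuration: each resource overrides earlier values unless an earlier resource marked the key final. This is a sketch, not Hadoop's `Configuration` class; `LayeredConf` and `effective_reducers` are hypothetical names, and it assumes the job's settings are layered on top of the defaults and the site file. The shipped default of one reducer (`mapred.reduce.tasks` = 1) is also why jobs that never set a count end up with a single reducer.

```python
class LayeredConf:
    """Simplified model of layered, Hadoop-style configuration (not Hadoop's API).

    Resources are applied in order. A later resource overrides an earlier
    value unless an earlier resource marked that key as final.
    """

    def __init__(self):
        self.values = {}
        self.finals = set()

    def add_resource(self, props, final_keys=()):
        for key, value in props.items():
            if key in self.finals:
                continue  # a final value from an earlier resource wins
            self.values[key] = value
        # Only keys this resource actually defines can be locked by it.
        self.finals.update(k for k in final_keys if k in props)

    def get_int(self, key, default):
        return int(self.values.get(key, default))


def effective_reducers(site_props, site_finals, job_props):
    """Hypothetical helper: reducer count after defaults, site file and job settings."""
    conf = LayeredConf()
    conf.add_resource({"mapred.reduce.tasks": "1"})  # shipped default: 1 reducer
    conf.add_resource(site_props, site_finals)       # cluster's mapred-site.xml
    conf.add_resource(job_props)                     # the job's own settings
    return conf.get_int("mapred.reduce.tasks", 1)


# Job asks for 8 reducers and nothing is final: the job wins (prints 8).
print(effective_reducers({}, (), {"mapred.reduce.tasks": "8"}))
# Site file pins 4 reducers as final: the job's request is ignored (prints 4).
print(effective_reducers({"mapred.reduce.tasks": "4"}, ("mapred.reduce.tasks",),
                         {"mapred.reduce.tasks": "8"}))
```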

Re: Hive 0.7.1 or 0.8.1 with CDH3?

2012-03-09 Thread Keith Wiley
Not the part where I inquire about the major differences between 0.7 and 0.8. At any rate, I'll ask elsewhere. There are Hive-specific lists, I suppose. On Mar 9, 2012, at 11:06, Arun C Murthy wrote: This is a question for CDH lists. On Mar 9, 2012, at 10:22 AM, Keith Wiley wrote: CDH3

mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Mohit Anchlia
What's the difference between mapred.tasktracker.map.tasks.maximum and mapred.map.tasks? I want my data to be split across only 10 mappers in the entire cluster. Can I do that using one of the above parameters?

Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Chen He
Hi Mohit, mapred.tasktracker.reduce(map).tasks.maximum sets how many reduce (map) slots each TaskTracker has. mapred.reduce.tasks (mapred.map.tasks) sets the default number of reduce (map) tasks your job will have. To set the number of mappers in your application, you can write something like this:
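The distinction Chen draws can be illustrated with a little arithmetic: the per-TaskTracker maximum caps how many map tasks run at the same time, while the job's task count decides how many run in total. A minimal sketch, assuming the job has the cluster to itself and the scheduler keeps every slot busy; `cluster_map_slots` and `map_waves` are hypothetical helpers, not Hadoop APIs.

```python
import math


def cluster_map_slots(num_tasktrackers, map_slots_per_tracker):
    """Upper bound on map tasks running at the same time across the cluster."""
    return num_tasktrackers * map_slots_per_tracker


def map_waves(num_map_tasks, num_tasktrackers, map_slots_per_tracker):
    """Rounds of map tasks needed if every slot stays busy.

    Ignores other jobs, data locality, speculative execution and failures.
    """
    capacity = cluster_map_slots(num_tasktrackers, map_slots_per_tracker)
    if capacity <= 0:
        raise ValueError("cluster has no map slots")
    return math.ceil(num_map_tasks / capacity)


# 5 TaskTrackers with 2 map slots each: at most 10 map tasks at once,
# so a job with 30 map tasks runs in 3 waves (prints "10 3").
print(cluster_map_slots(5, 2), map_waves(30, 5, 2))
```

Under this model, the slot maximum limits concurrency but never the total: a job with 30 map tasks still runs all 30, just queued behind the 10 available slots.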

Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Mohit Anchlia
What's the difference between setNumMapTasks and mapred.map.tasks? On Fri, Mar 9, 2012 at 5:00 PM, Chen He airb...@gmail.com wrote: Hi Mohit, mapred.tasktracker.reduce(map).tasks.maximum sets how many reduce (map) slots each TaskTracker has. mapred.reduce.tasks (mapred.map.tasks) sets

mapred.tasktracker.map.tasks.maximum not working

2012-03-09 Thread Mohit Anchlia
I have mapred.tasktracker.map.tasks.maximum set to 2 in my job, and I have 5 nodes. I was expecting this to allow only 10 concurrent map tasks, but I have 30 mappers running. Does Hadoop ignore this setting when it is supplied from the job?

Re: mapred.tasktracker.map.tasks.maximum not working

2012-03-09 Thread Chen He
Setting mapred.tasktracker.map.tasks.maximum in your job has no effect, because the Hadoop MapReduce platform only reads this parameter when the TaskTracker starts. It is a system configuration: you need to set it in your conf/mapred-site.xml file and restart Hadoop MapReduce. On Fri, Mar 9,

Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Chen He
If you do not call setNumMapTasks, by default the system uses the value you configured for mapred.map.tasks in the conf/mapred-site.xml file. On Fri, Mar 9, 2012 at 7:19 PM, Mohit Anchlia mohitanch...@gmail.com wrote: What's the difference between setNumMapTasks and mapred.map.tasks? On

Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Mohit Anchlia
Is this a system parameter too, or can I specify it as mapred.map.tasks? I am using Pig. On Fri, Mar 9, 2012 at 6:19 PM, Chen He airb...@gmail.com wrote: If you do not call setNumMapTasks, by default the system uses the value you configured for mapred.map.tasks in the conf/mapred-site.xml

setting up a large hadoop cluster

2012-03-09 Thread Masoud
Hi all, as we know, setting up a Hadoop cluster involves applying different settings on every machine, which is time-consuming and inefficient. Does anybody know an easy way to set up a Hadoop cluster? Some approaches, such as Puppet, do not have enough docs or a clear road map. Thanks, B.S

Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread bejoy . hadoop
Mohit, it is a job-level config parameter. For plain MapReduce jobs you can set it through the CLI as hadoop jar ... -D mapred.map.tasks=n. You should be able to do it in Pig as well. However, the number of map tasks for a job is governed by the input splits and the InputFormat you are
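Bejoy's point that the input splits, not mapred.map.tasks alone, decide the number of map tasks can be illustrated with a simplified model of the old-API (org.apache.hadoop.mapred) FileInputFormat in Hadoop 1.x, where the requested map count only feeds a goal size: split size = max(minimum split size, min(total input / requested maps, block size)). This is a sketch under stated assumptions, not Hadoop's code: `count_map_tasks` is a hypothetical helper, every file is treated as splittable, zero-length files are skipped, and sizes are plain integers in any consistent unit (MB below). Other InputFormats may compute splits differently.

```python
SPLIT_SLOP = 1.1  # a trailing chunk up to 10% larger than the split size stays in one split


def compute_split_size(goal_size, min_size, block_size):
    """Split size as max(min_size, min(goal_size, block_size))."""
    return max(min_size, min(goal_size, block_size))


def count_map_tasks(file_sizes, block_size, num_map_tasks_hint=1, min_split_size=1):
    """Approximate map-task count for splittable files (simplified model)."""
    total = sum(file_sizes)
    goal_size = total // max(num_map_tasks_hint, 1)  # requested maps only shape the goal size
    min_size = max(min_split_size, 1)                # plays the role of mapred.min.split.size
    split_size = compute_split_size(goal_size, min_size, block_size)
    splits = 0
    for length in file_sizes:
        remaining = length
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        if remaining > 0:
            splits += 1  # whatever is left becomes the last split of this file
    return splits


print(count_map_tasks([640], 64))                          # 10: one split per 64 MB block
print(count_map_tasks([640], 64, num_map_tasks_hint=20))   # 20: the hint shrinks the goal size
print(count_map_tasks([640], 64, num_map_tasks_hint=5))    # 10: cannot go below one split per block
print(count_map_tasks([640], 64, min_split_size=128))      # 5: a larger minimum merges blocks
```

Under this model, raising mapred.map.tasks can increase the number of map tasks but cannot push it below one split per block; getting fewer, larger splits means raising the minimum split size (mapred.min.split.size) instead.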

Re: mapred.tasktracker.map.tasks.maximum not working

2012-03-09 Thread bejoy . hadoop
Adding on to Chen's response: this is a TaskTracker-level setting (an environment setting based on parameters like your CPU cores, memory, etc.), so you need to override it in each TaskTracker's mapred-site.xml and restart the TT daemon for the change to take effect. Regards, Bejoy K
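Since the slot limit is read from each TaskTracker's own mapred-site.xml, one way to keep every node consistent is to generate that file rather than edit it by hand on each machine. A minimal sketch, assuming only flat name/value properties are needed; `render_mapred_site` is a hypothetical helper, and copying the file to every node and restarting the TT daemons (by hand, with Puppet, or otherwise) is left out.

```python
import xml.etree.ElementTree as ET


def render_mapred_site(props, final_keys=()):
    """Render a Hadoop-style <configuration> document from a dict of properties."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
        if name in final_keys:
            # <final>true</final> stops later resources (such as job settings) overriding it.
            ET.SubElement(prop, "final").text = "true"
    return ET.tostring(root, encoding="unicode")


# The same file would be placed on every TaskTracker node before restarting the TTs.
print(render_mapred_site({"mapred.tasktracker.map.tasks.maximum": 2}))
```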

Re: setting up a large hadoop cluster

2012-03-09 Thread Patai Sangbutsarakum
We set up 2 PB clusters with Puppet. What did you find unclear? P On Mar 9, 2012, at 21:32, Masoud mas...@agape.hanyang.ac.kr wrote: Hi all, as we know, setting up a Hadoop cluster involves applying different settings on every machine, which is time-consuming and inefficient. Does anybody know about setting