Re: How to setup Hive on a single node ?

2012-02-10 Thread Lac Trung
Thanks for your reply! I've already installed Hive correctly. First, I installed CDH3. Unfortunately I use Ubuntu Oneiric, which CDH doesn't support, so I downloaded and installed the CDH3 package for the Lucid system. Then, I installed Had

Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Rob Stewart
I'm looking to clarify the relationship between MultithreadedMapper.setNumberOfThreads(i) and mapreduce.tasktracker.map.tasks.maximum . If I set: - MultithreadedMapper.setNumberOfThreads( 4 ) - mapreduce.tasktracker.map.tasks.maximum = 1 Will 4 map tasks be executed in four separate threads withi

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Harsh J
Hi Rob, On Fri, Feb 10, 2012 at 5:55 PM, Rob Stewart wrote: > I'm looking to clarify the relationship between > MultithreadedMapper.setNumberOfThreads(i) and > mapreduce.tasktracker.map.tasks.maximum . The former is an in-user-application value that controls the total number of threads to run fo

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Rob Stewart
hi Harsh, On 10 February 2012 12:42, Harsh J wrote: > 4 JVMs if you have 4 tasks in your Job  (# of map tasks of a job is > dependent on its input). > > Each JVM will then run the MultithreadedMapper code, which will then > run 4 threads to call your map() inside of it cause you've asked that >

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Harsh J
Rob, On Fri, Feb 10, 2012 at 6:32 PM, Rob Stewart wrote: > So.. the MultithreadedMapper class splits *one* map task into N number > of threads? How is this achieved? I wasn't aware that a map task could > be implicitly sub-divided? I was under the (false?) > impression that the purpose

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Rob Stewart
Harsh, On 10 February 2012 13:33, Harsh J wrote: > What you're missing to see here is that the multithreaded mapper is > something that runs as part of one single map task. > With just one JVM slot, you'd end up processing only one input-chunk > at a time, though with 4 threads doing map() co
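
The slot-versus-thread relationship discussed in this thread can be sketched as simple arithmetic. This is a back-of-the-envelope model in plain Python, not Hadoop code: the function names are hypothetical, and it assumes all of the job's map tasks run on a single TaskTracker.

```python
import math

def concurrent_map_calls(num_map_tasks, map_slots, threads_per_task):
    """Upper bound on simultaneous map() calls on one TaskTracker.

    map_slots stands in for mapreduce.tasktracker.map.tasks.maximum,
    threads_per_task for MultithreadedMapper.setNumberOfThreads(i).
    """
    # Only as many task JVMs run at once as there are map slots.
    running_tasks = min(num_map_tasks, map_slots)
    # Each running task drives its own pool of map() threads.
    return running_tasks * threads_per_task

def task_waves(num_map_tasks, map_slots):
    """Number of rounds needed to run every map task through the slots."""
    return math.ceil(num_map_tasks / map_slots)

# The case from the thread: 4 threads per task, 1 map slot, 4 map tasks.
print(concurrent_map_calls(4, 1, 4))  # one task at a time, 4 threads inside it
print(task_waves(4, 1))               # the 4 tasks run one after another
```

Under these assumptions, raising the slot count adds more task JVMs running side by side, while raising the thread count only adds map() threads inside each JVM.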

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Harsh J
Hello again, On Fri, Feb 10, 2012 at 7:31 PM, Rob Stewart wrote: > OK, take word count. The to the map is beta">. The canonical Hadoop program would tokenize this line of text > and output <"foo",1> and so on. How would the multithreadedmapper know > how to further divide this line of text into

Hadoop 0.21.0 streaming giving no status information

2012-02-10 Thread Patrick Donnelly
Hi, I'm trying to upgrade an application previously written for Hadoop 0.20.0 for 0.21.0. I'm running into an issue with the status output missing which is making it difficult to get the jobid/success status: hadoop/bin/hadoop jar hadoop/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -D map

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-10 Thread Varun Kapoor
Hey Merto, Any luck getting the patch running on your cluster? In case you're interested, there's now a JIRA for this: https://issues.apache.org/jira/browse/HADOOP-8052. Varun On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor wrote: > Your general procedure sounds correct (i.e. dropping your newly

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Rob Stewart
Harsh... Oddly, this blog post has appeared within the last hour or so http://kickstarthadoop.blogspot.com/2012/02/enable-multiple-threads-in-mapper-aka.html -- Rob On 10 February 2012 14:20, Harsh J wrote: > Hello again, > > On Fri, Feb 10, 2012 at 7:31 PM, Rob Stewart wrote: >> OK, take

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Rob Stewart
Thanks, this is a lot clearer. One final question... On 10 February 2012 14:20, Harsh J wrote: > Hello again, > > On Fri, Feb 10, 2012 at 7:31 PM, Rob Stewart wrote: >> OK, take word count. The to the map is > beta">. The canonical Hadoop program would tokenize this line of text >> and output <

Fwd: HELP - Problem in setting up Hadoop - Multi-Node Cluster

2012-02-10 Thread Guruprasad B
Dear Robin, Thanks for your valuable time and response. Please find attached the namenode logs and configuration files. I am using 2 Ubuntu boxes: one as master & slave and the other as slave. The environment set-up on both machines is given below. Hadoop : hadoop_0.20.2 Linux: Ubuntu Linux 10

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread bejoy . hadoop
Hi Rob, I'm the culprit who posted the blog. :) The topic was of interest to me as well, and I found the conversation informative and useful. I just thought of documenting it, as it could be useful for others in future. Hope you don't mind! Regards Bejoy K S From handheld, Ple

Re: HELP - Problem in setting up Hadoop - Multi-Node Cluster

2012-02-10 Thread Guruprasad B
Dear Robin, Yes, it is possible. Regards, Guru On Fri, Feb 10, 2012 at 1:23 PM, Robin Mueller-Bady < robin.mueller-b...@oracle.com> wrote: > Dear Guruprasad, > > is it possible to ping both machines with their hostnames ? (ping master / > ping slave) ? > > Regards, > > Robin > > On 10.02.2012

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread bejoy . hadoop
Hi Rob, I'll try to answer this. From my understanding, if you are using the multithreaded mapper on the word count example with TextInputFormat, imagine you have 2 threads and 2 lines in your input split. The RecordReader would read line 1 and give it to map thread 1, and line 2 to map thread 2. So

Where Is DataJoinMapperBase?

2012-02-10 Thread Bing Li
Hi, all, I am starting to learn advanced Map/Reduce. However, I cannot find the class DataJoinMapperBase in my downloaded Hadoop 1.0.0 and 0.20.2. So I searched on the Web and got the following link. http://www.java2s.com/Code/Jar/h/Downloadhadoop0201datajoinjar.htm From the link I got the

Re: Combining MultithreadedMapper threadpool size & map.tasks.maximum

2012-02-10 Thread Raj Vishwanathan
Here is what I understand: The RecordReader for the MTMapper takes the input split and cycles the records among the available threads. It also ensures that the map outputs are synchronized. So what Bejoy says is what will happen for the wordcount program. Raj >>___
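
The behaviour Bejoy and Raj describe can be illustrated with a small simulation. This is plain Python threading, not Hadoop's MultithreadedMapper: the function name and the locking scheme are assumptions made for illustration. Worker threads take whole records (lines) from one shared reader, and writes to the shared output are synchronized, so a single line is never split between threads.

```python
import threading
from collections import Counter

def multithreaded_word_count(lines, num_threads):
    """Simulate one map task whose records are shared among several threads."""
    records = iter(enumerate(lines))   # stands in for the RecordReader
    read_lock = threading.Lock()       # one thread reads a record at a time
    write_lock = threading.Lock()      # map output is synchronized
    counts = Counter()

    def next_record():
        # Hand out the next whole record, or None when the split is exhausted.
        with read_lock:
            return next(records, None)

    def map_fn(offset, line):
        # Word-count map(): emit (word, 1) for each token of one line.
        for word in line.split():
            with write_lock:
                counts[word] += 1

    def worker():
        while True:
            record = next_record()
            if record is None:
                return
            map_fn(*record)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(counts)

result = multithreaded_word_count(["alpha beta", "beta gamma"], num_threads=2)
print(sorted(result.items()))
```

In this sketch, which thread receives which line is not fixed in advance; only the unit of work (one record per map() call) and the synchronized output are guaranteed.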

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-10 Thread Merto Mertek
Varun, unfortunately I have had some problems with deploying a new version on the cluster. Hadoop is not picking up the new build in the lib folder even though the classpath is set to it. The new build is picked up only if I put it in $HD_HOME/share/hadoop/, which is very strange. I've done this on all nodes

Re: HELP - Problem in setting up Hadoop - Multi-Node Cluster

2012-02-10 Thread anil gupta
Hi, Is your datanode initially able to connect to the Namenode? Have you disabled all firewall-related services? Do you see any errors in the startup log of the Namenode or Datanode? I have dealt with a similar problem before. So here is what you can try: First, test that ssh is wo

is 1.0.0 stable?

2012-02-10 Thread Stan Kaushanskiy
Hi everyone, I would imagine that 1.0.0 is "stable", but the stable link still takes one to the 0.20.203 release. Is 1.0.0 ready for production usage? If not what about 0.20.205? thanks, stan

Re: reference document which properties are set in which configuration file

2012-02-10 Thread Harsh J
As a rule of thumb, all properties starting with mapred.* or mapreduce.* go to mapred-site.xml, all properties starting with dfs.* go to hdfs-site.xml, and the rest may be put in core-site.xml to be safe. In case you notice MR or HDFS specific properties being outside of this naming convention, pleas
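
The naming convention above can be written down as a small lookup. This is a minimal sketch in Python; the helper name `config_file_for` is hypothetical, and it encodes only the rule of thumb from this message, not an authoritative list of where every property belongs.

```python
def config_file_for(property_name):
    """Suggest a config file for a Hadoop property, based on its prefix."""
    # MapReduce properties: mapred.* or mapreduce.*
    if property_name.startswith(("mapred.", "mapreduce.")):
        return "mapred-site.xml"
    # HDFS properties: dfs.*
    if property_name.startswith("dfs."):
        return "hdfs-site.xml"
    # Everything else goes to core-site.xml "to be safe".
    return "core-site.xml"

for name in ("mapreduce.tasktracker.map.tasks.maximum",
             "dfs.replication",
             "fs.default.name"):
    print(name, "->", config_file_for(name))
```

Properties that fall outside the naming convention would be misplaced by this helper, which is the exception the message goes on to mention.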

Re: reference document which properties are set in which configuration file

2012-02-10 Thread Raj Vishwanathan
Harsh, All, This was one of the first questions that I asked. It is sometimes not clear whether a parameter is site related or job related, or whether it belongs to the NN, JT, DN or TT. If I get some time during the weekend, I will try to put this into a document and see if it helps. Raj