How to configure SWIM

2012-03-01 Thread Arvind
Hi all, can anybody help me to configure SWIM -- the Statistical Workload Injector for MapReduce -- on my hadoop cluster?

Re: "Browse the filesystem" weblink broken after upgrade to 1.0.0: HTTP 404 "Problem accessing /browseDirectory.jsp"

2012-03-01 Thread madhu phatak
On Wed, Feb 29, 2012 at 11:34 PM, W.P. McNeill wrote: > I can perform HDFS operations from the command line like "hadoop fs -ls > /". Doesn't that mean that the datanode is up? > No. That is just a metadata lookup, which is served by the Namenode. Try to cat some file like "hadoop fs -cat " . Then i
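A quick way to see the difference (a minimal sketch; the path /tmp/sample.txt is hypothetical -- use any file that actually has blocks stored):

    hadoop fs -ls /                    # listing: answered by the Namenode from metadata alone
    hadoop fs -cat /tmp/sample.txt     # reading data: needs a live Datanode serving the blocks

If the cat hangs or fails while the ls works, the Namenode is up but the Datanode is not serving blocks.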

Distributed Indexing on MapReduce

2012-03-01 Thread Frank Scholten
Hi all, I am looking into reusing some existing code for distributed indexing to test a Mahout tool I am working on https://issues.apache.org/jira/browse/MAHOUT-944 What I want is to index the Apache Public Mail Archives dataset (200G) via MapReduce on Hadoop. I have been going through the Nutch

Re: Streaming Hadoop using C

2012-03-01 Thread Charles Earl
How was your experience of starfish? C On Mar 1, 2012, at 12:35 AM, Mark question wrote: > Thank you for your time and suggestions, I've already tried starfish, but > not jmap. I'll check it out. > Thanks again, > Mark > > On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl wrote: > >> I assume you ha

Re: Should splittable Gzip be a "core" hadoop feature?

2012-03-01 Thread Michel Segel
I do agree that a GitHub project is the way to go unless you could convince Cloudera, HortonWorks or MapR to pick it up and support it. They have enough committers. Is this potentially worthwhile? Maybe; it depends on how the cluster is integrated into the overall environment. Companies th

fairscheduler : group.name doesn't work, please help

2012-03-01 Thread Austin Chungath
I am running the fair scheduler on hadoop 0.20.205.0 http://hadoop.apache.org/common/docs/r0.20.205.0/fair_scheduler.html The above page talks about the property *mapred.fairscheduler.poolnameproperty*, which I can set to *group.name*. The default is user.name, and when a user submits a jo
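For reference, the mapred-site.xml fragment being described would look something like this (a sketch of what the fair scheduler docs suggest; whether group.name actually resolves on 0.20.205 is exactly what this thread is about):

    <property>
      <name>mapred.fairscheduler.poolnameproperty</name>
      <value>group.name</value>
    </property>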

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Merto Mertek
From the fairscheduler docs I assume the following should work: set mapred.fairscheduler.poolnameproperty to pool.name, and give pool.name the value ${mapreduce.job.group.name}, which means that the default pool will be the group of the user that has submitted the job. In your case I think that allocations
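Reconstructed as a mapred-site.xml fragment (a sketch of Merto's suggestion; as the follow-ups show, the ${...} expansion may not actually happen for this property):

    <property>
      <name>mapred.fairscheduler.poolnameproperty</name>
      <value>pool.name</value>
    </property>
    <property>
      <name>pool.name</name>
      <value>${mapreduce.job.group.name}</value>
    </property>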

RE: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Dave Shine
I've just started playing with the Fair Scheduler. To specify the pool at job submission time, you set the "mapred.fairscheduler.pool" property on the JobConf to the name of the pool you want the job to use. Dave -Original Message- From: Merto Mertek [mailto:masmer...@gmail.com] Sent:
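Assuming the job's driver implements Tool (so generic -D options are parsed), the same property can also be set from the command line; the jar, class and pool names here are hypothetical:

    hadoop jar my-job.jar com.example.MyJob \
      -D mapred.fairscheduler.pool=research \
      input output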

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Austin Chungath
Thanks, I will be trying the suggestions and will get back to you soon. On Thu, Mar 1, 2012 at 8:09 PM, Dave Shine < dave.sh...@channelintelligence.com> wrote: > I've just started playing with the Fair Scheduler. To specify the pool at > job submission time you set the "mapred.fairscheduler.pool

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Austin Chungath
Hi, I tried what you had said. I added the following to mapred-site.xml: mapred.fairscheduler.poolnameproperty set to pool.name, and pool.name set to ${mapreduce.job.group.name}. Funny enough, it created a pool with the literal name "${mapreduce.job.group.name}", so I tried ${mapred.job.group.name} and ${group.

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Merto Mertek
I think that the ${user.name} variable is obtained from the system properties class, where I cannot find a group.name property, so it is probably not possible to create pools depending on the user's group, despite in the

Re: fairscheduler : group.name doesn't work, please help

2012-03-01 Thread Harsh J
The group.name scheduler support was introduced in https://issues.apache.org/jira/browse/HADOOP-3892 but may have been broken by the security changes present in 0.20.205. You'll need the fix presented in https://issues.apache.org/jira/browse/MAPREDUCE-2457 to have group.name support. On Thu, Mar

kill -QUIT

2012-03-01 Thread Mohit Anchlia
When I try kill -QUIT for a job it doesn't send the stacktrace to the log files. Does anyone know why or if I am doing something wrong? I find the job using ps -ef|grep "attempt". I then go to logs/userLogs/job/attempt/
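A minimal sketch of the usual procedure (paths per a stock 0.20-era layout; note that the SIGQUIT thread dump goes to the JVM's stdout, not to the syslog log file, which may be why it looks missing):

    ps -ef | grep attempt_        # find the pid of the task JVM
    kill -QUIT <pid>              # ask the JVM for a thread dump
    # then look in the attempt's stdout file under the task logs, e.g.:
    # $HADOOP_LOG_DIR/userlogs/<attempt_id>/stdout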

High quality hadoop logo?

2012-03-01 Thread Keith Wiley
Is there a high quality version of the hadoop logo anywhere? Even the graphic presented on the Apache page itself suffers from dreadful jpeg artifacting. A google image search didn't inspire much hope on this issue (they all have the same low-quality jpeg appearance). I'm looking for good gra

Re: High quality hadoop logo?

2012-03-01 Thread Keith Wiley
Sorry, false alarm. I was looking at the popup thumbnails in google image search. If I click all the way through, there are some high quality versions available. Why is the version on the Apache site (and the Wikipedia page) so poor? On Mar 1, 2012, at 14:09, Keith Wiley wrote: > Is there

Re: High quality hadoop logo?

2012-03-01 Thread Owen O'Malley
On Thu, Mar 1, 2012 at 2:14 PM, Keith Wiley wrote: > Sorry, false alarm. I was looking at the popup thumbnails in google image > search. If I click all the way through, there are some high quality > versions available. Why is the version on the Apache site (and the Wikipedia > page) so poor?

Re: Streaming Hadoop using C

2012-03-01 Thread Mark question
Starfish worked great for wordcount... I didn't run it on my application because I have only map tasks. Mark On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl wrote: > How was your experience of starfish? > C > On Mar 1, 2012, at 12:35 AM, Mark question wrote: > > > Thank you for your time and sugges

Re: High quality hadoop logo?

2012-03-01 Thread Keith Wiley
Excellent! Thank you. Sent from my phone, please excuse my brevity. Keith Wiley, kwi...@keithwiley.com, http://keithwiley.com Owen O'Malley wrote: On Thu, Mar 1, 2012 at 2:14 PM, Keith Wiley wrote: > Sorry, false alarm. I was looking at the popup thum

Adding nodes

2012-03-01 Thread Mohit Anchlia
Is this the right procedure to add nodes? I took some from the hadoop wiki FAQ: http://wiki.apache.org/hadoop/FAQ 1. Update conf/slaves 2. On the new slave nodes, start the datanode and tasktracker 3. Run hadoop balancer Do I also need to run dfsadmin -refreshNodes?
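As a concrete sketch of those steps on an 0.20-era cluster (assumes the new node already has the same Hadoop install and config, and $HADOOP_HOME/bin is on the PATH):

    # on the master: add the new host to conf/slaves (used only by the start scripts)
    # on the new slave:
    hadoop-daemon.sh start datanode
    hadoop-daemon.sh start tasktracker
    # from any node, spread existing blocks onto the new datanode:
    hadoop balancer
    # only needed if dfs.hosts / mapred.hosts include files are in use:
    hadoop dfsadmin -refreshNodes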

Re: Adding nodes

2012-03-01 Thread Joey Echeverria
You only have to refresh nodes if you're making use of an allow file. Sent from my iPhone On Mar 1, 2012, at 18:29, Mohit Anchlia wrote: > Is this the right procedure to add nodes? I took some from hadoop wiki FAQ: > > http://wiki.apache.org/hadoop/FAQ > > 1. Update conf/slaves > 2. on the s

Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria wrote: > You only have to refresh nodes if you're making use of an allow file. > > Thanks. Does it mean that when the tasktracker/datanode starts up, it communicates with the namenode using the master file? Sent from my iPhone > > On Mar 1, 2012, at 18:29, Mohi

Re: Adding nodes

2012-03-01 Thread Joey Echeverria
Not quite. Datanodes get the namenode host from fs.default.name in core-site.xml. Task trackers find the job tracker from the mapred.job.tracker setting in mapred-site.xml. Sent from my iPhone On Mar 1, 2012, at 18:49, Mohit Anchlia wrote: > On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria wr
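For concreteness, the two settings being referred to (hostnames and ports here are hypothetical placeholders):

    <!-- core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:8020</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker-host:8021</value>
    </property>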

Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
The master and slave files, if I remember correctly, are used to start the correct daemons on the correct nodes from the master node. Raj > > From: Joey Echeverria >To: "common-user@hadoop.apache.org" >Cc: "common-user@hadoop.apache.org" >Sent: Thursday, Mar

Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
On Thu, Mar 1, 2012 at 4:57 PM, Joey Echeverria wrote: > Not quite. Datanodes get the namenode host from fs.defalt.name in > core-site.xml. Task trackers find the job tracker from the > mapred.job.tracker setting in mapred-site.xml. > I actually meant to ask how does namenode/jobtracker know the

Re: Adding nodes

2012-03-01 Thread anil gupta
What Joey said is correct for Cloudera's distribution. I am not confident about the same for other distributions, as I haven't tried them. Thanks, Anil On Thu, Mar 1, 2012 at 5:10 PM, Raj Vishwanathan wrote: > The master and slave files, if I remember correctly, are used to start the > correct da

Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
What Joey said is correct for both the apache and cloudera distros. The DN/TT daemons will connect to the NN/JT using the config files. The master and slave files are used for starting the correct daemons. > > From: anil gupta >To: common-user@hadoop.apache.org;

Re: Adding nodes

2012-03-01 Thread Arpit Gupta
It is initiated by the slave. If you have defined files to state which slaves can talk to the namenode (using config dfs.hosts) and which hosts cannot (using property dfs.hosts.exclude), then you would need to edit these files and issue the refresh command. On Mar 1, 2012, at 5:35 PM, Mohit Anchlia w
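A sketch of that setup (file paths are hypothetical; the include/exclude files are plain lists of hostnames, one per line):

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/conf/dfs.include</value>
    </property>
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>

After editing the lists, make the namenode re-read them:

    hadoop dfsadmin -refreshNodes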

Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
Thanks all for the answers!! On Thu, Mar 1, 2012 at 5:52 PM, Arpit Gupta wrote: > It is initiated by the slave. > > If you have defined files to state which slaves can talk to the namenode > (using config dfs.hosts) and which hosts cannot (using > property dfs.hosts.exclude) then you would need

Re: Adding nodes

2012-03-01 Thread George Datskos
Mohit, New datanodes will connect to the namenode, so that's how the namenode knows. Just make sure the datanodes have the correct fs.default.name in their config and then start them. The namenode can, however, choose to reject the datanode if you are using the {dfs.hosts} and {dfs.ho

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Marc Sturlese
Tried 0.4.15 but am still getting the error. Really lost with this. My hadoop release is 0.20.2 from more than a year ago. Could this be related to the problem? -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3792484.html Sent from

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Harsh J
Marc, Were the lzo libs on your server upgraded to a higher version recently? Also, when you deployed a built copy of 0.4.15, did you ensure you replaced the older native libs for hadoop-lzo as well? On Fri, Mar 2, 2012 at 9:05 AM, Marc Sturlese wrote: > Tried 0.4.15 but am still getting the error.

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Marc Sturlese
Yes. The steps I followed were: 1. Install lzo 2.06 on a machine with the same kernel as my nodes. 2. Compile hadoop-lzo 0.4.15 there (in its /lib I replaced the cdh3u3 jar with my hadoop 0.20.2 release). 3. Replace hadoop-lzo-0.4.9.jar with the newly compiled hadoop-lzo-0.4.15.jar in the hadoop lib directory of all my nodes a
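For anyone following along, the usual hadoop-lzo build/deploy steps look roughly like this (a sketch; exact paths depend on your layout, and per Harsh's point the native libs must be replaced along with the jar):

    # on a build box with the lzo 2.06 headers installed
    ant compile-native tar
    # push both artifacts to every node
    cp build/hadoop-lzo-0.4.15.jar $HADOOP_HOME/lib/
    cp -r build/native/* $HADOOP_HOME/lib/native/
    # restart the daemons so the new native libs get loaded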

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Marc Sturlese
I used to have 2.05, but now, as I said, I installed 2.06 -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3792511.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Joey Echeverria
I know this doesn't fix lzo, but have you considered Snappy for the intermediate output compression? It gets similar compression ratios and compress/decompress speeds, but arguably has better Hadoop integration. -Joey On Thu, Mar 1, 2012 at 10:01 PM, Marc Sturlese wrote: > I used to have 2.05, but
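If you want to try that, the 0.20-era switch for compressing map (intermediate) output looks like this (a sketch; assumes the Snappy native libraries are available on all nodes):

    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>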

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Marc Sturlese
Absolutely. In case I don't find the root of the problem soon I'll definitely try it. -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3792531.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Subir S
Hello Folks, Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs? Also, there is a claim in our company that Pig performs better than Map Reduce jobs. Is this true? Are there any such benchmarks available? Thanks, Subir

Re: Where Is DataJoinMapperBase?

2012-03-01 Thread madhu phatak
Hi, Please look inside the $HADOOP_HOME/contrib/datajoin folder of the 0.20.2 version. You will find the jar there. On Sat, Feb 11, 2012 at 1:09 AM, Bing Li wrote: > Hi, all, > > I am starting to learn advanced Map/Reduce. However, I cannot find the > class DataJoinMapperBase in my downloaded Hadoop 1.0.0 an
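To build and run against it, the contrib jar just needs to be on the classpath; a sketch, assuming the driver implements Tool so -libjars is honored (jar name patterns and the MyJoin class are hypothetical -- check the actual file names in your tree):

    ls $HADOOP_HOME/contrib/datajoin/
    # e.g. hadoop-0.20.2-datajoin.jar
    javac -classpath $HADOOP_HOME/hadoop-*core*.jar:$HADOOP_HOME/contrib/datajoin/hadoop-*datajoin*.jar MyJoin.java
    hadoop jar myjoin.jar MyJoin -libjars $HADOOP_HOME/contrib/datajoin/hadoop-*datajoin*.jar input output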

Re: Reducer NullPointerException

2012-03-01 Thread madhu phatak
Hi, It seems like you are trying to run only the reducer without a mapper. Can you share the main() method code which you are trying to run? On Mon, Jan 23, 2012 at 11:43 AM, burakkk wrote: > Hello everyone, > I have 3 servers (1 master, 2 slaves) and I installed cdh3u2 on each > server. I execute a simple word

Re: DFSIO

2012-03-01 Thread madhu phatak
Hi, Only HDFS should be enough. On Fri, Nov 25, 2011 at 1:45 AM, Thanh Do wrote: > hi all, > > in order to run DFSIO in my cluster, > do i need to run JobTracker, and TaskTracker, > or just running HDFS is enough? > > Many thanks, > Thanh > -- Join me at http://hadoopworkshop.eventbrite.com

Re: DFSIO

2012-03-01 Thread Harsh J
Madhu, That is incorrect. TestDFSIO is a MapReduce job and you need HDFS+MR setup to use it. On Fri, Mar 2, 2012 at 11:07 AM, madhu phatak wrote: > Hi, >  Only HDFS should be enough. > > On Fri, Nov 25, 2011 at 1:45 AM, Thanh Do wrote: > >> hi all, >> >> in order to run DFSIO in my cluster, >>
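For reference, TestDFSIO is typically invoked like this (a sketch; the test jar name varies by release, e.g. hadoop-0.20.2-test.jar, and -fileSize is in MB):

    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 100
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -read  -nrFiles 10 -fileSize 100

Both the -write and -read phases run as MapReduce jobs, which is why a JobTracker and TaskTrackers must be up.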

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Jie Li
Considering Pig essentially translates scripts into Map Reduce jobs, one can always write Map Reduce jobs at least as good as those Pig generates. You can refer to the "Pig experience" paper to see the overhead Pig introduces, though it's being improved all the time. Btw if you really care about the performance, how you conf

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Harsh J
On Fri, Mar 2, 2012 at 10:18 AM, Subir S wrote: > Hello Folks, > > Are there any pointers to such comparisons between Apache Pig and Hadoop > Streaming Map Reduce jobs? I do not see why you seek to compare these two. Pig offers a language that lets you write data-flow operations and runs these st
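To make the comparison concrete, here is the same trivial job (count input lines) both ways; the paths and the streaming jar name pattern are hypothetical:

    -- Pig
    A = LOAD '/data/logs';
    B = GROUP A ALL;
    C = FOREACH B GENERATE COUNT(A);
    STORE C INTO '/out/linecount';

    # Hadoop Streaming
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
      -D mapred.reduce.tasks=1 \
      -input /data/logs -output /out/linecount \
      -mapper /bin/cat -reducer /usr/bin/wc

Pig gives you the data-flow plan (joins, grouping, projection) for free; streaming lets any executable be the mapper/reducer. Which runs faster depends on the plan Pig generates versus what you hand-roll.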

Re: DFSIO

2012-03-01 Thread madhu phatak
Hi Harsh, Sorry, I read DFSIO as DFS Input/Output, which I thought meant reading and writing using the HDFS API :) On Fri, Mar 2, 2012 at 12:32 PM, Harsh J wrote: > Madhu, > > That is incorrect. TestDFSIO is a MapReduce job and you need HDFS+MR > setup to use it. > > On Fri, Mar 2, 2012 at 11:07 AM, madhu