Re: namenode errors

2009-09-21 Thread Zeev Milin
Thanks Brian. Killing an orphan process on one of the nodes resolved the issue (I did not capture a stack trace) Zeev On Wed, Sep 16, 2009 at 5:47 AM, Brian Bockelman wrote: > Hey Zeev, > > This is caused by a misbehaving client stuck in an infinite loop. When you > restart the NN, the client

Cascading Meetup at Rapleaf September 24th

2009-09-21 Thread Chris K Wensel
Hi all, RapLeaf is hosting a Cascading meetup on September 24th. More details at: http://blog.rapleaf.com/dev/?p=196 and http://upcoming.yahoo.com/event/4421260 Hope to see you there! chris -- Chris K Wensel ch...@concurrentinc.com http://www.concurrentinc.com

RE: forrest configuration....

2009-09-21 Thread Andy Sautins
That would be my problem. Thanks. I was passing it Java 6. -Original Message- From: Matt Massie [mailto:m...@cloudera.com] Sent: Monday, September 21, 2009 2:23 PM To: common-user@hadoop.apache.org Cc: core-u...@hadoop.apache.org Subject: Re: forrest configuration What are you

Re: forrest configuration....

2009-09-21 Thread Matt Massie
What are you passing in for java5.home? Forrest requires Java 5 to validate the sitemap. It will fail if you try to use Java 6. -Matt On Mon, Sep 21, 2009 at 1:20 PM, Andy Sautins wrote: > > I can't seem to find documentation on how to configure forrest to build. > When trying to build hdfs

forrest configuration....

2009-09-21 Thread Andy Sautins
I can't seem to find documentation on how to configure forrest to build. When trying to build hdfs/site/build.xml I get the following error: [exec] validate-sitemap: [exec] /home/user/apache-forrest-0.8/main/webapp/resources/schema/relaxng/sitemap-v06.rng:72:31: error: datatype li

Re: Cluster gets overloaded processing large files via streaming

2009-09-21 Thread paul
Don't forget that the records are sorted going into the reducer. This is often overlooked by new users that are just using pipes on the command line to test their perl mappers and reducers without sorted data. For my perl streaming applications, I perform all of my operations on my values, like

SequenceFileAsBinaryOutputFormat for M/R

2009-09-21 Thread Bill Habermaas
Referring to the Hadoop 0.20.1 API. SequenceFileAsBinaryOutputFormat requires JobConf but JobConf is deprecated. Is there another OutputFormat I should be using? Bill

Re: Cluster gets overloaded processing large files via streaming

2009-09-21 Thread Alex McLintock
I think the default chunk size you are referring to is about 64MB. This was chosen as something like a single read off a disk. I for one am a big Perl fan, but I am not happy about 64MB of text being read into a Perl hash. Hashes are memory-wasteful, trading memory for speed. So my verdict is to ret

Cluster gets overloaded processing large files via streaming

2009-09-21 Thread Leo Alekseyev
Hi all, I have a streaming job running on ~300 GB of ASCII data in 3 large files, where the mapper and reducer are Perl scripts. Mapper does trivial data cleanup, and reducer builds a hash then iterates over this hash writing output. Hash key is the first field in the data, i.e. the same as the s

JobConf in .20

2009-09-21 Thread Mark Kromer
I didn't see anything about this in the archive, so perhaps I'm doing something wrong, but I have run into a problem creating a job with the .20 release without using the deprecated JobConf class. The mapreduce.JobContext class is the replacement for the deprecated mapred.JobContext, but it co
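The message is cut off above, but for context, a rough sketch of the JobConf-free path in 0.20: the new org.apache.hadoop.mapreduce.Job class carries the configuration and drives submission. The WordCount-style mapper and reducer below are only illustrative, not taken from this thread.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiWordCount {

  public static class TokenMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (token.length() == 0) continue;   // skip blanks from leading whitespace
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount (new API)");   // Job wraps the Configuration; no JobConf
    job.setJarByClass(NewApiWordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Job(Configuration, String) holds the configuration directly, so nothing in this sketch touches the deprecated mapred.JobConf.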

Re: Can not stop hadoop cluster ?

2009-09-21 Thread Allen Wittenauer
On 9/20/09 10:19 PM, "Jeff Zhang" wrote: > But it's weird that it shows I cannot stop the cluster. Has anyone > encountered this problem before? > > Any ideas? This is the message when I run the command bin/stop-all.sh > > no jobtracker to stop All the time. For us, the result was the $USER e

Re: Re: Processing a large quantity of smaller XML files?

2009-09-21 Thread Andrzej Jan Taramina
Brian (and others): Great info...thanks! > I would suggest looking into at Cloudera's blog posting about the "small > files problem": > > http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/ Good link...muchos gracias. > The simplest thing you could do is to use the Hadoop ARchive
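The reply is truncated here, but as a hedged sketch of the Hadoop ARchive idea it mentions: many small files packed into one .har can be read back through the har:// filesystem like any other path. The archive path below is made up, and the archive itself would have been created beforehand with the hadoop archive command-line tool.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HarListing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical archive; many small files live inside this single .har.
    Path inside = new Path("har:///user/someone/docs.har");
    FileSystem fs = inside.getFileSystem(conf);     // resolves to HarFileSystem
    for (FileStatus status : fs.listStatus(inside)) {
      System.out.println(status.getPath());         // files packed in the archive
    }
  }
}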

Re: HADOOP-4539 question

2009-09-21 Thread Todd Lipcon
On Mon, Sep 21, 2009 at 7:50 AM, Edward Capriolo wrote: > > > >Storing the only copy of the NN data into NFS would make the NFS server an > > SPOF, and you still need to solve the problems of > > @Steve correct. It is hair splitting but Stas asked if there was an > approach that did not use DRBD.

Re: Using ArrayWritable as a key?

2009-09-21 Thread Lajos
Apologies, I should'a checked the source first ... I see that keys have to be WritableComparable, and hence I'll have to implement that interface in my custom class. Lajos Lajos wrote: Hi all, I seem to have a problem using ArrayWritable (of Texts) as a key in my MR jobs. I want my Mapper

Re: Can not stop hadoop cluster ?

2009-09-21 Thread Todd Lipcon
On Mon, Sep 21, 2009 at 2:57 AM, Steve Loughran wrote: > Jeff Zhang wrote: > >> My cluster has been running for several months. >> > > Nice. > > Is this a bug in Hadoop? I think Hadoop is supposed to run for a long time. >> > > I'm doing work in HDFS-326 on making it easier to start/stop the various >

Hypertable binary packages available

2009-09-21 Thread Doug Judd
Hypertable (www.hypertable.org) is an open source C++ implementation of Bigtable which runs on top of HDFS. Binary packages (RPM, debian, dmg) for Hypertable are now available and can be downloaded here: http://package.hypertable.org/ Updated documentation, with a "Getting Started" guide, can be

Re: Using ArrayWritable as a key?

2009-09-21 Thread Todd Lipcon
Hi Lajos, ArrayWritable does not implement WritableComparable, so it can't currently be used as a mapper output key - those keys have to be sorted during the shuffle, and thus the type must be WritableComparable. -Todd On Mon, Sep 21, 2009 at 8:53 AM, Lajos wrote: > Hi all, > > I seem to have
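As a rough sketch of the workaround Todd and Lajos are pointing at (the class name and the element-by-element ordering below are illustrative, not from the thread), a small ArrayWritable subclass can supply the missing comparison:

import java.util.Arrays;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class TextArrayKey extends ArrayWritable
    implements WritableComparable<TextArrayKey> {

  public TextArrayKey() {
    super(Text.class);                  // no-arg constructor needed for deserialization
  }

  public TextArrayKey(Text[] values) {
    super(Text.class, values);
  }

  @Override
  public int compareTo(TextArrayKey other) {
    // Compare element by element; on a tie, the shorter array sorts first.
    String[] a = this.toStrings();
    String[] b = other.toStrings();
    int common = Math.min(a.length, b.length);
    for (int i = 0; i < common; i++) {
      int c = a[i].compareTo(b[i]);
      if (c != 0) {
        return c;
      }
    }
    return a.length - b.length;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof TextArrayKey && compareTo((TextArrayKey) o) == 0;
  }

  @Override
  public int hashCode() {
    // Needed so the default HashPartitioner sends equal keys to the same reducer.
    return Arrays.hashCode(toStrings());
  }
}

If comparison speed matters, a raw comparator can also be registered for the class, but the plain compareTo above is enough to make the shuffle sort work.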

Using ArrayWritable as a key?

2009-09-21 Thread Lajos
Hi all, I seem to have a problem using ArrayWritable (of Texts) as a key in my MR jobs. I want my Mapper output key to be ArrayWritable, and both input & output keys in my Reducer the same. I've tried this with both mapred and mapreduce versions (I'm using 0.20.0 here). I also tried extend

Re: HADOOP-4539 question

2009-09-21 Thread Edward Capriolo
On Mon, Sep 21, 2009 at 6:03 AM, Steve Loughran wrote: > Edward Capriolo wrote: > >> >> Just for reference. Linux HA and some other tools deal with the split >> brain decisions by requiring a quorum. A quorum involves having a >> third party or having more than 50% of the nodes agree. >> >> An iss

Re: HADOOP-4539 question

2009-09-21 Thread Stas Oskin
Hi. Just wanted to reflect my thoughts on this: So far DRBD looks like a good enough solution. My only problem is that it requires me to operate dedicated machines (physical or virtual) for the Hadoop Namenode, in an active/passive configuration. I'm interested in HADOOP-4539 mostly because it woul

Re: HADOOP-4539 question

2009-09-21 Thread Steve Loughran
Edward Capriolo wrote: Just for reference. Linux HA and some other tools deal with the split brain decisions by requiring a quorum. A quorum involves having a third party or having more than 50% of the nodes agree. An issue with linux-ha and hadoop is that linux-ha is only supported/tested on

Re: Can not stop hadoop cluster ?

2009-09-21 Thread Steve Loughran
Jeff Zhang wrote: My cluster has been running for several months. Nice. Is this a bug in Hadoop? I think Hadoop is supposed to run for a long time. I'm doing work in HDFS-326 on making it easier to start/stop the various Hadoop services; once the lifecycle stuff is in I'll worry more about the r

Re: Can not stop hadoop cluster ?

2009-09-21 Thread David B. Ritch
It's not precisely a bug in anything - rather, it's a Hadoop default configuration that is rather peculiar. The process ID (pid) is kept in a pid file. The default location of that file is set in hadoop-default.xml, and should be overridden in hadoop-site.xml. The problem is that the default loca

Re: Can not stop hadoop cluster ?

2009-09-21 Thread Anthony Urso
It has nothing to do with Hadoop; it has to do with tmpwatch. Kill the processes nicely and you won't lose any data. Cheers, Anthony On Mon, Sep 21, 2009 at 1:39 AM, Chandraprakash Bhagtani wrote: > no, you won't lose data, as you are only killing the process, which you can > restart later. > > On

Re: Can not stop hadoop cluster ?

2009-09-21 Thread Chandraprakash Bhagtani
No, you won't lose data, as you are only killing the process, which you can restart later. On Mon, Sep 21, 2009 at 12:15 PM, Jeff Zhang wrote: > My cluster has been running for several months. > > Is this a bug in Hadoop? I think Hadoop is supposed to run for a long time. > > And will I lose data if I manu