What's the proper way to use hadoop task side-effect files?

2008-04-10 Thread Zhang, jian
Hi, I am new to Hadoop, so sorry for the novice question. I ran into a problem while trying to use task side-effect files. Since there is no code example in the wiki, I tried this way: I override the configure method in the reducer to create a side file: public void configure(JobConf conf){
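For reference, a minimal sketch of the approach the poster describes, using the 0.16-era `org.apache.hadoop.mapred` API. The `mapred.work.output.dir` property name and the `side-file.txt` name are assumptions here, not taken from the original message:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class SideEffectReducer extends MapReduceBase {
    private FSDataOutputStream side;

    public void configure(JobConf conf) {
        try {
            // Create the side file under the task's temporary work directory,
            // so the framework promotes it to the job output directory only
            // if this task attempt succeeds (avoids clashes between retries).
            Path work = new Path(conf.get("mapred.work.output.dir"));
            FileSystem fs = work.getFileSystem(conf);
            side = fs.create(new Path(work, "side-file.txt"));
        } catch (IOException e) {
            throw new RuntimeException("cannot create side-effect file", e);
        }
    }
}
```

Writing under the per-task work directory rather than the final output directory is the important part: speculative or re-executed attempts would otherwise overwrite each other's files.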

Re: Hadoop performance on EC2?

2008-04-10 Thread Ted Dziuba
I have seen EC2 be slower than a comparable system in development, but not by the factors that you're experiencing. One thing about EC2 that has concerned me - you are not guaranteed that your "/mnt" disk is an uncontested spindle. Early on, this was the case, but Amazon made no promises. A

Re: how to set logging level to debug

2008-04-10 Thread lohit
You could use hadoop daemonlog to get and set log levels. To set FSNameSystem to DEBUG you would do something like this: hadoop daemonlog -setlevel namenode:50070 org.apache.hadoop.dfs.FSNameSystem DEBUG Thanks, Lohit - Original Message From: Cagdas Gerede <[EMAIL PROTECTED]> To: core-
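Spelled out, the two forms of the command look like this (the `namenode` hostname is an example; 50070 is the default namenode HTTP port of this era):

```shell
# Query the current level for a class on a running daemon
hadoop daemonlog -getlevel namenode:50070 org.apache.hadoop.dfs.FSNamesystem

# Raise it to DEBUG at runtime, without restarting the daemon
hadoop daemonlog -setlevel namenode:50070 org.apache.hadoop.dfs.FSNamesystem DEBUG
```

The change applies only to the running process; a restart reverts to whatever log4j.properties configures.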

Re: Hadoop performance on EC2?

2008-04-10 Thread Nate Carlson
On Thu, 10 Apr 2008, Ted Dunning wrote: Are you trying to read from mySQL? No, we're outputting to MySQL. I've also verified that the MySQL server is hardly seeing any load, isn't waiting on slow queries, etc. If so, it isn't very surprising that you could get lower performance with more re

Re: "could only be replicated to 0 nodes, instead of 1"

2008-04-10 Thread Jayant Durgad
I am faced with the exact same problem described here, does anybody know how to resolve this?

Re: Headers and footers on Hadoop output results

2008-04-10 Thread Riccardo Boscolo
You can write your own output format (extending TextOutputFormat) and simply add the header when you create the output file, and the footer right before you close it. RB On Tue, Apr 8, 2008 at 12:42 PM, ncardoso <[EMAIL PROTECTED]> wrote: > > Hello. > > I'm using Hadoop to process several XML fi
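A sketch of that idea against the old `org.apache.hadoop.mapred` API. The path construction is simplified (the real class resolves the task's work directory), and the `<documents>` header/footer strings are placeholders for whatever the XML output needs:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

public class HeaderFooterOutputFormat<K, V> extends TextOutputFormat<K, V> {
    public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
            String name, Progressable progress) throws IOException {
        // Simplified: open the output file directly under mapred.output.dir.
        Path file = new Path(job.get("mapred.output.dir"), name);
        FileSystem fs = file.getFileSystem(job);
        final DataOutputStream out = fs.create(file, progress);
        out.writeBytes("<documents>\n");               // header, written once on open
        return new LineRecordWriter<K, V>(out) {
            public void close(Reporter reporter) throws IOException {
                out.writeBytes("</documents>\n");      // footer, just before close
                super.close(reporter);
            }
        };
    }
}
```

Each reduce task writes its own file, so with more than one reducer every part file gets its own header and footer; a single-reducer job (or a post-processing concatenation step) is needed for one well-formed XML document.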

Re: Does any one tried to build Hadoop..

2008-04-10 Thread krishna prasanna
I tried both ways and I am still getting some errors --- import org.apache.tools.ant.BuildException; (error: cannot be resolved..) --- public Socket createSocket() throws IOException { --- s = socketFactory.createSocket(); (error: incorrect parameters) earlier it failed to resolve this pack

Re: hdfs > 100T?

2008-04-10 Thread Ted Dunning
I should mention that the mogile available generally is not suitable for large installs. We had to make significant changes to get it to work correctly. We are figuring out how to contribute these back, but may have to fork the project to do it. On 4/10/08 12:21 PM, "Todd Troxell" <[EMAIL PROT

Re: RAID-0 vs. JBOD?

2008-04-10 Thread Raghu Angadi
Ted Dunning wrote: I haven't done a detailed comparison, but I have seen some effects: A) raid doesn't usually work really well on low-end machines compared to independent drives. This would make me distrust raid. B) hadoop doesn't do very well, historically speaking with more than one parti

Re: hdfs > 100T?

2008-04-10 Thread Todd Troxell
On Thu, Apr 10, 2008 at 09:18:02AM -0700, Ted Dunning wrote: > Hadoop also does much better with spindles spread across many machines. > Putting 16 TB on each of two nodes is distinctly sub-optimal on many fronts. > Much better to put 0.5-2TB on 16-64 machines. With 2x1TB SATA drives, your > cost

Re: RAID-0 vs. JBOD?

2008-04-10 Thread Ted Dunning
I haven't done a detailed comparison, but I have seen some effects: A) raid doesn't usually work really well on low-end machines compared to independent drives. This would make me distrust raid. B) hadoop doesn't do very well, historically speaking with more than one partition if the partition

RAID-0 vs. JBOD?

2008-04-10 Thread Colin Evans
We're building a cluster of 40 machines with 5 drives each, and I'm curious what people's experiences have been for using RAID-0 for HDFS vs. configuring separate partitions (JBOD) and having the datanode balance between them. I took a look at the datanode code, and datanodes appear to write b
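For the JBOD setup being asked about, the datanode round-robins blocks across every directory listed in `dfs.data.dir`. A hadoop-site.xml sketch with one entry per physical drive (the mount points are examples):

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/dfs/data,/mnt/disk2/dfs/data,/mnt/disk3/dfs/data,/mnt/disk4/dfs/data,/mnt/disk5/dfs/data</value>
</property>
```

Unlike RAID-0, losing one drive here costs only the blocks on that drive, and a slow disk does not drag every write down to its speed.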

Re: hdfs > 100T?

2008-04-10 Thread Ted Dunning
Hadoop also does much better with spindles spread across many machines. Putting 16 TB on each of two nodes is distinctly sub-optimal on many fronts. Much better to put 0.5-2TB on 16-64 machines. With 2x1TB SATA drives, your cost and performance are likely to both be better than two machines with

Re: Hadoop performance on EC2?

2008-04-10 Thread Ted Dunning
Are you trying to read from mySQL? If so, it isn't very surprising that you could get lower performance with more readers. On 4/9/08 7:07 PM, "Nate Carlson" <[EMAIL PROTECTED]> wrote: > Hey all, > > We've got a job that we're running in both a development environment, and > out on EC2. I've

Re: Counters giving double values

2008-04-10 Thread rude
Hello list readers, I'm still looking for an explanation. Maybe I should put it this way: I run a job, and at the end I need to know how many instances of a specific type were written into an output file. I wanted to rely on the counters, but maybe this is not a good idea anyway. Or what is the o

Re: Formatting the file system: Misleading hint in Wiki?

2008-04-10 Thread Colin Freas
This has been my experience as well. This should be mentioned in the Getting Started pages until resolved. -colin On Thu, Apr 10, 2008 at 10:54 AM, Michaela Buergle < [EMAIL PROTECTED]> wrote: > Hi all, > on http://wiki.apache.org/hadoop/GettingStartedWithHadoop - it says: > "Do not format a

Formatting the file system: Misleading hint in Wiki?

2008-04-10 Thread Michaela Buergle
Hi all, on http://wiki.apache.org/hadoop/GettingStartedWithHadoop - it says: "Do not format a running Hadoop filesystem, this will cause all your data to be erased." It seems to me however that currently you better not format a Hadoop filesystem at all (after the first time, that is), running or n

Re: Sorting the OutputCollector

2008-04-10 Thread Owen O'Malley
On Apr 9, 2008, at 7:30 AM, Aayush Garg wrote: But the problem is that I need to sort according to freq, which is part of my value field... Any inputs?? Could you provide a small piece of code of your thought? The second job would use the InverseMapper from org.apache.hadoop.mapred.lib.Inv
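A sketch of the second job Owen suggests, in the old `org.apache.hadoop.mapred` API. The job takes the (word, freq) output of the first job and swaps key and value so the framework's sort phase orders by frequency; the driver class name and the writable types are assumptions:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.InverseMapper;

// Second job: input is the (word, freq) output of the counting job.
JobConf sortJob = new JobConf(SortByFreq.class);
sortJob.setMapperClass(InverseMapper.class);    // emits (freq, word)
sortJob.setReducerClass(IdentityReducer.class); // pass-through; sorting is done by the shuffle
sortJob.setOutputKeyClass(LongWritable.class);  // freq is now the sort key
sortJob.setOutputValueClass(Text.class);
```

Note the default sort is ascending; getting the most frequent items first needs a descending key comparator (or negating the count in the first job).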

Re: Does any one tried to build Hadoop..

2008-04-10 Thread Jean-Daniel Cryans
It's at the root of the source tree and it's called build.xml. Jean-Daniel 2008/4/9, Khalil Honsali <[EMAIL PROTECTED]>: > > Mr. Jean-Daniel, > > where is the ant script please? > > > On 10/04/2008, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote: > > > > The ANT script works well also. > > > > Jean-Daniel > >

Re: hdfs > 100T?

2008-04-10 Thread Todd Troxell
On Thu, Apr 10, 2008 at 09:47:59AM +0200, Mads Toftum wrote: > On Wed, Apr 09, 2008 at 09:42:36PM -0500, Todd Troxell wrote: > > I was unable to access the archives for this list as > > http://hadoop.apache.org/mail/core-user/ returns 403. > > > You're probably looking for > http://mail-archives.a

Re: hdfs > 100T?

2008-04-10 Thread Allen Wittenauer
On 4/10/08 4:42 AM, "Todd Troxell" <[EMAIL PROTECTED]> wrote: > Hello list, Howdy. > I am interested in using HDFS for storage, and for map/reduce only > tangentially. I see clusters mentioned in the docs with many many nodes and > 9TB of disk. > > Is HDFS expected to scale to > 100TB?

Re: hdfs > 100T?

2008-04-10 Thread Mads Toftum
On Wed, Apr 09, 2008 at 09:42:36PM -0500, Todd Troxell wrote: > I was unable to access the archives for this list as > http://hadoop.apache.org/mail/core-user/ returns 403. > You're probably looking for http://mail-archives.apache.org/mod_mbox/hadoop-core-user/ vh Mads Toftum -- http://soulfood