Re: Stackoverflow

2008-06-02 Thread Chris Douglas
I have no Java implementation of my job, sorry. Since it's all in the map side, IdentityMapper/IdentityReducer is fine, as long as both the splits and the number of reduce tasks are the same. The data is a representation for loglines, and not exactly small, e.g. the stuff has already be

Re: Stackoverflow

2008-06-02 Thread Andreas Kostyrka
On Tuesday 03 June 2008 04:53:22 Chris Douglas wrote:
> Is anyone observing this outside of streaming?
>
> We've been able to reproduce this trace with a bad comparator that
> only returns negative values, but haven't found any uncontrived
> patterns in data that produce this, nor any comparators i
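For readers unsure what "a bad comparator that only returns negative values" looks like, here is a minimal sketch in plain Java. It is not the comparator from the failing job. The class name `ComparatorContractCheck` and the helper `isAntisymmetric` are made up for illustration. The sketch only shows that such a comparator breaks the rule that `compare(a, b)` and `compare(b, a)` must have opposite signs, which sort code generally relies on.

```java
import java.util.Comparator;

// Hypothetical illustration class; not part of Hadoop.
final class ComparatorContractCheck {
    // Deliberately broken: claims every left argument sorts before every right one.
    static final Comparator<String> ALWAYS_NEGATIVE = (a, b) -> -1;

    // True when compare(a, b) and compare(b, a) have opposite signs (or are both zero).
    static <T> boolean isAntisymmetric(Comparator<T> cmp, T a, T b) {
        return Integer.signum(cmp.compare(a, b)) == -Integer.signum(cmp.compare(b, a));
    }

    public static void main(String[] args) {
        // A well-behaved comparator passes the check.
        System.out.println(isAntisymmetric(Comparator.<String>naturalOrder(), "a", "b"));
        // The always-negative comparator fails it.
        System.out.println(isAntisymmetric(ALWAYS_NEGATIVE, "a", "b"));
    }
}
```

Running a check like this over a handful of sample keys is a cheap way to rule out a contract violation before looking for patterns in the data.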

Re: Matrix Multiplication Problem

2008-06-02 Thread Edward J. Yoon
Hi,
The Hama project has been accepted into the Apache Incubator, and work on it is under way. Please subscribe to the mailing list ([EMAIL PROTECTED]).
Regards, Edward.
> On Mon, Jun 2, 2008 at 9:11 PM, Hadoop <[EMAIL PROTECTED]> wrote:
> > I downloaded the Matrix Multiplication code from:
> http://code.google.co

creating less than 10G data with RandomWriter

2008-06-02 Thread Richard Zhang
Hello Hadoopers: I am running RandomWriter on an 8-node cluster. The default setting creates 1G/mapper with 10 mappers/host; considering replication, that is essentially 30G/host. Each node in the cluster has at most 30G, so my cluster is full and can not execute further c

create less than 10G data/host with RandomWrite

2008-06-02 Thread Richard Zhang
Hello Hadoopers: I am running RandomWriter on an 8-node cluster. The default setting creates 1G/mapper with 10 mappers/host; considering replication, that is essentially 30G/host. Each node in the cluster has at most 30G, so my cluster is full and can not execute further c
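The 30G/host figure can be checked with a few lines of arithmetic. This is only a sketch. It assumes a replication factor of 3 (implied by the 30G figure but not stated in the message) and that the 10G/host target in the subject line is meant to include replicas. The class and method names are made up for illustration and are not RandomWriter settings.

```java
// Hypothetical helper; the numbers come from the message, the names do not.
final class RandomWriterVolume {
    // Gigabytes that end up on each host, on average, once every block is replicated.
    static long gigabytesPerHost(long gbPerMapper, int mappersPerHost, int replication) {
        return gbPerMapper * mappersPerHost * replication;
    }

    // Largest per-mapper output (in whole megabytes) that keeps a host under budgetGb,
    // assuming the budget counts replicas too.
    static long maxMegabytesPerMapper(long budgetGb, int mappersPerHost, int replication) {
        return (budgetGb * 1024) / ((long) mappersPerHost * replication);
    }

    public static void main(String[] args) {
        System.out.println(gigabytesPerHost(1, 10, 3));       // 30
        System.out.println(maxMegabytesPerMapper(10, 10, 3)); // 341
    }
}
```

Under those assumptions, staying below 10G/host means either writing roughly a third of a gigabyte per mapper or running fewer mappers per host.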

Re: DataNode often self-stopped

2008-06-02 Thread Konstantin Shvachko
> No , it is in different storage file.
What is in different storage file? All data-nodes should have different configuration files, and each configuration file should set a different storage directory property: "dfs.data.dir". It is not a file; it is a directory with all data-blocks.
> the data-n

Re: DataNode often self-stopped

2008-06-02 Thread smallufo
2008/6/3 Konstantin Shvachko <[EMAIL PROTECTED]>:
> Is it possible that your different data-nodes point to the same storage
> directory on
> the hard drive? If so one of the data-nodes will be shut down.
> In general this is impossible because storage directories are locked once
> one of the nod

Re: Hadoop installation folders in multiple nodes

2008-06-02 Thread Michael Di Domenico
Oops, missed the part where you already tried that.
On Mon, Jun 2, 2008 at 3:23 PM, Michael Di Domenico <[EMAIL PROTECTED]> wrote:
> Depending on your windows version, there is a dos command called "subst"
> which you could use to virtualize a drive letter on your third machine
>
>
> On Fri, May

Re: Hadoop installation folders in multiple nodes

2008-06-02 Thread Michael Di Domenico
Depending on your windows version, there is a dos command called "subst" which you could use to virtualize a drive letter on your third machine
On Fri, May 30, 2008 at 4:35 AM, Sridhar Raman <[EMAIL PROTECTED]> wrote:
> Should the installation paths be the same in all the nodes? Most
> documenta

RE: Stack Overflow When Running Job

2008-06-02 Thread Devaraj Das
Hi, do you have a testcase that we can run to reproduce this? Thanks!
> -Original Message-
> From: jkupferman [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 02, 2008 9:22 AM
> To: core-user@hadoop.apache.org
> Subject: Stack Overflow When Running Job
>
>
> Hi everyone,
> I have a job ru

Re: hadoop on EC2

2008-06-02 Thread Chris K Wensel
obviously this isn't the best solution if you need to let many semi trusted users browse your cluster. Actually, it would be much more secure if the tunnel service ran on a trusted server letting your users connect remotely via SOCKS and then browse the cluster. These users wouldn't need

Re: DataNode often self-stopped

2008-06-02 Thread Konstantin Shvachko
Is it possible that your different data-nodes point to the same storage directory on the hard drive? If so one of the data-nodes will be shut down. In general this is impossible because storage directories are locked once one of the nodes claims them under its authority. But I don't know whether
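The idea that a storage directory is "locked once one of the nodes claims" it can be sketched with a lock file and `java.nio` file locking. This is an illustration of the general technique only, not the DataNode's actual code. The class name `StorageDirLock`, the method `tryClaim`, and the marker file name are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of claiming a directory through a lock file.
final class StorageDirLock {
    // Tries to claim storageDir by locking a marker file inside it.
    // Returns the held lock, or null if the directory is already claimed.
    static FileLock tryClaim(Path storageDir) throws IOException {
        Path marker = storageDir.resolve("in_use.lock"); // marker name is an assumption
        FileChannel channel = FileChannel.open(
                marker, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close(); // another process holds the lock
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            // This JVM already holds the lock. Caveat: on some platforms closing this
            // second channel can also drop the OS-level lock taken through the first,
            // so real code should keep a single channel per lock file.
            channel.close();
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dfs-data");
        FileLock first = tryClaim(dir);
        FileLock second = tryClaim(dir);
        System.out.println(first != null);  // first claim succeeds
        System.out.println(second == null); // second claim is refused
        first.release();
        first.channel().close();
    }
}
```

A second data-node pointed at the same directory would be in the position of the second `tryClaim` call: it cannot take the lock and has to give up.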

Re: hadoop on EC2

2008-06-02 Thread Chris K Wensel
if you use the new scripts in 0.17.0, just run
> hadoop-ec2 proxy
this starts a ssh tunnel to your cluster. installing foxy proxy in FF gives you whole cluster visibility.. obviously this isn't the best solution if you need to let many semi trusted users browse your cluster.
On May 28, 20

Re: Realtime Map Reduce = Supercomputing for the Masses?

2008-06-02 Thread Steve Loughran
Alejandro Abdelnur wrote: Yes you would have to do it with classloaders (not 'hello world' but not 'rocket science' either). That's where we differ. I do actually think that classloaders are incredibly hard to get right, and I say that as someone who has single stepped through the Axis2 code

Re: About Metrics update

2008-06-02 Thread lohit
In MetricsIntValue, incrMetrics() was being called on pushMetrics(), instead of setMetrics(). This used to cause the values to be incremented periodically.
Thanks,
Lohit
- Original Message
From: Ion Badita <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Saturday, May 31, 2008 4
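The difference between pushing with an increment and pushing with a set can be seen in a toy model. The sketch below does not use the Hadoop metrics classes. `ToyRecord`, `publish`, and the metric name `gauge` are made up for illustration, and the periodic push is simulated with a loop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; not the Hadoop metrics API.
final class PushSemantics {
    // Stand-in for a metrics record: metric name -> published value.
    static final class ToyRecord {
        private final Map<String, Integer> values = new HashMap<>();
        void set(String name, int v) { values.put(name, v); }
        void incr(String name, int delta) { values.merge(name, delta, Integer::sum); }
        int get(String name) { return values.getOrDefault(name, 0); }
    }

    // Pushes the same value `pushes` times, by incrementing or by setting,
    // and returns what the record reports afterwards.
    static int publish(int value, int pushes, boolean useIncr) {
        ToyRecord record = new ToyRecord();
        for (int i = 0; i < pushes; i++) {
            if (useIncr) {
                record.incr("gauge", value); // grows on every periodic push
            } else {
                record.set("gauge", value);  // stays at the current value
            }
        }
        return record.get("gauge");
    }

    public static void main(String[] args) {
        System.out.println(publish(5, 3, true));  // 15
        System.out.println(publish(5, 3, false)); // 5
    }
}
```

With the increment, a value of 5 pushed three times reads as 15, which matches the "incremented periodically" symptom described above. With the set, it stays at 5.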

Re: Text file character encoding

2008-06-02 Thread Ted Dunning
You should file a Jira, make the change and submit a patch!
On Sun, Jun 1, 2008 at 11:19 PM, NOMURA Yoshihide <[EMAIL PROTECTED]> wrote:
> Hello,
> I'm using Hadoop 0.17.0 to analyze some large amount of CSV files.
>
> And I need to read such files in different character encoding from UTF-8,
> bu

Re: Realtime Map Reduce = Supercomputing for the Masses?

2008-06-02 Thread Alejandro Abdelnur
Yes you would have to do it with classloaders (not 'hello world' but not 'rocket science' either). You'll be limited on using native libraries, even if you use classloaders properly as native libs can be loaded only once. You will have to ensure you get rid of the task classloader once the task i

Re: Realtime Map Reduce = Supercomputing for the Masses?

2008-06-02 Thread Christophe Taton
Hi Steve,
On Mon, Jun 2, 2008 at 12:23 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> Christophe Taton wrote:
>
>> Actually Hadoop could be made more friendly to such realtime Map/Reduce
>> jobs.
>> For instance, we could consider running all tasks inside the task tracker
>> jvm as separate thre

Matrix Multiplication Problem

2008-06-02 Thread Hadoop
I downloaded the Matrix Multiplication code from: http://code.google.com/p/hama/source/browse/trunk/src/java/org/apache/hama/ but I do not know how to run it correctly. Could you please give the steps to run the code?

DataNode often self-stopped

2008-06-02 Thread smallufo
Hi, I am simulating a 4-DataNode environment using VMware. I found that some data nodes often stop by themselves after receiving a large file (or block). In fact, it is not so large; it is just under 10MB. These are the error messages:
2008-05-27 16:40:54,727 INFO org.apache.hadoop.dfs.DataNode: Received

Re: Realtime Map Reduce = Supercomputing for the Masses?

2008-06-02 Thread Steve Loughran
Christophe Taton wrote: Actually Hadoop could be made more friendly to such realtime Map/Reduce jobs. For instance, we could consider running all tasks inside the task tracker jvm as separate threads, which could be implemented as another personality of the TaskRunner. I have been looking into th

MetricsIntValue/MetricsLongValue publish once

2008-06-02 Thread Ion Badita
Hi, In the javadoc for MetricsIntValue and MetricsLongValue it is written: "Each time its value is set, it is published only *once* at the next update call". Looking at those classes, this is right: they "push" the data into the MetricsRecord only once, but digging deeper into the AbstractMetricsContext

Re: distcp/ls fails on Hadoop-0.17.0 on ec2.

2008-06-02 Thread Einar Vollset
Hi Tom. Ah... From reading (your?) article: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112 I got confused; it seems to suggest that distcp is used to move ordinary S3 objects onto HDFS.. Thanks for the clarification. Cheers, Einar On Sat, May 31, 200

Using hadoop to store large backups

2008-06-02 Thread Greg Connor
I'm starting to use Hadoop as a simple "storage pool" to store backups of large things (currently Oracle database backups). My Hadoop usage is at a pretty primitive level so far and I am really only scratching the surface of what it can do. I haven't used map/reduce at all--so far it's just be