Re: exceptions copying files into HDFS

2010-12-11 Thread Sanford Rockowitz
On 12/11/2010 10:48 PM, Varadharajan Mukundan wrote: Hi, org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/rock/input/fair-scheduler.xml could only be replicated to 0 nodes, instead of 1 I think none of your datanodes are actually running. why not use jps and make sure whe

Re: exceptions copying files into HDFS

2010-12-11 Thread li ping
That's right. You have to make sure the datanode is running. If you are using a virtual machine, like VirtualBox, you sometimes have to wait a moment until the datanode is active. It seems to be a performance issue; the datanode in a VM becomes active after several minutes. On Sun, Dec 12, 2010 at

Re: exceptions copying files into HDFS

2010-12-11 Thread Varadharajan Mukundan
Hi, > org.apache.hadoop.ipc.RemoteException: java.io.IOException: File > /user/rock/input/fair-scheduler.xml could only be replicated to 0 nodes, > instead of 1 I think none of your datanodes are actually running. Why not use jps and make sure they are running? Also check the datanode log
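A quick way to act on this advice, sketched as a shell transcript (log path is an assumption for a default 0.20-era tarball install; adjust for your setup):

```
# On each slave node, list the running Hadoop JVMs
jps
# A healthy HDFS slave should show a DataNode process among them.
# If DataNode is missing, check the tail of its log for the startup failure:
tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
```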

exceptions copying files into HDFS

2010-12-11 Thread Sanford Rockowitz
Folks, I'm a Hadoop newbie, and I hope this is an appropriate place to post this question. I'm trying to work through the initial examples. When I try to copy files into HDFS, Hadoop throws exceptions. I imagine it's something in my configuration, but I'm at a loss to figure out what. I

Re: Error: ... It is indirectly referenced from required .class files - implements

2010-12-11 Thread Harsh J
Try adding the commons-logging jar to your build path. It is available in the lib/ folder of your Hadoop distribution. If you use the MapReduce eclipse plugin which comes with the Hadoop distro, it would add all required jars to create a Hadoop project automatically (i.e. everything in lib/*.jar +
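In plain Eclipse terms, that advice amounts to a lib entry in the project's .classpath file (the jar version and path here are guesses based on a stock 0.20.2 tarball; check your own lib/ directory for the exact filename):

```xml
<classpathentry kind="lib"
    path="/path/to/hadoop-0.20.2/lib/commons-logging-1.0.4.jar"/>
```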

Re: Error: ... It is indirectly referenced from required .class files - implements

2010-12-11 Thread maha
This is a compilation error I get in Eclipse ... So, I don't see how putting the hadoop-core.jar in the lib/ directory will change the error. Do you suggest another way of running a Hadoop java program? The way I do it is: create an Eclipse project, Build Paths --> Add External Archives: had

Re: Is it possible to write file output in Map phase once and write another file output in Reduce phase?

2010-12-11 Thread Edward Choi
Thanks. Then I should definitely try that. Thanks for all the info :-) Ed From mp2893's iPhone On 2010. 12. 12., at 3:00 AM, Ted Dunning wrote: > Of course. It is just a set of Hadoop programs. > > 2010/12/11 edward choi > >> Can I operate Bixo on a cluster other than Amazon EC2? >>

Re: Error: ... It is indirectly referenced from required .class files - implements

2010-12-11 Thread li ping
Can you try adding the jar file from your Hadoop lib directory? On Sun, Dec 12, 2010 at 8:00 AM, Maha A. Alabduljalil wrote: > > Hi all, > > I extended my project path with the hadoop-0.20.2-core.jar file, but I can > see that some of the classes I need aren't there, so for example an error I > ge

Error: ... It is indirectly referenced from required .class files - implements

2010-12-11 Thread Maha A. Alabduljalil
Hi all, I extended my project path with the hadoop-0.20.2-core.jar file, but I can see that some of the classes I need aren't there, so for example an error I get: " The type org.apache.commons.logging.Log cannot be resolved. It is indirectly referenced from required .class files- i

Re: Slow final few reducers

2010-12-11 Thread Ted Dunning
The job history program tells you this. The syntax is hideous, but there is a parser provided. On Sat, Dec 11, 2010 at 8:23 AM, Mithila Nagendra wrote: > Just curious and off topic :) How do you find the time taken by each > reducer? What command/method do you use? I need that for my research.
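For reference, the command being described is roughly the following (the output path is a placeholder for your job's output directory):

```
# Prints per-task timings, including each reduce task's shuffle/sort/finish times
hadoop job -history /path/to/job/output/dir
# "-history all" adds full per-attempt detail
hadoop job -history all /path/to/job/output/dir
```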

Re: Slow final few reducers

2010-12-11 Thread Ted Dunning
It sounds like your key distribution is being reflected in the size of your reduce tasks, thus making some of them take much longer than the rest. There are three solutions to this: a) down-sample. Particularly for statistical computations, once you have seen a thousand instances, you have seen

Re: Is it possible to write file output in Map phase once and write another file output in Reduce phase?

2010-12-11 Thread Ted Dunning
Of course. It is just a set of Hadoop programs. 2010/12/11 edward choi > Can I operate Bixo on a cluster other than Amazon EC2? >

Re: Slow final few reducers

2010-12-11 Thread Mithila Nagendra
Hi Rob, Just curious and off topic :) How do you find the time taken by each reducer? What command/method do you use? I need that for my research. Thanks, Mithila On Sat, Dec 11, 2010 at 4:05 AM, Rob Stewart wrote: > Hi, > > I have a problem with a MapReduce job I am trying to run on a 32 node

Re: Slow final few reducers

2010-12-11 Thread Harsh J
On Sat, Dec 11, 2010 at 7:41 PM, Rob Stewart wrote: > Sorry my fault - It's someone running a network simulator on the cluster ! > Culprit found? *wide grin* -- Harsh J www.harshj.com

Re: Slow final few reducers

2010-12-11 Thread Rob Stewart
Sorry my fault - It's someone running a network simulator on the cluster ! Rob On 11 December 2010 14:09, Rob Stewart wrote: > OK, slight update: > > Immediately underneath public void reduce(), I have added a: > System.out.println("Key: " + key.toString()); > > And I am logged on a node that is

Re: Slow final few reducers

2010-12-11 Thread Rob Stewart
OK, slight update: Immediately underneath public void reduce(), I have added a: System.out.println("Key: " + key.toString()); And I am logged on a node that is still working on a reducer. However, it stopped printing "Key:" long ago, so it is not processing new keys. But looking more closely at

Re: Slow final few reducers

2010-12-11 Thread Harsh J
On Sat, Dec 11, 2010 at 5:25 PM, Rob Stewart wrote: > Oh, > > I should add, of the Java processes running on the remaining nodes for > the final wave of reducers, the one taking all the CPU is the "Child" > process (not TaskTracker). I log into the Master, and also, the Java > process taking all t

Re: Multicore Nodes

2010-12-11 Thread Harsh J
On Sat, Dec 11, 2010 at 5:10 PM, Rob Stewart wrote: > Ah, > > that is very interesting indeed. > > I am running on a homogeneous cluster, where each node has 8 cores. > > Does that mean that Hadoop would need to be carefully configured, so > that 8 core machines had a max.tasks value of 8, and dua

Re: Slow final few reducers

2010-12-11 Thread Rob Stewart
Oh, I should add, of the Java processes running on the remaining nodes for the final wave of reducers, the one taking all the CPU is the "Child" process (not TaskTracker). I log into the Master, and also, the Java process taking all the CPU is "Child". Is this normal? thanks, Rob On 11 December

Re: Multicore Nodes

2010-12-11 Thread Rob Stewart
Ah, that is very interesting indeed. I am running on a homogeneous cluster, where each node has 8 cores. Does that mean that Hadoop would need to be carefully configured, so that 8 core machines had a max.tasks value of 8, and dual core machines had the value 2 ? Very useful to know, Rob On 1
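For the record, on an 0.20-era cluster those per-node slot counts live in each node's mapred-site.xml, so a heterogeneous cluster would carry different values on different machines. The values below are illustrative for the 8-core nodes:

```xml
<!-- conf/mapred-site.xml on each 8-core node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```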

Re: Slow final few reducers

2010-12-11 Thread Rob Stewart
Hi, many thanks for your response. A few observations: - I know for a fact that my key distribution is quite radically skewed (some keys with *many* values, most keys with few). - I have overlooked the fact that I need a partitioner. I suspect that this will help dramatically. I realize that the n
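To illustrate why a skewed key distribution piles work onto a few reducers, here is a self-contained sketch of the default hash partitioning rule (the same hash-and-mod formula Hadoop's HashPartitioner uses; the key set and reducer count are made up for the demo):

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionSkewDemo {
    // Default hash partitioning: mask off the sign bit, then mod by reducer count
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Hypothetical skewed input: one "hot" key dominates the record stream
        String[] keys = {"hot", "hot", "hot", "hot", "hot", "hot", "cold", "warm"};
        Map<Integer, Integer> recordsPerReducer = new HashMap<>();
        for (String k : keys) {
            recordsPerReducer.merge(partition(k, 4), 1, Integer::sum);
        }
        // Every "hot" record lands on the same reducer, no matter how many
        // reducers you configure, so that one reducer finishes last
        System.out.println(recordsPerReducer);
    }
}
```

A custom Partitioner that spreads the hot keys, or a combiner that pre-aggregates them map-side, is the usual fix.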

Re: Multicore Nodes

2010-12-11 Thread Harsh J
Hi, On Sat, Dec 11, 2010 at 4:39 PM, Rob Stewart wrote: > Hi, > > When trying to compare Hadoop against other parallel paradigms, it is > important to consider heterogeneous systems. Some may have 100 nodes, > each single core. Some may have 100 nodes, with 8 cores on each, and > others may have

Re: Slow final few reducers

2010-12-11 Thread Harsh J
Hi, Certain reducers may receive a higher share of data than others (Depending on your data/key distribution, the partition function, etc.). Compare the longer reduce tasks' counters with the quicker ones. Are you sure that the reducers that take long are definitely the last wave, as in with IDs

Multicore Nodes

2010-12-11 Thread Rob Stewart
Hi, When trying to compare Hadoop against other parallel paradigms, it is important to consider heterogeneous systems. Some may have 100 nodes, each single core. Some may have 100 nodes, with 8 cores on each, and others may have 5 nodes, 32 cores per node. As Hadoop runs on JVMs on each node, a

Slow final few reducers

2010-12-11 Thread Rob Stewart
Hi, I have a problem with a MapReduce job I am trying to run on a 32-node cluster. The final few reducers take a *lot* longer than the rest, e.g. if I specify 100 reducers, the first 90 will complete in 5 minutes, and then the remaining 10 reducers might take 10 minutes. The same is true for any num

Re: Is it possible to write file output in Map phase once and write another file output in Reduce phase?

2010-12-11 Thread edward choi
Excuse me, but could I ask one more question? Can I operate Bixo on a cluster other than Amazon EC2? I am already running a Hadoop cluster of my own, so I'd like to run Bixo on top of my cluster. But I don't see how to do it on Bixo's "Getting Started" page. All I see are "running locally", "runnin

Re: Is it possible to write file output in Map phase once and write another file output in Reduce phase?

2010-12-11 Thread edward choi
I'd start with only a few RSS feeds at first, but I plan to expand it to the scale of thousands of RSS feeds every 30 minutes eventually. That's why I am so eager to implement my system in Hadoop. I skimmed through Nutch and Bixo but I feel that eventually I'm gonna have to build the system from