Hadoop is not working after adding hadoop-core-0.20-append-r1056497.jar

2011-06-06 Thread praveenesh kumar
Hello guys! I am currently working on HBase 0.90.3 and Hadoop 0.20.2. Since this Hadoop version does not support rsync hdfs, I copied the *hadoop-core-append jar* file from the *hbase/lib* folder into the *hadoop folder* and replaced it with *hadoop-0.20.2-core.jar*, which was suggested in the

Why inter-rack communication in mapreduce slow?

2011-06-06 Thread elton sky
Hello everyone, as I don't have experience with big-scale clusters, I cannot figure out why the inter-rack communication in a mapreduce job is significantly slower than intra-rack. I saw that a Cisco Catalyst 4900 series switch can reach up to 320 Gbps forwarding capacity. Connected with 48 nodes with

Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
Hi, I am not able to see my email in the mail archive, so I am sending it again. Guys, I need your feedback! Thanks, Praveenesh -- Forwarded message -- From: praveenesh kumar praveen...@gmail.com Date: Mon, Jun 6, 2011 at 12:09 PM Subject: Hadoop is not working after adding

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread Steve Loughran
On 06/06/11 08:22, elton sky wrote: Hello everyone, as I don't have experience with big-scale clusters, I cannot figure out why the inter-rack communication in a mapreduce job is significantly slower than intra-rack. I saw that a Cisco Catalyst 4900 series switch can reach up to 320 Gbps forwarding

Re: Starting a Hadoop job outside the cluster

2011-06-06 Thread Steve Loughran
My Job submit code is http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/ something to run tool classes

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread elton sky
Thanks for the reply, Steve. I totally agree a benchmark is a good idea, but the problem is I don't have a switch to play with, only a small cluster. I was curious about this, so I posted the question. Can some experienced people share their knowledge with us? Cheers On Mon, Jun 6, 2011 at 7:28 PM,

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread Joey Echeverria
Larger Hadoop installations are space dense, 20-40 nodes per rack. When you get to that density with multiple racks, it becomes expensive to buy a switch with enough capacity for all of the nodes in all of the racks. The typical solution is to install a switch per rack with uplinks to a core
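The per-rack switch with uplinks that Joey describes can be put into numbers. The sketch below is only an illustration: the figures (40 nodes per rack, 1 Gbps per node, a 10 Gbps uplink from each rack switch to the core) are assumptions and do not come from this thread. It computes the oversubscription ratio of the uplink and the bandwidth each node gets when every node in the rack sends off-rack at once.

```python
def oversubscription(nodes_per_rack, node_gbps, uplink_gbps):
    """Return (ratio, per_node_gbps) for one rack switch uplink.

    ratio         -- total node bandwidth divided by uplink bandwidth
    per_node_gbps -- off-rack bandwidth per node if all nodes send at once
    """
    demand_gbps = nodes_per_rack * node_gbps
    ratio = demand_gbps / uplink_gbps
    # A node can never exceed its own link speed, even on a fat uplink.
    per_node_gbps = min(node_gbps, uplink_gbps / nodes_per_rack)
    return ratio, per_node_gbps


# Hypothetical rack: 40 nodes at 1 Gbps sharing a 10 Gbps uplink.
ratio, per_node = oversubscription(40, 1.0, 10.0)
print(f"oversubscription {ratio:.0f}:1, {per_node:.2f} Gbps per node off-rack")
```

Under these assumed figures the uplink is 4:1 oversubscribed, so a node that can talk to a rack neighbour at 1 Gbps is limited to 0.25 Gbps when the whole rack is sending to other racks.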

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:18:45 -0400, dar...@ontrenet.com wrote: I never understood how hadoop can throttle an inter-rack fiber switch. It's supposed to operate on the principle of move-the-code to the data because of the I/O cost of moving the data, right? But what happens when a reducer on rack

HBase Web UI showing exception everytime I am running it

2011-06-06 Thread praveenesh kumar
Hello guys.. I am not able to run my HBase 0.90.3 cluster on top of a Hadoop 0.20.2 cluster. I don't know why it's happening. It ran only one time; after that it does not. The HBase web UI is showing the following exception. Why is it happening? Please help! Thanks, Praveenesh HTTP ERROR

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
Hello guys.. Changing the name of the hadoop-apppend-core.jar file to hadoop-0.20.2-core.jar did the trick. It's working now. But is this the right solution to this problem? Thanks, Praveenesh On Mon, Jun 6, 2011 at 2:18 PM, praveenesh kumar praveen...@gmail.com wrote: Hi, Not able to

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread darren
I'm not a hadoop jedi, but in that case, wouldn't one of the hadoop trackers get bottlenecked to resolve those dependencies? Again, this exposes the oddity of hadoop IMO: it tries to NOT be I/O bound, but it seems it's very I/O bound... sorry, not trying to shift the thread topic. On Mon, 06 Jun

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:26:11 -0400, dar...@ontrenet.com wrote: I'm not a hadoop jedi, but in that case, wouldn't one of the hadoop trackers get bottlenecked to resolve those dependencies? Again, this exposes the oddity of hadoop IMO: it tries to NOT be I/O bound, but it seems it's very I/O

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread darren
Yeah, that's a good point. I wonder though, what the load on the tracker nodes (port et al.) would be if an inter-rack fiber switch at 10's of GBS' is getting maxed. Seems to me that if there is that much traffic being mitigate across racks, that the tracker node (or whatever node it is) would

moving of a block from one rack to another

2011-06-06 Thread George Kousiouris
Hi all, Does anyone know how you can force a block to move from one rack to another in Hadoop 0.20? Or if it is possible in another version? Thanks, George --

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:34:56 -0400, dar...@ontrenet.com wrote: Yeah, that's a good point. I wonder though, what the load on the tracker nodes (port et al.) would be if an inter-rack fiber switch at 10's of GBS' is getting maxed. Seems to me that if there is that much traffic being mitigate

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread elton sky
Thanks Joey, So the b/w is throttled by the core switch when many nodes are requesting traffic and the core switch cannot keep up. It only happens when the cluster is busy enough. On Mon, Jun 6, 2011 at 11:01 PM, Joey Echeverria j...@cloudera.com wrote: Larger Hadoop installations are space

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread darren
Yeah, the way you described it, maybe not. Because the hellabytes are all coming from one rack. But in reality, wouldn't this be more uniform because of how hadoop/hdfs work (distributed more evenly)? And if that is true, then for all the switched packets passing through the inter-rack switch, a

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread Joey Echeverria
Most of the network bandwidth used during a MapReduce job should come from the shuffle/sort phase. This part doesn't use HDFS. The TaskTrackers running reduce tasks will pull intermediate results from TaskTrackers running map tasks over HTTP. In most cases, it's difficult to get rack locality
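Joey's point about the shuffle can be illustrated with a rough estimate. The sketch below rests on simplifying assumptions that are not stated in the thread: racks are of equal size, map output is spread evenly across them, and every reducer pulls an equal share from every map. Under those assumptions, the share of shuffle data a reducer must fetch from other racks is (R - 1) / R for R racks. The 100 GB figure is hypothetical.

```python
def cross_rack_fraction(num_racks):
    """Fraction of shuffle data fetched from other racks.

    Assumes equal-sized racks and map output spread evenly across them.
    """
    if num_racks < 1:
        raise ValueError("num_racks must be at least 1")
    return (num_racks - 1) / num_racks


def cross_rack_shuffle_gb(intermediate_gb, num_racks):
    """Shuffle data (GB) that crosses rack boundaries under the same assumptions."""
    return intermediate_gb * cross_rack_fraction(num_racks)


# Hypothetical job: 100 GB of intermediate output on a 4-rack cluster.
print(cross_rack_shuffle_gb(100, 4), "GB of shuffle traffic leaves its rack")
```

With a single rack nothing crosses the core; with four racks, about three quarters of the intermediate data does, which is why jobs with large map output stress the inter-rack links far more than jobs with small map output.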

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread elton sky
Hi John, Because the job tracker tries to assign map tasks to local data nodes, there's not much n/w traffic for them. Then the only potential issue will be, as you said, reducers, which copy data from all maps. So in other words, if the application only creates small intermediate output, e.g.

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 6:23 AM, praveenesh kumar praveen...@gmail.com wrote: Changing the name of the hadoop-apppend-core.jar file to hadoop-0.20.2-core.jar did the trick. It's working now. But is this the right solution to this problem? It would seem to be. Did you have two hadoop*jar

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
It worked by renaming the hadoop-append*.jar file to hadoop-core.0.20.2.jar. I don't know why, but it worked! Also, after this, my HBase started well one time, but after that it's not working fine. There is some problem in starting the region servers. I have sent the

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread Chris Smith
Elton, Rapleaf's blog has an interesting posting on their experience that's worth a read: http://blog.rapleaf.com/dev/2010/08/26/analyzing-some-interesting-networks-for-mapreduce-clusters/ And if you want to get an idea of the interaction between CPU, Disk and Network there's nothing like a

standalone ? mapred.LocalJobRunner

2011-06-06 Thread Shi Yu
Hi, I am stuck on a basic problem but can't figure it out. My previous verbose logging problem is the same as the one mentioned in the old post. http://mail-archives.apache.org/mod_mbox/nutch-user/200901.mbox/%3c0adbd67bd6811a4bb2144d805124714d03f754a...@kaex1.dom.rastatt.de%3E First question, if

Re: Backing up namenode

2011-06-06 Thread Mark
Thank you. I added another directory to the following configuration and restarted my cluster: <property> <name>dfs.name.dir</name> <value>/var/hadoop/dfs/name,/var/hadoop/dfs/name.backup</value> </property> However, the name.backup directory is empty. Is there anything I need to do to tell it to back up?

RE: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread Michael Segel
Chris, I've gone back through the thread and here's Elton's initial question... On 06/06/11 08:22, elton sky wrote: Hello everyone, as I don't have experience with big-scale clusters, I cannot figure out why the inter-rack communication in a mapreduce job is significantly slower

Re: DistributedCache

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 16:14:14 -0500, Shi Yu sh...@uchicago.edu wrote: I still don't understand, in a cluster you have a shared directory to all the nodes, right? Just put the configuration file in that directory and load it in all the mappers, isn't that simple? So I still don't understand

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread elton sky
Michael, Depending on your hardware, that's a fabric of 40GB, shared. So that fabric is shared by all 42 ports. And even if I just used 2 ports out of 42, connecting to 2 racks, if there's enough traffic coming, these 2 ports could use all 40GB. Is this right? -Elton On Tue, Jun 7, 2011 at 1:42

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread Mauricio Cavallera
Unsubscribe On Jun 6, 2011 10:54 a.m., Joey Echeverria j...@cloudera.com wrote: Most of the network bandwidth used during a MapReduce job should come from the shuffle/sort phase. This part doesn't use HDFS. The TaskTrackers running reduce tasks will pull intermediate results from

Reducing Mapper InputSplit size

2011-06-06 Thread Mark question
Hi, Does anyone have a way to reduce InputSplit size in general? By default, the minimum size chunk that map input should be split into is set to 0 (i.e. mapred.min.split.size). Can I change dfs.block.size or some other configuration to reduce the split size and spawn many mappers? Thanks, Mark

Hbase startup error: NoNode for /hbase/master after running out of space

2011-06-06 Thread Zhong, Sheng
Hey, I hope someone here can help with the HBase issue we hit after running out of space. I could bring up HDFS, following one of the posts I found, by manually modifying the namenode's /var/hdfs/current/edits file: $ printf "\xff\xff\xff\xee\xff" > edits. However, the issue then moved to HBase startup. We could see HMaster
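The five bytes in the printf command above can be decoded to see what is being written. The sketch below rests on an assumption about the Hadoop 0.20 edit log format that the post itself does not state: the file begins with a 4-byte big-endian layout version (taken here to be -18) followed by one opcode byte, with 0xFF marking the end of the log. Read that way, the command writes an edits file containing a header and no operations.

```python
import struct

LAYOUT_VERSION = -18  # assumption: layout version used by Hadoop 0.20
OP_INVALID = 0xFF     # assumption: opcode byte that ends the edit log


def empty_edits_bytes(layout_version=LAYOUT_VERSION):
    """Build the content of an edits file that holds no operations."""
    # ">i" packs a signed 32-bit integer in big-endian byte order.
    return struct.pack(">i", layout_version) + bytes([OP_INVALID])


print(empty_edits_bytes().hex())
```

The packed value matches the byte sequence in the post, which supports that reading of the command. It also means any operations that were recorded in the damaged edits file are discarded.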

Hadoop Cluster Multi-datacenter

2011-06-06 Thread sanjeev . taran
Hello, I wanted to know if anyone has any tips or tutorials on how to install a Hadoop cluster across multiple datacenters. Do you need ssh connectivity between the nodes across these data centers? Thanks in advance for any guidance you can provide.

Re: Reducing Mapper InputSplit size

2011-06-06 Thread Mark question
Great! Thanks guys :) Mark 2011/6/6 Panayotis Antonopoulos antonopoulos...@hotmail.com Hi Mark, Check: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html I think that setMaxInputSplitSize(Job job, long size)
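To show why a maximum split size spawns more mappers, here is a small sketch of the split-size rule. It assumes that the new-API FileInputFormat derives the split size as max(minSize, min(maxSize, blockSize)); treat that formula as an assumption to check against the linked API documentation. The function names and the 64 MB and 16 MB figures are illustrative only, and the split count ignores any slack the real implementation may allow on the final split.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Assumed rule: clamp the block size between the min and max split size."""
    return max(min_size, min(max_size, block_size))


def approx_num_splits(file_size, split_size):
    """Approximate number of splits (and mappers) for one file: ceiling division."""
    return -(-file_size // split_size)


# Hypothetical 1 GB file stored in 64 MB blocks.
default_split = compute_split_size(64 * MB)
capped_split = compute_split_size(64 * MB, max_size=16 * MB)
print(approx_num_splits(1024 * MB, default_split), "splits by default")
print(approx_num_splits(1024 * MB, capped_split), "splits with a 16 MB cap")
```

Under this rule, raising the minimum split size can only make splits larger, so it is the maximum split size (or a smaller block size) that reduces the split size and increases the number of mappers.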