Hello guys!
I am currently working with HBase 0.90.3 and Hadoop 0.20.2.
Since this Hadoop version does not support the HDFS append/sync that HBase needs,
I copied the *hadoop-core-append* jar file from the *hbase/lib* folder
into the *hadoop* folder
and used it to replace *hadoop-0.20.2-core.jar*,
which was suggested in the
Hello everyone,
As I don't have experience with large-scale clusters, I cannot figure out why
inter-rack communication in a MapReduce job is significantly slower
than intra-rack communication.
I saw that a Cisco Catalyst 4900 series switch can reach up to 320 Gbps of forwarding
capacity. Connected with 48 nodes with
Hi,
I am not able to see my email in the mail archive, so I am sending it again.
Guys, I need your feedback!
Thanks,
Praveenesh
-- Forwarded message --
From: praveenesh kumar praveen...@gmail.com
Date: Mon, Jun 6, 2011 at 12:09 PM
Subject: Hadoop is not working after adding
On 06/06/11 08:22, elton sky wrote:
My job submission code is
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/
It's something to run Tool classes.
Thanks for the reply, Steve.
I totally agree that a benchmark is a good idea. But the problem is that I don't
have switches to play with, only a small cluster.
I was curious about this, so I posted the question.
Could some experienced people share their knowledge with us?
Cheers
On Mon, Jun 6, 2011 at 7:28 PM,
Larger Hadoop installations are space-dense, with 20-40 nodes per rack.
When you get to that density with multiple racks, it becomes expensive
to buy a switch with enough capacity for all of the nodes in all of
the racks. The typical solution is to install a switch per rack with
uplinks to a core
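To make the oversubscription behind the switch-per-rack design concrete, here is a back-of-the-envelope sketch. The node counts and link speeds below are illustrative assumptions, not numbers from this thread:

```python
# Back-of-the-envelope oversubscription estimate for a rack switch.
# All concrete numbers here are made-up assumptions for illustration.

def oversubscription(nodes_per_rack, nic_gbps, uplink_gbps):
    """Ratio of worst-case off-rack demand to inter-rack uplink capacity."""
    demand = nodes_per_rack * nic_gbps   # every node sending off-rack at once
    return demand / uplink_gbps

# 40 nodes with 1 Gbps NICs behind 2 x 10 Gbps uplinks to the core:
ratio = oversubscription(nodes_per_rack=40, nic_gbps=1, uplink_gbps=20)
print(ratio)  # 2.0 -> inter-rack flows see at best half their NIC bandwidth
```

A ratio above 1.0 is exactly the situation described above: intra-rack traffic runs at full NIC speed, while inter-rack traffic contends for the shared uplinks.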
On Mon, 06 Jun 2011 09:18:45 -0400, dar...@ontrenet.com wrote:
I never understood how Hadoop can saturate an inter-rack fiber switch.
It's supposed to operate on the principle of moving the code to the data,
because of the I/O cost of moving the data, right?
But what happens when a reducer on rack
Hello guys,
I am not able to run my HBase 0.90.3 cluster on top of my Hadoop 0.20.2
cluster. I don't know why this is happening; it ran only once, and after
that it doesn't.
The HBase web URL is showing the following exception.
Why is this happening?
Please help!
Thanks,
Praveenesh
HTTP ERROR
Hello guys,
Changing the name of the hadoop-append-core.jar file to
hadoop-0.20.2-core.jar did the trick.
It's working now.
But is this the right solution to this problem?
Thanks,
Praveenesh
On Mon, Jun 6, 2011 at 2:18 PM, praveenesh kumar praveen...@gmail.com wrote:
I'm not a Hadoop jedi, but in that case, wouldn't one of the Hadoop
trackers get bottlenecked resolving those dependencies?
Again, this exposes the oddity of Hadoop, IMO: it tries NOT to
be I/O bound, but it seems very I/O bound...
Sorry, not trying to shift the thread topic.
On Mon, 06 Jun 2011 09:26:11 -0400, dar...@ontrenet.com wrote:
Yeah, that's a good point.
I wonder, though, what the load on the tracker nodes (ports et al.) would
be if an inter-rack fiber switch at tens of Gbps is getting maxed out.
It seems to me that if there is that much traffic being moved across
racks, the tracker node (or whatever node it is) would
Hi all,
Does anyone know how you can force a block to move from one rack to
another in Hadoop 0.20? Or if it is possible in another version?
Thanks,
George
On Mon, 06 Jun 2011 09:34:56 -0400, dar...@ontrenet.com wrote:
Thanks, Joey.
So the bandwidth is throttled by the core switch when many nodes are
requesting traffic and the core switch cannot keep up.
It only happens when the cluster is busy enough.
On Mon, Jun 6, 2011 at 11:01 PM, Joey Echeverria j...@cloudera.com wrote:
Yeah, the way you described it, maybe not, because the hellabytes
are all coming from one rack. But in reality, wouldn't this be
more uniform because of how Hadoop/HDFS works (data distributed more evenly)?
And if that is true, then for all the switched packets passing through
the inter-rack switch, a
Most of the network bandwidth used during a MapReduce job should come
from the shuffle/sort phase. This part doesn't use HDFS. The
TaskTrackers running reduce tasks will pull intermediate results from
TaskTrackers running map tasks over HTTP. In most cases, it's
difficult to get rack locality
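Assuming map outputs are spread uniformly across racks (an idealization, not something measured in this thread), the fraction of shuffle traffic that has to cross the core can be sketched as:

```python
# Fraction of shuffle data that crosses racks, assuming map outputs are
# spread uniformly over R racks (an idealized model, not measured data).

def cross_rack_fraction(num_racks):
    # A reducer pulls 1/R of its input from its own rack, (R-1)/R from others.
    return (num_racks - 1) / num_racks

print(cross_rack_fraction(2))   # 0.5
print(cross_rack_fraction(10))  # 0.9 -> most shuffle bytes hit the core switch
```

Under this model, the more racks a cluster spans, the larger the share of the shuffle that lands on the inter-rack uplinks, which matches the point about rack locality being hard to get for the reduce phase.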
Hi John,
Because for map tasks the job tracker tries to assign them to local data
nodes, there's not much network traffic.
Then the only potential issue will be, as you said, the reducers, which
copy data from all the maps.
So in other words, if the application only creates small intermediate
output, e.g.
On Mon, Jun 6, 2011 at 6:23 AM, praveenesh kumar praveen...@gmail.com wrote:
It would seem to be. Did you have two hadoop*jar
It worked by renaming the hadoop-append*.jar file to hadoop-0.20.2-core.jar.
I don't know why, but it worked!
Also, after this, my HBase started fine once, but after that it's not
working; there is some problem in starting the region
servers.
I have sent the
Elton,
Rapleaf's blog has an interesting post on their experience that's
worth a read:
http://blog.rapleaf.com/dev/2010/08/26/analyzing-some-interesting-networks-for-mapreduce-clusters/
And if you want to get an idea of the interaction between CPU, disk
and network, there's nothing like a
Hi, I am stuck on a basic problem but can't figure it out. My previous
verbose-logging problem is the same as the one mentioned in this old post:
http://mail-archives.apache.org/mod_mbox/nutch-user/200901.mbox/%3c0adbd67bd6811a4bb2144d805124714d03f754a...@kaex1.dom.rastatt.de%3E
First question, if
Thank you. I added another directory with the following configuration and
restarted my cluster:

<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/dfs/name,/var/hadoop/dfs/name.backup</value>
</property>

However, the name.backup directory is empty. Is there anything I need to
do to tell it to back up?
Chris,
I've gone back through the thread and here's Elton's initial question...
On 06/06/11 08:22, elton sky wrote:
Hello everyone,
As I don't have experience with large-scale clusters, I cannot figure out why
inter-rack communication in a MapReduce job is significantly slower
On Mon, 06 Jun 2011 16:14:14 -0500, Shi Yu sh...@uchicago.edu wrote:
I still don't understand: in a cluster you have a directory shared by all
the nodes, right? Just put the configuration file in that directory and
load it in all the mappers; isn't that simple?
So I still don't understand
Michael,
Depending on your hardware, that's a fabric of 40 Gbps, shared.
So that fabric is shared by all 42 ports. And even if I just used 2 ports
out of 42, connecting to 2 racks, then with enough traffic coming, these 2
ports could use the whole 40 Gbps. Is this right?
-Elton
On Tue, Jun 7, 2011 at 1:42
Hi,
Does anyone have a way to reduce the InputSplit size in general?
By default, the minimum size chunk that map input should be split into is
set to 0 (i.e. mapred.min.split.size). Can I change dfs.block.size or some
other configuration to reduce the split size and spawn more mappers?
Thanks,
Mark
Hey,
Hope someone here could provide help with the HBase issue we got after our
cluster ran out of space. I could bring HDFS up via one of the posts I found,
by manually modifying the namenode's /var/hdfs/current/edits file:
$ printf '\xff\xff\xff\xee\xff' > edits
However, the issue now comes at HBase startup. We could see HMaster
Hello,
I wanted to know if anyone has any tips or tutorials on how to install a
Hadoop cluster across multiple datacenters.
Do you need SSH connectivity between the nodes across these data centers?
Thanks in advance for any guidance you can provide.
Great! Thanks guys :)
Mark
2011/6/6 Panayotis Antonopoulos antonopoulos...@hotmail.com
Hi Mark,
Check:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html
I think that setMaxInputSplitSize(Job job, long size)
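For context, the split size FileInputFormat ends up using is essentially the block size clamped between the configured min and max split sizes. A rough sketch of that clamp (the byte counts below are illustrative, not from this thread):

```python
# Sketch of how FileInputFormat picks a split size: clamp the HDFS block
# size between the min split size and the max split size. The concrete
# sizes below are illustrative assumptions.

def compute_split_size(min_size, max_size, block_size):
    return max(min_size, min(max_size, block_size))

MB = 1024 * 1024

# With a 64 MB block, capping the max split size at 16 MB yields 16 MB
# splits, so a 64 MB file spawns roughly 4x the mappers:
split = compute_split_size(min_size=1, max_size=16 * MB, block_size=64 * MB)
print(split // MB)  # 16
```

This is why lowering the max split size (or raising mapred.min.split.size in the other direction) changes the number of mappers without touching dfs.block.size.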