The Hadoop shuffle phase seems painfully slow. For example, I am
running a very large job, and all the reducers report a status such as:
"reduce > copy (14266 of 28243 at 1.30 MB/s)"
This is after all the mappers are finished. Is it supposed to be so
slow?
The following distcp command has been working fine until today. Does anyone
have any idea what "global counters are inaccurate" means?
~/projects/pig] hadoop distcp
s3n://$ACCESS_KEY_ID:$secret_access_key...@$bucket_amf/
hdfs://hadoop01.dc1.foo.com:9000/user/bar/data/amfdata
I've used YourKit Java Profiler pretty successfully. There's a
JobConf parameter you can flip on that will cause a few maps and
reduces to start with profiling on, so you won't be overwhelmed with
info.
-Bryan
On Feb 27, 2009, at 11:12 AM, Sandy wrote:
Hello,
Could anyone recommend any software for profiling the performance of
MapReduce applications one may write for Hadoop? I am currently developing
in Java.
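The JobConf switches Bryan mentions can also be set as plain configuration; a minimal sketch, assuming the 0.19-era property names (`mapred.task.profile` and friends), which limits profiling to the first few task attempts of each kind:

```xml
<!-- Enable task profiling, but only for a handful of tasks. -->
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
</property>
<!-- Profile only the first three map task attempts... -->
<property>
  <name>mapred.task.profile.maps</name>
  <value>0-2</value>
</property>
<!-- ...and the first three reduce task attempts. -->
<property>
  <name>mapred.task.profile.reduces</name>
  <value>0-2</value>
</property>
```

The profiled attempts write their profiler output into the task logs, from which it can be loaded into a profiler front end such as YourKit.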
On Fri, Feb 27, 2009 at 11:21 AM, Doug Cutting wrote:
> I think they're complementary.
>
> Hadoop's MapReduce lets you run computations on up to thousands of computers
> potentially processing petabytes of data. It gets data from the grid to
> your computation, reliably stores output back to the grid, and supports
> grid-global computations (e.g., sorting).
So all you need is a grid of GPU machines (hopefully, coming up, judging by
the blogs, or just buy your own)
On Fri, Feb 27, 2009 at 1:21 PM, Doug Cutting wrote:
> I think they're complementary.
>
> Hadoop's MapReduce lets you run computations on up to thousands of
> computers potentially processing petabytes of data.
I think they're complementary.
Hadoop's MapReduce lets you run computations on up to thousands of
computers potentially processing petabytes of data. It gets data from
the grid to your computation, reliably stores output back to the grid,
and supports grid-global computations (e.g., sorting).
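The model Doug describes (map over records, group the emitted pairs by key, reduce each group) can be sketched in plain Java with no Hadoop at all. This is a toy in-memory word count illustrating the programming model only, not the Hadoop API; the class and method names are illustrative:

```java
import java.util.*;
import java.util.stream.*;

// In-memory sketch of the map/reduce model: no distribution, no HDFS.
public class WordCountSketch {

    // "map": emit a (word, 1) pair for each word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // "reduce": collapse all values for one key into a single count.
    static int reduce(String key, List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Driver: the "shuffle" groups map output by key,
    // then reduce runs once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> kv : map(line))
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
        Map<String, Integer> out = new TreeMap<>();
        shuffled.forEach((k, vs) -> out.put(k, reduce(k, vs)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick brown fox", "the lazy dog")));
    }
}
```

In real Hadoop the shuffle is exactly the step that moves map output across the network to the reducers, which is why the status line in the first message above spends so long in "reduce > copy".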
Hello,
Could anyone recommend any software for profiling the performance of
MapReduce applications one may write for Hadoop? I am currently developing
in Java.
Thanks,
-SM
Dear friends,
I am new to Hadoop and I must say I just want to use it as a map & reduce
framework.
I've developed an application to run on a server with 8 CPUs, and
everything seems to work properly except for the performance: it doesn't
use all the CPU power.
I'm trying to process 200,000 documents a
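Since the message is cut off, this is only a guess, but one common cause of an underused multi-core box is the default task-slot limits: a TaskTracker runs only two maps and two reduces concurrently unless configured otherwise. A hedged sketch of raising them for an 8-CPU machine (values are illustrative, not tuned):

```xml
<!-- mapred-site.xml: allow more concurrent tasks per node. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

The input must also be split into enough map tasks that there is work for every slot; a single unsplittable file will keep one core busy no matter how many slots exist.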
kang_min82 wrote:
Hello Matei,
Which TaskTracker did you mean here?
I don't understand that. In general we have many TaskTrackers, and each of
them runs on a separate DataNode. Why doesn't the JobTracker talk directly
to the NameNode for a list of DataNodes and then perform the MapReduce
t