Shuffle speed?

2009-02-27 Thread Nathan Marz
The Hadoop shuffle phase seems painfully slow. For example, I am running a very large job, and all the reducers report a status such as: "reduce > copy (14266 of 28243 at 1.30 MB/s)". This is after all the mappers have finished. Is it supposed to be so slow?
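To read the quoted status line, here is a minimal Python sketch that extracts its three numbers and computes how far the copy has progressed. The helper name `parse_copy_status` and the regular expression are illustrative assumptions, not part of Hadoop. The sketch also assumes the two counts are map outputs copied so far and total map outputs.

```python
import re

# Pattern for the reducer status text quoted in the message above.
# Assumption: the counts are "map outputs copied" of "total map outputs".
STATUS_RE = re.compile(r"copy \((\d+) of (\d+) at ([\d.]+) MB/s\)")


def parse_copy_status(status):
    """Return (copied, total, rate_mb_s) from a reduce copy status string."""
    match = STATUS_RE.search(status)
    if match is None:
        raise ValueError("unrecognized status: " + status)
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


copied, total, rate = parse_copy_status(
    "reduce > copy (14266 of 28243 at 1.30 MB/s)"
)
fraction_done = copied / total  # share of the copy phase already completed
print(f"{copied} of {total} copied ({fraction_done:.1%}) at {rate} MB/s")
```

The fraction alone does not say how long the remaining copy will take, since the status line does not give the size of the outstanding map outputs.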

distcp problem: global counters are inaccurate

2009-02-27 Thread Steve Kuo
The following distcp command had been working fine until today. Does anyone know what "global counters are inaccurate" means? ~/projects/pig] hadoop distcp s3n://$ACCESS_KEY_ID:$secret_access_key...@$bucket_amf/ hdfs://hadoop01.dc1.foo.com:9000/user/bar/data/amfdata With failur

Re: Profiling Hadoop

2009-02-27 Thread Bryan Duxbury
I've used YourKit Java Profiler pretty successfully. There's a JobConf parameter you can flip on that will cause a few maps and reduces to start with profiling on, so you won't be overwhelmed with info. -Bryan On Feb 27, 2009, at 11:12 AM, Sandy wrote: Hello, Could anyone recommend any

Re: How does NVidia GPU compare to Hadoop/MapReduce

2009-02-27 Thread Dan Zinngrabe
On Fri, Feb 27, 2009 at 11:21 AM, Doug Cutting wrote:
> I think they're complementary.
>
> Hadoop's MapReduce lets you run computations on up to thousands of computers
> potentially processing petabytes of data. It gets data from the grid to
> your computation, reliably stores output back to the

Re: How does NVidia GPU compare to Hadoop/MapReduce

2009-02-27 Thread Mark Kerzner
So all you need is a grid of GPU machines (hopefully, coming up, judging by the blogs, or just buy your own)
On Fri, Feb 27, 2009 at 1:21 PM, Doug Cutting wrote:
> I think they're complementary.
>
> Hadoop's MapReduce lets you run computations on up to thousands of
> computers potentially proces

Re: How does NVidia GPU compare to Hadoop/MapReduce

2009-02-27 Thread Doug Cutting
I think they're complementary. Hadoop's MapReduce lets you run computations on up to thousands of computers potentially processing petabytes of data. It gets data from the grid to your computation, reliably stores output back to the grid, and supports grid-global computations (e.g., sorting).

Profiling Hadoop

2009-02-27 Thread Sandy
Hello, Could anyone recommend any software for profiling the performance of MapReduce applications one may write for Hadoop? I am currently developing in Java. Thanks, -SM

How to improve my map & reduce application

2009-02-27 Thread Pedro Vivancos
Dear friends, I am new to Hadoop and I must say I just want to use it as a map & reduce framework. I've developed an application to be run on a server with 8 CPUs, and everything seems to work properly except for the performance. It doesn't use all the CPU power. I'm trying to process 200,000 documents a

Re: HDFS architecture based on GFS?

2009-02-27 Thread Steve Loughran
kang_min82 wrote: Hello Matei, Which TaskTracker did you mean here? I don't understand that. In general we have many TaskTrackers and each of them runs on a separate DataNode. Why doesn't the JobTracker talk directly to the NameNode for a list of DataNodes and then perform the MapReduce t