RE: how does hadoop work?

2009-12-21 Thread Ricky Ho
To my understanding, if data resides in HDFS, then the JobTracker will use block location information to assign tasks to TaskTrackers close to the data, and hence can reduce the data movement between the data source and the Mapper. Data movement between Mapper and Reducer is harder to minimize (maybe provid

Re: how does hadoop work?

2009-12-21 Thread Patrick Angeles
DS, what you say is true, but there are finer points: 1. Data transfer can begin while the mapper is working through the data. You would still bottleneck on the network if: (a) you have enough nodes and spindles such that the aggregate disk transfer speed is greater than the network c

Re: how does hadoop work?

2009-12-22 Thread Ed Kohlwey
I wouldn't say that Hadoop is fast. It's usually faster than using a single machine, but there are other parallel computing paradigms that outperform it. Hadoop -is- cheap. That having been said, Hadoop's speed really comes from the fact that data is being processed in parallel, so if you have 4 mach