Hi, Gayatri

On 03/20/2012 11:59 AM, Gayatri Rao wrote:
> Hi all,
>
> I am running a map reduce job in EC2 instances and it seems to be very
> slow. It takes hours on end for simple projection and aggregation of
> data.

Which filesystem are you using for data storage: HDFS on EC2 or Amazon S3?
How much data are you analyzing?

> Upon observation, I gathered that the reduce copy speed is 0.01 MB/sec. I
> am new to hadoop. Could anyone please share insights about what reduce
> copy speeds are good to work with? If anyone has experience with this,
> any tips on improving it would be appreciated.

Hadoop Map/Reduce jobs shuffle a lot of data between the map and reduce
phases, so the recommended configuration is to use 10 Gbps networks for
the underlying connection (and dedicated switches on dual-gigabit networks).
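
To put the reported 0.01 MB/sec in perspective, here is a rough back-of-the-envelope sketch in Python. The shuffle volume (SHUFFLE_MB) is a made-up figure, since the data size is not stated in this thread, and 125 MB/sec is only the theoretical peak of a 1 Gbps link, not something a real cluster will sustain. The helper name transfer_hours is also just for illustration.

```python
def transfer_hours(data_mb: float, rate_mb_per_sec: float) -> float:
    """Hours needed to move data_mb megabytes at rate_mb_per_sec."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate must be positive")
    return data_mb / rate_mb_per_sec / 3600.0


# Hypothetical shuffle volume for one reducer; not taken from the thread.
SHUFFLE_MB = 1024.0

# Copy rate reported in the original question.
observed = transfer_hours(SHUFFLE_MB, 0.01)

# 1 Gbps is 125 MB/sec at theoretical peak (assumption: no overhead).
gigabit = transfer_hours(SHUFFLE_MB, 125.0)

print(f"{observed:.1f} hours at 0.01 MB/sec")
print(f"{gigabit * 3600:.1f} seconds at 125 MB/sec")
```

Under these assumptions, copying 1 GB at 0.01 MB/sec takes more than a day, while the same copy on a healthy gigabit link takes seconds. A copy rate that low therefore suggests something is wrong in the setup rather than a normal Hadoop limit.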

Remember too that Hadoop is not a real-time system. If you need real-time random access to your data, use HBase:
http://hbase.apache.org

Regards

> Thanks
> Gayatri


10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
 Data Engineer at UCI
 http://marcosluis2186.posterous.com

