Hi, Gayatri
On 03/20/2012 11:59 AM, Gayatri Rao wrote:
Hi all,
I am running a MapReduce job on EC2 instances and it seems to be very
slow. It takes hours on end for a simple projection and aggregation of
data.
What filesystem are you using for data storage: HDFS in EC2 or Amazon S3?
What is the size of the data you are analyzing?
Upon observation, I found that the reduce copy speed is 0.01 MB/sec. I
am new to Hadoop. Could anyone please share insights about what reduce
copy speeds are good to work with? If anyone has experience with this,
any tips on improving it would be appreciated.
Hadoop Map/Reduce jobs shuffle lots of data, so the recommended
configuration is to use 10 Gbps networks for the underlying connection
(and dedicated switches on dual-gigabit networks).
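To put the reported 0.01 MB/sec in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a hypothetical 1 GB (1000 MB) of map output to be copied to the reducers, since the actual data size was not given, and treats 10 Gbps as an ideal line rate; real copy throughput will be lower because of protocol and disk overhead. The helper names are made up for illustration.

```python
# Back-of-the-envelope shuffle (reduce copy) time estimate.
# Assumption: SHUFFLE_MB is a hypothetical shuffle size, not a measured value.

def transfer_seconds(data_mb: float, rate_mb_per_s: float) -> float:
    """Seconds needed to move data_mb megabytes at rate_mb_per_s MB/s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return data_mb / rate_mb_per_s


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert a nominal link speed in gigabits/s to megabytes/s (decimal units)."""
    return gbps * 1000 / 8


SHUFFLE_MB = 1000.0  # hypothetical: 1 GB of map output to copy

observed = transfer_seconds(SHUFFLE_MB, 0.01)  # copy rate reported in this thread
ten_gig = transfer_seconds(SHUFFLE_MB, gbps_to_mb_per_s(10))  # ideal 10 Gbps link

print(f"at 0.01 MB/s: {observed / 3600:.1f} hours")
print(f"at 10 Gbps (ideal): {ten_gig:.2f} seconds")
```

This prints about 27.8 hours at the observed rate versus 0.80 seconds at the ideal 10 Gbps rate, so a copy speed of 0.01 MB/sec points to a problem well beyond ordinary network limits.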
Remember too that Hadoop is not a real-time system; if you need
real-time random access to your data, use HBase:
http://hbase.apache.org
Regards
Thanks
Gayatri
10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci
--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
Data Engineer at UCI
http://marcosluis2186.posterous.com