EMR setup for seq2sparse

2013-01-24 Thread Matti Kokkola
Hi, I'm using Mahout to vectorize and cluster data consisting of short texts. So far I have done the vectorizing on a single multi-core machine and have been quite happy with the results. However, we are now making a lot of small adjustments to increase the quality of the results and thus would like to

Re: EMR setup for seq2sparse

2013-01-24 Thread Sean Owen
In my experience, using many small instances hurts, since more data is transferred (less data is local to any given computation) and the instances have lower I/O performance. On the high end, super-big instances become counter-productive because they are not as cheap on the spot market -- and