What is the size of your vectors? Mine is set to 20. I am seeing slow results
as well, with iterations = 5 and 200,000,000 elements.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/K-means-faster-on-Mahout-then-on-Spark-tp3195p15168.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
Hi, I'm running a benchmark that compares Mahout and SparkML. So far I
have the following results for k-means:
Number of iterations = 10, number of elements = 1000, Mahout time = 602,
Spark time = 138
Number of iterations = 40, number of elements = 1000, Mahout time = 1917,
Spark time = 330
Number of
Maybe with MEMORY_ONLY, Spark has to recompute the RDDs several times because
they don't fit in memory, which makes things run slower.
As a general safe rule, use MEMORY_AND_DISK_SER.
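
To make the recomputation cost concrete, here is a toy model in plain Python. It is not Spark's API and it ignores Spark's real eviction policy; the function name, the partition counts, and the "memory slots" budget are all made up for illustration. It only counts how often a partition has to be rebuilt when a dataset is scanned several times, as an iterative algorithm like k-means does.

```python
# Toy model only: this is NOT Spark code, just an illustration of the cost.
def run_passes(num_partitions, memory_slots, passes, spill_to_disk):
    """Count how many times partitions are computed from scratch."""
    memory = set()  # partitions currently held in memory
    disk = set()    # partitions spilled to disk (only used if spill_to_disk)
    computed = 0
    for _ in range(passes):
        for p in range(num_partitions):
            if p in memory or p in disk:
                continue  # served from storage, no recomputation needed
            computed += 1  # partition is rebuilt from its source
            if len(memory) < memory_slots:
                memory.add(p)
            elif spill_to_disk:
                disk.add(p)
            # otherwise the partition is dropped and rebuilt on the next pass
    return computed


# Hypothetical sizes: 10 partitions, room for 4 in memory, 5 passes over the data.
memory_only = run_passes(num_partitions=10, memory_slots=4, passes=5, spill_to_disk=False)
memory_and_disk = run_passes(num_partitions=10, memory_slots=4, passes=5, spill_to_disk=True)
print(memory_only, memory_and_disk)
```

In this simplified model, the memory-only variant rebuilds the six partitions that did not fit on every pass, while the spill-to-disk variant builds each partition once.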
Guillaume Pitel - President of eXenSa
Prashant Sharma scrapco...@gmail.com wrote:
I think Mahout uses
Mahout does have a k-means implementation that can be run in MapReduce and
iterative modes.
Sent from my iPhone
On Mar 25, 2014, at 9:25 AM, Prashant Sharma scrapco...@gmail.com wrote:
I think Mahout uses FuzzyKmeans, which is a different algorithm and is not
iterative.
Prashant Sharma
On
Mahout used MR and ran one MR job on every iteration. It worked as predicted.
My question is more about why Spark was so slow. I would try
MEMORY_AND_DISK_SER.
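
For readers unfamiliar with why k-means maps onto one job per iteration, here is a minimal plain-Python sketch of the standard algorithm. It is neither Mahout's nor Spark's implementation; the function names and the sample points are invented for illustration. Each iteration has a "map" step that assigns every point to its nearest center and a "reduce" step that averages each group, and every iteration rescans the whole input.

```python
from collections import defaultdict


def closest(point, centers):
    """Index of the center nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])),
    )


def kmeans_iteration(points, centers):
    """One pass: 'map' assigns points to centers, 'reduce' averages each group."""
    groups = defaultdict(list)
    for p in points:  # map step
        groups[closest(p, centers)].append(p)
    new_centers = list(centers)  # centers with no points keep their old value
    for i, members in groups.items():  # reduce step
        new_centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centers


def kmeans(points, centers, iterations):
    for _ in range(iterations):  # every iteration rescans all points
        centers = kmeans_iteration(points, centers)
    return centers


# Made-up sample data: two obvious clusters in 2D.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)], iterations=3)
print(result)
```

Because the full input is read again on each iteration, whether that input is kept in memory, reread from disk, or recomputed has a large effect on total run time.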
2014-03-25 17:58 GMT+04:00 Suneel Marthi suneel_mar...@yahoo.com:
Mahout does have a k-means implementation that can be run in MapReduce and
iterative modes.
I think Mahout uses FuzzyKmeans, which is a different algorithm and is not
iterative.
Prashant Sharma
On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov pahomov.e...@gmail.com wrote:
Hi, I'm running a benchmark that compares Mahout and SparkML. So far I
have the following results for k-means:
Number of