I have the same question. I'm trying to figure out how to get ALS to complete on a larger dataset. From what I can tell, it gets stuck on "Count". I'm running 8 r4.4xlarge instances on Amazon EMR, and the dataset is 80 GB (just to give some idea of the size). I assumed Spark could handle this, but maybe I need to try different settings, such as the number of user or item blocks (numUserBlocks, numItemBlocks). Any help appreciated!
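For anyone wanting to experiment with the block settings mentioned above, here is a minimal PySpark sketch of how they might be raised from their default of 10. The sizing rule (`suggest_num_blocks`), the column names, and the 16-cores-per-instance figure are assumptions for illustration, not something confirmed to fix the hang.

```python
# Sketch only: the sizing heuristic and column names are assumptions.

def suggest_num_blocks(instances, cores_per_instance, blocks_per_core=2):
    """Hypothetical rule of thumb: a small multiple of the total core count."""
    return instances * cores_per_instance * blocks_per_core


def build_als(num_blocks):
    """Return an ALS estimator with explicit block counts.

    Needs an active SparkSession, so pyspark is imported lazily here.
    """
    from pyspark.ml.recommendation import ALS

    return ALS(
        userCol="user",            # assumed column names
        itemCol="item",
        ratingCol="rating",
        numUserBlocks=num_blocks,  # Spark's default is 10
        numItemBlocks=num_blocks,  # Spark's default is 10
        checkpointInterval=5,      # only takes effect if a checkpoint dir is set
    )


# 8 instances as in the post; 16 cores each is assumed (r4.4xlarge vCPU count).
num_blocks = suggest_num_blocks(instances=8, cores_per_instance=16)
print(num_blocks)
```

With a SparkSession running, `build_als(num_blocks).fit(ratings)` would train on a hypothetical `ratings` DataFrame. Calling `spark.sparkContext.setCheckpointDir(...)` beforehand lets the checkpoint interval take effect, which may help when lineage grows long over many iterations.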