All of these warnings come from the ALS iterations, from both flatMap and aggregate. For instance, here is the origin of the stage where flatMap shows these warnings (with Spark 1.3.0; they also appear with Spark 1.3.1):
    org.apache.spark.rdd.RDD.flatMap(RDD.scala:296)
    org.apache.spark.ml.recommendation.ALS$.org$apache$spark$ml$recommendation$ALS$$computeFactors(ALS.scala:1065)
    org.apache.spark.ml.recommendation.ALS$$anonfun$train$3.apply(ALS.scala:530)
    org.apache.spark.ml.recommendation.ALS$$anonfun$train$3.apply(ALS.scala:527)
    scala.collection.immutable.Range.foreach(Range.scala:141)
    org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:527)
    org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:203)

And from the aggregate:

    org.apache.spark.rdd.RDD.aggregate(RDD.scala:968)
    org.apache.spark.ml.recommendation.ALS$.computeYtY(ALS.scala:1112)
    org.apache.spark.ml.recommendation.ALS$.org$apache$spark$ml$recommendation$ALS$$computeFactors(ALS.scala:1064)
    org.apache.spark.ml.recommendation.ALS$$anonfun$train$3.apply(ALS.scala:538)
    org.apache.spark.ml.recommendation.ALS$$anonfun$train$3.apply(ALS.scala:527)
    scala.collection.immutable.Range.foreach(Range.scala:141)
    org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:527)
    org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:203)

On Thu, Apr 23, 2015 at 2:49 AM, Xiangrui Meng <men...@gmail.com> wrote:
> This is the size of the serialized task closure. Is stage 246 part of
> ALS iterations, or something before or after it? -Xiangrui
>
> On Tue, Apr 21, 2015 at 10:36 AM, Christian S. Perone
> <christian.per...@gmail.com> wrote:
> > Hi Sean, thanks for the answer. I tried to call repartition() on the input
> > with many different sizes and it still continues to show that warning
> > message.
> >
> > On Tue, Apr 21, 2015 at 7:05 AM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> I think maybe you need more partitions in your input, which might make
> >> for smaller tasks?
> >>
> >> On Tue, Apr 21, 2015 at 2:56 AM, Christian S. Perone
> >> <christian.per...@gmail.com> wrote:
> >> > I keep seeing these warnings when using trainImplicit:
> >> >
> >> > WARN TaskSetManager: Stage 246 contains a task of very large size (208 KB).
> >> > The maximum recommended task size is 100 KB.
> >> >
> >> > And then the task size starts to increase. Is this a known issue ?
> >> >
> >> > Thanks !

--
Blog <http://blog.christianperone.com> | Github <https://github.com/perone> | Twitter <https://twitter.com/tarantulae>
"Forgive, O Lord, my little jokes on Thee, and I'll forgive Thy great big joke on me."
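For illustration only, here is a minimal, stdlib-only Python sketch (not Spark code; every name in it is hypothetical) of the check behind this warning as described in the thread: measure the serialized size of what a task carries and compare it with the 100 KB threshold quoted in the message. Using pickle as the serializer and treating 1 KB as 1024 bytes are assumptions. `make_tasks` is a toy model in which every task carries the same captured object plus one slice of the input; under that assumption, adding partitions shrinks only the slice, which is consistent with repartition() not making the warning go away.

```python
import pickle

# Threshold quoted in the warning above. Assumption: 1 KB = 1024 bytes.
TASK_SIZE_TO_WARN_KB = 100


def serialized_size_bytes(task_payload) -> int:
    """Number of bytes produced by pickling the payload (a stand-in for task serialization)."""
    return len(pickle.dumps(task_payload))


def task_size_warning(stage_id, task_payload):
    """Return a message shaped like the one in the thread if the payload is too big, else None."""
    size = serialized_size_bytes(task_payload)
    if size > TASK_SIZE_TO_WARN_KB * 1024:
        return (f"Stage {stage_id} contains a task of very large size ({size // 1024} KB). "
                f"The maximum recommended task size is {TASK_SIZE_TO_WARN_KB} KB.")
    return None


def make_tasks(captured, records, num_partitions):
    """Hypothetical toy model: every task carries the same captured object plus one input slice."""
    slices = [records[i::num_partitions] for i in range(num_partitions)]
    return [(captured, part) for part in slices]


if __name__ == "__main__":
    records = list(range(10_000))
    captured = bytes(150 * 1024)  # hypothetical 150 KB object referenced by the closure
    for n in (4, 64):
        tasks = make_tasks(captured, records, n)
        largest_kb = max(serialized_size_bytes(t) for t in tasks) // 1024
        still_warns = all(task_size_warning(246, t) is not None for t in tasks)
        print(f"{n} partitions: largest task {largest_kb} KB, all tasks warn: {still_warns}")
```

In this model the captured object dominates, so going from 4 to 64 partitions shrinks each task only by its share of the input records and every task stays above the threshold.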