Detecting configuration problems

2015-09-06 Thread Madhu
I'm not sure if this has been discussed already; if so, please point me to the thread and/or related JIRA. I have been running with about 1 TB of data on a 20-node D2 cluster (255 GiB/node). My data is uniformly distributed, so skew is not a problem. I found that default settings (or wrong setting

Re: Exception in saving MatrixFactorizationModel

2015-09-06 Thread Ranjana Rajendran
It looks like you hit https://issues.apache.org/jira/browse/SPARK-7837. As I understand it, this occurs if there is skew in unpartitioned data. Can you try partitioning the model before saving it? On Sat, Sep 5, 2015 at 11:16 PM, Madawa Soysa wrote: > outPath is correct. In the path, there are two di

Re: Exception in saving MatrixFactorizationModel

2015-09-06 Thread Madawa Soysa
Hi, I'll try partitioning. I have another question: after creating the MatrixFactorizationModel through Spark, can it be serialized as a Java object without any problem? On 6 September 2015 at 22:39, Ranjana Rajendran wrote: > It looks like you hit https://issues.apache.org/jira/browse/SPARK-7

RE: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-06 Thread Cheng, Hao
Not sure if it’s too late, but we found a critical bug at https://issues.apache.org/jira/browse/SPARK-10466: UnsafeRow ser/de will cause an assert error, particularly for sort-based shuffle with data spill. This is not acceptable, as it’s very common in large table joins. From: Reynold Xin [mailto

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-06 Thread james
I saw a new "spark.shuffle.manager=tungsten-sort" implemented in https://issues.apache.org/jira/browse/SPARK-7081, but its corresponding description can't be found in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (Currently there are only 'sort' and 'ha