I'm not sure if this has been discussed already; if so, please point me to
the thread and/or related JIRA.
I have been running with about 1 TB of data volume on a 20-node D2 cluster (255
GiB/node).
I have uniformly distributed data, so skew is not a problem.
I found that default settings (or wrong setting
It looks like you hit https://issues.apache.org/jira/browse/SPARK-7837.
As I understand it, this occurs if there is skew in unpartitioned data.
Can you try partitioning the model before saving it?
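
In case it helps, here is a minimal sketch of what I mean by partitioning before saving. It assumes the model's factor RDDs can simply be repartitioned and wrapped in a new MatrixFactorizationModel; the helper name saveRepartitioned and the numPartitions parameter are made up for illustration, and I have not verified that this avoids the issue above.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// Hypothetical helper (not part of Spark): rebuild the model from evenly
// repartitioned factor RDDs, then save the rebuilt model to outPath.
def saveRepartitioned(
    sc: SparkContext,
    model: MatrixFactorizationModel,
    outPath: String,
    numPartitions: Int): Unit = {
  // repartition() shuffles the records into numPartitions partitions
  val repartitioned = new MatrixFactorizationModel(
    model.rank,
    model.userFeatures.repartition(numPartitions),
    model.productFeatures.repartition(numPartitions))
  repartitioned.save(sc, outPath)
}
```

You would pick numPartitions based on your cluster size; I have no recommended value.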
On Sat, Sep 5, 2015 at 11:16 PM, Madawa Soysa
wrote:
> outPath is correct. In the path, there are two di
Hi,
I'll try partitioning.
I have another question: after creating the MatrixFactorizationModel
through Spark, can it be serialized as a Java object without any problem?
On 6 September 2015 at 22:39, Ranjana Rajendran wrote:
> It looks like you hit https://issues.apache.org/jira/browse/SPARK-7
Not sure if it’s too late, but we found a critical bug at
https://issues.apache.org/jira/browse/SPARK-10466
UnsafeRow ser/de causes an assertion error, particularly for sort-based shuffle
with data spill. This is not acceptable, as it’s very common in large table
joins.
From: Reynold Xin [mailto
I saw a new "spark.shuffle.manager=tungsten-sort" implemented in
https://issues.apache.org/jira/browse/SPARK-7081, but I can't find its
corresponding description in
http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (Currently
there are only 'sort' and 'ha