This seems like a legitimate blocker. We will cut another RC to include the
revert.
2014-11-16 17:29 GMT-08:00 Kousuke Saruta saru...@oss.nttdata.co.jp:
I've now finished the revert for SPARK-4434 and opened a PR.
(2014/11/16 17:08), Josh Rosen wrote:
-1
I found a potential regression in
Andrew,
I set up the 1.1.1 branch and I am getting shuffle failures while doing a flatMap
followed by a groupBy... My cluster memory is less than the memory I need, so the
flatMap does around 400 GB of shuffle while memory is around 120 GB...
14/11/13 23:10:49 WARN TaskSetManager: Lost task 22.1 in
* I moved from sbt to maven in June, specifically due to Andrew Or
describing mvn as the default build tool. Developers should keep in mind
that Jenkins uses mvn, so we need to run mvn before submitting PRs - even
if sbt is used for day-to-day dev work.
To be clear, I think that the PR
The docs on using sbt are here:
https://github.com/apache/spark/blob/master/docs/building-spark.md#building-with-sbt
They'll presumably be published with 1.2.0.
On Mon, Nov 17, 2014 at 2:49 PM Michael Armbrust mich...@databricks.com
wrote:
* I moved from sbt to maven in June specifically due
+0 (non-binding)
Compiled Spark, then recompiled and ran my application with 1.1.1 RC1 on YARN,
plain-vanilla Hadoop 2.3.0. No regressions.
However, run time increased 12% to 22% relative to the 1.0.0 release (no
other environment or configuration changes). Would have recommended +1
were it not
Hey Kevin,
If you are upgrading from 1.0.X to 1.1.X, check out the upgrade notes
here [1] - it could be that changed defaults caused a regression for
your workload. Do you still see the regression if you restore the old
configuration values? (A sketch of that is below.)
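For concreteness, a sketch of restoring two of the 1.0.x defaults that the
1.1.0 release notes list as changed; treat the exact keys and values as an
assumption and verify them against the notes:

import org.apache.spark.SparkConf

// Assumed changed defaults per the 1.1.0 release notes; verify before use.
val conf = new SparkConf()
  .set("spark.io.compression.codec", "lzf") // 1.1 default: snappy
  .set("spark.broadcast.factory",
    "org.apache.spark.broadcast.HttpBroadcastFactory") // 1.1 default: Torrent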
It's great to hear specifically about issues like this, so
Hi Alessandro,
MLlib v1.1 supports variance as the impurity for regression, and Gini
impurity and entropy for classification.
http://spark.apache.org/docs/latest/mllib-decision-tree.html
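For concreteness, a minimal sketch against the v1.1 convenience API; the
exact signatures are my assumption (see the docs linked above), and the
helper name is illustrative:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.rdd.RDD

// data: the training set, prepared elsewhere.
def examples(data: RDD[LabeledPoint]) = {
  // Classification: impurity is "gini" or "entropy"; here 2 classes,
  // no categorical features, maxDepth 5, maxBins 32.
  val clf = DecisionTree.trainClassifier(data, 2, Map[Int, Int](), "gini", 5, 32)
  // Regression: "variance" is the only supported impurity in v1.1.
  val reg = DecisionTree.trainRegressor(data, Map[Int, Int](), "variance", 5, 32)
  (clf, reg)
}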
If the information gain calculation can be performed by distributed
aggregation then it might be possible to plug it into the
This is canceled in favor of RC2 with the following blockers:
https://issues.apache.org/jira/browse/SPARK-4434
https://issues.apache.org/jira/browse/SPARK-3633
The latter one involves a regression from 1.0.2 to 1.1.0, NOT from 1.1.0 to
1.1.1. For this reason, we are currently investigating this
Manish,
Thanks for pointing me to the relevant docs. It is unfortunate that
absolute error is not supported yet. I can't seem to find a JIRA for it.
Now, here's what the comments say in the current master branch:
/**
* :: Experimental ::
* A class that implements Stochastic Gradient
Hi,
I have an RDD whose key is a userId and whose value is (movieId, rating)...
I want to sample 80% of the (movieId, rating) pairs that each userId has seen
for train; the rest is for test...
val indexedRating = sc.textFile(...).map { x => Rating(x(0), x(1), x(2)) }
val keyedRatings = indexedRating.map { x => (x.user, (x.product, x.rating)) }
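One approximate way to do the per-user 80/20 split, as a sketch (mine, and
the helper names are illustrative): tag each (movieId, rating) pair with a
deterministic pseudo-random number seeded from (userId, movieId), so train
and test stay disjoint without caching, and roughly 80% of each user's
ratings land in train:

import org.apache.spark.rdd.RDD

def split(keyedRatings: RDD[(Int, (Int, Double))])
    : (RDD[(Int, (Int, Double))], RDD[(Int, (Int, Double))]) = {
  // Deterministic tag in [0, 1) derived from the (user, movie) pair.
  def tag(user: Int, movie: Int): Double =
    new scala.util.Random((user.toLong << 32) | (movie & 0xffffffffL)).nextDouble()
  val train = keyedRatings.filter { case (u, (m, _)) => tag(u, m) < 0.8 }
  val test  = keyedRatings.filter { case (u, (m, _)) => tag(u, m) >= 0.8 }
  (train, test)
}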
Hi,
Matrix computation is critical for the efficiency of algorithms like least
squares, the Kalman filter, and so on.
For now, the mllib module offers limited linear algebra on matrices, especially
distributed matrices (a sketch of the current API is below).
We have been working on establishing distributed matrix computation APIs based
on data
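To illustrate the limitation mentioned above, a sketch (the function name is
illustrative): as of v1.1 a distributed RowMatrix can only be multiplied by a
local matrix, not by another distributed matrix:

import org.apache.spark.mllib.linalg.{Matrices, Vector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// rows: an RDD[Vector] with 3 columns, built elsewhere.
def multiplyExample(rows: RDD[Vector]): RowMatrix = {
  val mat = new RowMatrix(rows)            // n x 3, distributed
  val local = Matrices.dense(3, 2,
    Array(1.0, 0.0, 0.0, 1.0, 0.0, 0.0))   // 3 x 2 local matrix, column-major
  mat.multiply(local)                      // n x 2, distributed result
}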
There's been some work at the AMPLab on a distributed matrix library on top
of Spark; see here [1]. In particular, the repo contains a couple of
factorization algorithms.
[1] https://github.com/amplab/ml-matrix
Zongheng
On Mon Nov 17 2014 at 7:34:17 PM liaoyuxi liaoy...@huawei.com wrote:
Hi,
Hi Alessandro,
I think absolute error as a splitting criterion might be feasible with the
current architecture - the sufficient statistics we collect
currently might be able to support it. Could you let us know scenarios
where absolute error has significantly outperformed squared error
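For concreteness, a sketch (mine, not from the thread) of the kind of
statistics this distributed aggregation collects: variance reduces to three
sufficient statistics that merge associatively across partitions, and a new
criterion would need a similarly mergeable form:

// Per-node sufficient statistics for the variance impurity.
case class ImpurityStats(n: Long, sum: Double, sumSq: Double) {
  // Associative merge, so partitions can be combined in any order.
  def merge(o: ImpurityStats) =
    ImpurityStats(n + o.n, sum + o.sum, sumSq + o.sumSq)
  // Population variance: E[x^2] - (E[x])^2.
  def variance: Double = { val mean = sum / n; sumSq / n - mean * mean }
}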
Hey Yuxi,
We have also implemented a distributed matrix multiplication library at
PasaLab. The repo is hosted here: https://github.com/PasaLab/marlin . We
implemented three distributed matrix multiplication algorithms on Spark. As
we see it, communication-optimal does not always mean the
Hi,
I checked the work in ml-matrix. For now, it doesn't include matrix
multiplication or LU decomposition. What's your plan? Can we contribute our
work to these parts?
Also, the number of row/column blocks has to be chosen manually. As we
mentioned, the CARMA method in the paper is communication-optimal.
Hi Yuxi,
We are integrating ml-matrix from the AMPLab repo into MLlib, tracked
by this JIRA: https://issues.apache.org/jira/browse/SPARK-3434
We already have matrix multiply but are missing LU decomposition. Could
you please track that JIRA? Once the initial design is in, we can sync on
how