GitHub user mengxr opened a pull request:
https://github.com/apache/incubator-spark/pull/645
SPARK-1129: use a predefined seed when seed is zero in XORShiftRandom
If the seed is zero, XORShift generates all zeros, which produces
unexpected results.
JIRA: https://spark
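Why a zero seed is a problem: every xorshift step XORs the state with shifted copies of itself, so zero is a fixed point and a zero seed never leaves it. Below is a minimal Scala sketch of the guard described above. It assumes the common 64-bit (21, 35, 4) shift triple. `XorShiftSketch` and the fallback value 42 are hypothetical; they are not the constant chosen in the PR.

```scala
// Sketch only: a 64-bit xorshift generator that swaps a zero seed for a fixed
// nonzero fallback. Names and the fallback value are hypothetical.
object XorShiftSketch {
  // Hypothetical fallback; any nonzero value avoids the all-zero fixed point.
  val FallbackSeed: Long = 42L

  // One xorshift step using the (21, 35, 4) shift triple.
  def step(x: Long): Long = {
    var s = x ^ (x << 21)
    s ^= (s >>> 35)
    s ^= (s << 4)
    s
  }

  // Returns the first n states after the seed, optionally guarding against seed == 0.
  def states(seed: Long, n: Int, guardZero: Boolean): List[Long] = {
    val start = if (guardZero && seed == 0L) FallbackSeed else seed
    Iterator.iterate(start)(step).drop(1).take(n).toList
  }
}
```

`states(0L, 5, guardZero = false)` is all zeros. The guarded call produces the same nonzero sequence as seeding with the fallback directly. Each xor-shift step is invertible, so a nonzero state can never map back to zero.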
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35918773
@fommil Either AL2 or MPL should work. We only need appropriate labeling
for MPL, which is trivial. And thanks for the suggestion of making native
libraries
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/633#issuecomment-35915806
@dbtsai Since regVal remains 0.0 for any existing updater in MLlib, it
would make more sense if this change comes with the L-BFGS PR you are working
on.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/564#discussion_r9983939
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/635#issuecomment-35849312
@markhamstra @pwendell For the use cases, this allCollect operation may be
useful in the grid search for a good set of training parameters for machine
learning
GitHub user mengxr opened a pull request:
https://github.com/apache/incubator-spark/pull/631
SPARK-1117: update accumulator docs
The current doc implies that Spark doesn't support accumulators of type `Long`,
which is wrong.
JIRA: https://spark-project.atlassian.net/browse/
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/629#issuecomment-35750636
LGTM if Travis passes (no reason not). Thanks for the fix!
---
If your project is set up for it, you can reply to this email and have your
reply appear on
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/564#discussion_r9899395
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35557645
@fommil Thanks a lot! The license JIRA is also interesting to follow ~ :)
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/619#issuecomment-35535759
DOI links are "permanent" so we don't need to worry about the link becoming
invalid again. People will do a search and find the pdf easily if t
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/619#issuecomment-35470291
Do you mind using the DOI link of the paper:
http://dx.doi.org/10.1109/ICDM.2008.22 ?
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/617#issuecomment-35459427
retest this please
GitHub user mengxr opened a pull request:
https://github.com/apache/incubator-spark/pull/617
check key name and identity file before launching a cluster
I launched an EC2 cluster without providing a key name and an identity
file. The error showed up after two minutes. It would be good
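The fail-fast idea can be sketched as a check that runs before any instance is requested. spark-ec2 itself is a Python script, so this Scala version is only illustrative. `LaunchOpts`, `keyPair`, and `identityFile` are hypothetical names, not the script's options.

```scala
import java.nio.file.{Files, Paths}

// Illustrative only: keyPair and identityFile are hypothetical stand-ins for the
// launcher's key-name and identity-file settings.
object LaunchCheckSketch {
  final case class LaunchOpts(keyPair: Option[String], identityFile: Option[String])

  // Collects every problem up front, so the user sees them before anything is launched.
  def validate(opts: LaunchOpts): List[String] = {
    val missingKey =
      if (opts.keyPair.forall(_.trim.isEmpty)) List("a key pair name is required") else Nil
    val identityProblems = opts.identityFile.filter(_.trim.nonEmpty) match {
      case None => List("an identity file is required")
      case Some(path) if !Files.isRegularFile(Paths.get(path)) =>
        List(s"identity file not found: $path")
      case Some(_) => Nil
    }
    missingKey ++ identityProblems
  }
}
```

A launcher would call `validate` first and exit with the collected messages if the list is non-empty.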
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35449886
@fommil @MLnick I included MTJ in the benchmarks (see the updated comment
above). Basically it performs very similarly to Breeze.
@martinjaggi Gradient
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35442089
@fommil I have the native vecLib BLAS/LAPACK shipped with Mac OS X and
OpenBLAS installed for testing. OpenBLAS is not on the search path. I deleted
both and re
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35433131
Thanks all for the suggestions!
@srowen @giyengar I updated the small benchmark suite to include
commons-math3. It seems to me commons-math3 has a couple
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/572#discussion_r9800999
I'm not sure which style to use. @rxin ? I prefer the following:
~~~
map { fold => ( // "((&
~~~
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35132098
@fommil Yes, I mentioned the benchmark suite from Peter to @srowen in my
previous comment, but it is designed for dense linear algebra. I put some of
the code I
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35038739
@fommil I don't quite understand what "roll their own" means exactly here.
I didn't propose to re-implement one or half linear algebra library
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35030774
@fommil MTJ uses LGPL. See http://www.apache.org/legal/resolved.html
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35018212
@MLnick MTJ is not an option because of its license.
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35017848
@shivaram @srowen @giyengar Thanks for keeping the discussion running!
@shivaram The requirement is to add sparse data support in all existing
MLlib
Github user mengxr closed the pull request at:
https://github.com/apache/incubator-spark/pull/591
GitHub user mengxr opened a pull request:
https://github.com/apache/incubator-spark/pull/591
SPARK-1076: [Fix #578] add @transient to some vals
I'll try to be more careful next time.
You can merge this pull request into a Git repository by running:
$ git pull
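For context on the fix: Java serialization skips fields marked `@transient`. Driver-only state in a serializable object is therefore not shipped, and it reads as `null` after deserialization. A small sketch, assuming plain Java serialization. `Holder` and `TransientSketch` are hypothetical and not taken from the PR.

```scala
import java.io._

// Hypothetical class: kept is serialized, skipped is not.
final class Holder(k: Int, s: Array[Int]) extends Serializable {
  val kept: Int = k
  // Skipped by Java serialization; null after a round trip.
  @transient val skipped: Array[Int] = s
}

object TransientSketch {
  // Serializes and deserializes obj in memory, as happens when an object is shipped.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)) {
      // Resolve classes with this code's class loader so script/REPL classes are found.
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, getClass.getClassLoader)
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

The flip side is that the receiving side must never read a `@transient` field, or it will see `null`.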
Github user mengxr closed the pull request at:
https://github.com/apache/incubator-spark/pull/589
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/589#issuecomment-34906057
I will make another PR for the second commit. Next time we should leave the
PR open for a day or half a day before merging.
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/572#issuecomment-34901227
@holdenk How about splitting this PR into two? One contains the k-fold
splitting method in mllib and the fix to BernoulliSampler, and the other
contains the
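Here is a sketch of k-fold splitting by range sampling, in the spirit of a sampler with a complement option. Each element gets one uniform draw from a seeded generator. Fold i keeps draws in [i/k, (i+1)/k), and the training set is the complement. Plain Scala collections stand in for RDD partitions. `rangeSample` and `kFold` are hypothetical helpers, not MLlib's API.

```scala
import scala.util.Random

object KFoldSketch {
  // Keeps elements whose draw falls in [lb, ub), or outside it when complement = true.
  def rangeSample[T](data: Seq[T], lb: Double, ub: Double, seed: Long,
                     complement: Boolean): Vector[T] = {
    val rng = new Random(seed) // same seed => same sequence of draws on every call
    data.toVector.filter { _ =>
      val u = rng.nextDouble()
      (u >= lb && u < ub) != complement
    }
  }

  // Returns k (train, test) pairs whose test sets partition the data.
  def kFold[T](data: Seq[T], k: Int, seed: Long): Seq[(Vector[T], Vector[T])] = {
    require(k >= 2, "k must be at least 2")
    (0 until k).map { i =>
      val lb = i.toDouble / k
      val ub = (i + 1).toDouble / k
      val test = rangeSample(data, lb, ub, seed, complement = false)
      val train = rangeSample(data, lb, ub, seed, complement = true)
      (train, test)
    }
  }
}
```

Each call re-creates the generator from the same seed. As a result, test and training sets computed in separate passes are exact complements, and the k test folds partition the data. This keeps recomputation consistent when the two sides are evaluated independently.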
GitHub user mengxr opened a pull request:
https://github.com/apache/incubator-spark/pull/589
SPARK-1076: Convert Int to Long to avoid overflow
Patch for PR #578.
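To illustrate the overflow in question: when partition sizes are summed into start offsets, an `Int` running total wraps once it passes `Int.MaxValue`. A `Long` total does not. `OffsetOverflowSketch` is a hypothetical name; this is not the patch itself.

```scala
// Start offsets from partition sizes, accumulated in Int versus Long.
object OffsetOverflowSketch {
  // Int accumulation wraps silently past Int.MaxValue.
  def intOffsets(sizes: Seq[Int]): Seq[Int] = sizes.scanLeft(0)(_ + _)

  // Long accumulation has ample headroom for realistic totals.
  def longOffsets(sizes: Seq[Int]): Seq[Long] = sizes.scanLeft(0L)(_ + _)
}
```

With sizes `Seq(Int.MaxValue, 10)`, the last `Int` offset comes out as `Int.MinValue + 9`, while the `Long` version gives `Int.MaxValue + 10L`.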
Github user mengxr closed the pull request at:
https://github.com/apache/incubator-spark/pull/578
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/578#issuecomment-34845415
The link is at the bottom of the PR description.
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/578#issuecomment-34842407
@rxin Thanks! Please see the updated code.
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-34714242
@srowen Thanks for the information! I believe native BLAS/LAPACK libraries
perform much better than Java implementations for level 2 and level 3
operations, but
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-34712992
@debasish83 Are you speaking of the benchmark I posted to the JIRA?
BLAS/LAPACK cannot be used for dense vector + sparse vector. Those are designed
for dense
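Here is a concrete case of why dense BLAS routines don't fit. Adding a sparse vector into a dense one only needs to touch the stored entries, so it is usually written as a loop over the nonzeros. A minimal sketch. `SparseVec` and `axpy` (named after the BLAS operation y := a*x + y) are hypothetical, not an existing library API.

```scala
object SparseDenseSketch {
  // Hypothetical sparse vector: parallel arrays of indices and values.
  final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
    require(indices.length == values.length, "indices and values must align")
  }

  // Computes y := a * x + y in place for sparse x and dense y; O(nnz(x)) work.
  def axpy(a: Double, x: SparseVec, y: Array[Double]): Unit = {
    require(x.size == y.length, "dimension mismatch")
    var k = 0
    while (k < x.indices.length) {
      y(x.indices(k)) += a * x.values(k)
      k += 1
    }
  }
}
```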
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-34707127
@sscdotopen @debasish83, I'm okay with copying VectorWritable and removing
mahout-core from dependencies.
@srowen Just as you mentioned, the sparse v
GitHub user mengxr opened a pull request:
https://github.com/apache/incubator-spark/pull/578
Adding assignRanks and assignUniqueIds to RDD
Assigning ranks to an ordered or unordered data set is a common operation.
This could be done by first counting records in each partition and then
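The counting approach can be sketched with a `Seq` of `Seq`s standing in for RDD partitions. First count each partition, then turn the counts into `Long` start offsets, then number records locally. Unique but non-consecutive ids need no counting pass. In the scheme below, item i of partition k gets i * n + k for n partitions, which is the scheme Spark's later `RDD.zipWithUniqueId` documents. All names here are hypothetical.

```scala
object RankSketch {
  // Consecutive ids 0, 1, 2, ... in partition order (requires the counting pass).
  def assignRanks[T](partitions: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    val counts = partitions.map(_.size.toLong) // pass 1: one count per partition
    val starts = counts.scanLeft(0L)(_ + _)    // start offset of each partition
    partitions.zip(starts).map { case (part, start) =>
      part.zipWithIndex.map { case (x, i) => (x, start + i) } // pass 2: local numbering
    }
  }

  // Unique but not consecutive ids with no counting pass: item i of partition k
  // gets i * n + k, where n is the number of partitions.
  def assignUniqueIds[T](partitions: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    val n = partitions.size.toLong
    partitions.zipWithIndex.map { case (part, k) =>
      part.zipWithIndex.map { case (x, i) => (x, i * n + k) }
    }
  }
}
```

The trade-off: ranks are dense but need an extra pass over the data to count; unique ids are available immediately but leave gaps when partitions are unbalanced.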
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/572#issuecomment-34668194
@holdenk, the PartitionwiseSampledRDD was designed with this use case in
mind. Both the folded RDD and its complement can be represented by
GitHub user mengxr opened a pull request:
https://github.com/apache/incubator-spark/pull/575
[Proposal] Adding sparse data support and update KMeans
This is a proposal for sparse data support in mllib
(https://spark-project.atlassian.net/browse/MLLIB-18).
The idea of the
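The rest of the proposal is cut off here. One common way to keep k-means distance computations cheap on sparse input is the expansion ||x - c||^2 = ||x||^2 - 2 x · c + ||c||^2. With the squared norms known, each distance costs a single sparse dot product, O(nnz(x)). A sketch under that assumption. It is not necessarily what the proposal does, and all names are hypothetical.

```scala
object SparseKMeansSketch {
  // Hypothetical sparse point: parallel arrays of indices and values.
  final case class SparseVec(indices: Array[Int], values: Array[Double])

  def sqNorm(x: SparseVec): Double = x.values.map(v => v * v).sum
  def sqNorm(c: Array[Double]): Double = c.map(v => v * v).sum

  // Sparse-dense dot product touching only the nonzeros of x.
  def dot(x: SparseVec, c: Array[Double]): Double = {
    var s = 0.0
    var k = 0
    while (k < x.indices.length) { s += x.values(k) * c(x.indices(k)); k += 1 }
    s
  }

  // Index of the closest center. Center norms are computed here for brevity;
  // a k-means loop would compute them once per iteration and reuse them.
  def closest(x: SparseVec, centers: Seq[Array[Double]]): Int = {
    val xNorm = sqNorm(x)
    val cNorms = centers.map(c => sqNorm(c))
    val dists = centers.indices.map { j =>
      // Clamp at zero in case rounding pushes the expansion slightly negative.
      math.max(xNorm - 2.0 * dot(x, centers(j)) + cNorms(j), 0.0)
    }
    dists.indexOf(dists.min)
  }
}
```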
Github user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/500#issuecomment-34591039
LGTM and thanks for fixing some existing errors!