[GitHub] spark pull request: [SPARK-10654][MLlib] Add columnSimilarities to...

2015-09-16 Thread rezazadeh
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/8792 [SPARK-10654][MLlib] Add columnSimilarities to IndexedRowMatrix Add columnSimilarities to IndexedRowMatrix by delegating to functionality already in RowMatrix. With a test. You can merge

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-08-31 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-136490512 Any progress on this @debasish83 ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-06-05 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-109421413 @debasish83 It's not clear whether we need (3) yet, let's focus on (1) and (2) first. It's probably overkill to include all kinds of different similarity scores here

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-05-29 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-106980262 Hi @debasish83 thank you for this PR. As it stands, it has too many components, which makes it hard to review individual contributions. @mengxr and I spoke about

[GitHub] spark pull request: [MLlib] [SPARK-6713] Iterators in columnSimila...

2015-04-05 Thread rezazadeh
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/5364 [MLlib] [SPARK-6713] Iterators in columnSimilarities for flatMap Use Iterators in columnSimilarities to allow flatMap to spill to disk. This could happen in a dense and large column - this way
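A minimal sketch of the idea behind this change, written in plain Python rather than Spark's Scala API: when the per-row column pairs are produced through an iterator, no row has to materialize its full quadratic list of pairs before the consumer sees the first one. The names `row_pairs` and `flat_map` are hypothetical, and the sketch does not reproduce Spark's spill-to-disk machinery.

```python
from itertools import chain, islice

def row_pairs(row):
    """Lazily yield ((i, j), a_i * a_j) for the nonzero entries of one sparse row.

    `row` is a list of (column_index, value) tuples. Pairs are yielded one at a
    time instead of being collected into a list first.
    """
    for a, (i, vi) in enumerate(row):
        for j, vj in row[a + 1:]:
            yield (i, j), vi * vj

def flat_map(fn, rows):
    """Hypothetical stand-in for flatMap: chain the per-row iterators lazily."""
    return chain.from_iterable(fn(r) for r in rows)

rows = [[(0, 1.0), (1, 2.0), (2, 3.0)], [(0, 4.0), (2, 5.0)]]

# Only the first two pairs are ever computed here; the rest stay unevaluated.
first_two = list(islice(flat_map(row_pairs, rows), 2))
```

A dense row with n nonzeros produces n(n-1)/2 pairs, so the benefit of lazy emission grows quadratically with the row's density.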

[GitHub] spark pull request: [SPARK-1503][MLLIB] Initial AcceleratedGradien...

2015-03-08 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/4934#issuecomment-77801646 Thank you for this PR @staple ! @mengxr I suggested to @staple to first implement without backtracking to keep the PR as simple as possible. According to his

[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...

2015-01-21 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/4089#issuecomment-70897755 Thanks @mengxr!

[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...

2015-01-17 Thread rezazadeh
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/4089 [MLlib] [SPARK-5301] Missing conversions and operations on IndexedRowMatrix and CoordinateMatrix * Transpose is missing from CoordinateMatrix (this is cheap to compute, so it should
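To illustrate why a transpose is cheap for a coordinate-format matrix, here is a small sketch in plain Python (not the Spark API). It assumes the matrix is held as a list of (row, column, value) triples; the function name `transpose` is only illustrative.

```python
def transpose(entries):
    """Transpose a matrix stored as (row, col, value) triples by swapping indices."""
    return [(j, i, v) for (i, j, v) in entries]

entries = [(0, 2, 1.5), (1, 0, -2.0)]
transposed = transpose(entries)
```

Each entry is touched once and no data has to be regrouped, which is what makes the operation inexpensive in this representation.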

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-13 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/3200#discussion_r20330047 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala --- @@ -0,0 +1,331 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-13 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/3200#discussion_r20330090 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala --- @@ -0,0 +1,331 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-13 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/3200#discussion_r20330160 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala --- @@ -0,0 +1,331 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-07 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-58271836 @mengxr 1) Started using scopt, and 2) Distributed the error computation per your suggestion.

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-07 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/2622#discussion_r18554272 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-07 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/2622#discussion_r18554285 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-07 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/2622#discussion_r18554296 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-07 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/2622#discussion_r18554333 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-07 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/2622#discussion_r18554322 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example

2014-10-04 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-57896484 Parameters are now configurable. Added approximation error reporting. Added JIRA.

[GitHub] spark pull request: CosineSimilarity Example

2014-10-01 Thread rezazadeh
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/2622 CosineSimilarity Example Provide example for `RowMatrix.columnSimilarities()` You can merge this pull request into a Git repository by running: $ git pull https://github.com/rezazadeh/spark

[GitHub] spark pull request: CosineSimilarity Example

2014-10-01 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-57549440 Jenkins, test this please.

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-29 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57206298 Thanks for the review @mengxr !

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56991070 Only the binary compatibility test is failing, which is expected.

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-26 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57013563 Jenkins, test this please.

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56894583 @mengxr Thanks for the optimizations. I merged the latest master into my branch and pushed to here. Would you like me to merge your branch into mine

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56904264 @mengxr Merged in your changes and added ability for the threshold to be larger with a warning. Tests pass.

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-25 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56908521 @mengxr I also added broadcasting of p and v to further optimize space usage. Also now we're avoiding divide by zero if there is a column with zero magnitude
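The zero-magnitude case mentioned above can be sketched as follows. This is plain Python, not the Spark code, and it uses the brute-force cosine formula rather than DIMSUM sampling. Reporting a similarity of 0.0 for a zero-magnitude column is an assumption made for this sketch; the comment does not say how the PR handles that case. The name `column_similarities` here refers to this illustrative function only.

```python
import math

def column_similarities(rows, num_cols):
    """Brute-force cosine similarity between every pair of columns.

    `rows` is a list of dense rows (lists of floats). Returns a dict mapping
    (i, j) with i < j to the cosine similarity of columns i and j.
    """
    # Euclidean magnitude of each column.
    mags = [math.sqrt(sum(r[c] ** 2 for r in rows)) for c in range(num_cols)]
    sims = {}
    for i in range(num_cols):
        for j in range(i + 1, num_cols):
            if mags[i] == 0.0 or mags[j] == 0.0:
                # Assumption: report 0.0 instead of dividing by zero.
                sims[(i, j)] = 0.0
                continue
            dot = sum(r[i] * r[j] for r in rows)
            sims[(i, j)] = dot / (mags[i] * mags[j])
    return sims

# Columns 0 and 1 are identical; column 2 is all zeros.
sims = column_similarities([[3.0, 3.0, 0.0], [4.0, 4.0, 0.0]], 3)
```

Without the guard, the pairs involving column 2 would raise `ZeroDivisionError` in Python.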

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-21 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-56316095 Why do you say normL1 is not implemented? I have implemented normL1 in MultivariateOnlineSummarizer, with tests. Do you want a version without absolute values? If so

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818640 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,33 @@ class RowMatrixSuite extends FunSuite

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818648 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -27,10 +28,12 @@ import com.github.fommil.netlib.BLAS

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818650 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818645 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -18,6 +18,7 @@ package

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818651 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818655 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818657 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818661 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818659 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,113 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818718 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateStatisticalSummary.scala --- @@ -53,4 +53,14 @@ trait

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-20 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17818724 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,40 @@ class RowMatrixSuite extends FunSuite

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523204 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523197 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala --- @@ -95,6 +95,33 @@ class RowMatrixSuite extends FunSuite

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523206 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523212 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523214 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523232 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523233 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523235 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -27,10 +27,13 @@ import com.github.fommil.netlib.BLAS

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-55542396 @mengxr All requested changes made. All tests are passing locally. However, I expect Jenkins to complain because of the new normL1 and normL2 methods added

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-14 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/1778#discussion_r17523330 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -390,6 +393,79 @@ class RowMatrix( new RowMatrix

[GitHub] spark pull request: [MLlib] Update SVD documentation in IndexedRow...

2014-09-14 Thread rezazadeh
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/2389 [MLlib] Update SVD documentation in IndexedRowMatrix Updating this to reflect the newest SVD via ARPACK You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark pull request: [MLlib] Squash bug in IndexedRowMatrix

2014-08-31 Thread rezazadeh
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/2224 [MLlib] Squash bug in IndexedRowMatrix Kill this bug fast before it does damage. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rezazadeh

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-08-30 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-53975250 Style changes made. Experimental results below. We run DIMSUM daily on a production-scale ads dataset. After replacing the traditional cosine similarity

[GitHub] spark pull request: Dimension Independent Matrix Square Using MapR...

2014-08-26 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/336#issuecomment-53528432 Moved to https://github.com/apache/spark/pull/1778

[GitHub] spark pull request: Dimension Independent Matrix Square Using MapR...

2014-08-26 Thread rezazadeh
Github user rezazadeh closed the pull request at: https://github.com/apache/spark/pull/336

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: Dimension Indepen...

2014-08-07 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51436467 Jenkins, retest this please.

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: Dimension Indepen...

2014-08-06 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51403572 @mengxr Updated the PR to compute column magnitude as a method in RowMatrix so that binary compatibility shouldn't be a problem. This allowed me to use breeze too

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: Dimension Indepen...

2014-08-06 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51411810 Jenkins, retest this please.

[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...

2014-08-05 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51158224 The binary backwards compatibility check doesn't like adding a new method to the trait MultivariateStatisticalSummary. Suggestions on binary compatibility welcome

[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...

2014-08-05 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-51214586 Having all-pairs similarity in Spark has been requested several times, e.g. http://bit.ly/XAFGs8, and also by @freeman-lab. This algorithm is also a part of Scalding

[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...

2014-08-04 Thread rezazadeh
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/1778 DIMSUM: Dimension Independent Matrix Square using Mapreduce # DIMSUM Compute all pairs of similar vectors using brute force approach, and also DIMSUM sampling approach. Laying down

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-07-09 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-48520841 Thanks @vrilleup !

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-18 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13900614 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -220,16 +247,43 @@ class RowMatrix

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-07 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-45423737 @vrilleup I think the binary compatibility issue is because of the change in method signature. Even though you have a default argument it changes the interface. Try

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-05 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13468982 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-05 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13469440 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-05 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13469691 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11406725 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnySVD.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11408048 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11414552 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnySVD.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11414639 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnyPCA.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11414786 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11414823 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11415360 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala --- @@ -0,0 +1,120 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11415951 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11416104 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/296#issuecomment-39912507 Thanks @mengxr ! I made a pass. Other than these comments and marking the API experimental, should be all good. And of course the usual passing of tests.

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/296#issuecomment-39919248 Thanks @mengxr ! LGTM with passing tests.

[GitHub] spark pull request: Dimension Independent Similarity Computation

2014-04-06 Thread rezazadeh
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/336 Dimension Independent Similarity Computation Provide Implementation of DIMSUM for squaring a Tall and Skinny Matrix as described in: http://arxiv.org/abs/1304.1467 Also, refactor Matrix

[GitHub] spark pull request: Principal Component Analysis

2014-03-20 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/88#issuecomment-38139196 Thanks @kayousterhout - will add those in. I'm still going to use vim, although will likely use the IDE just before sending out the PR to check for style. It's pretty

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/88#issuecomment-38021610 @mengxr All done. Should be ready now.

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10738795 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/SparkPCA.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10738816 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739020 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739074 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739082 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739092 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739118 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739157 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739233 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -29,6 +29,8 @@ import org.jblas.{DoubleMatrix, Singular, MatrixFunctions

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739282 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LAUtils.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739288 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LAUtils.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739320 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LAUtils.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739553 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -38,18 +40,49 @@ class SVD

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739567 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +172,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739749 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -38,18 +40,49 @@ class SVD

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739829 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +172,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739893 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +172,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/88#issuecomment-38026138 Thanks @rxin ! I went through everything. I will commit those style guides to memory - the automated tool is going to be so much appreciated.

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/88#issuecomment-38026856 Thanks @mengxr

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10740266 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +177,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10740360 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +177,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate
