[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-17 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46402006 @dorx you said above you are trying to use the RNG from Jet? Why, if commons-math3 is usable now? I prefer the latter FWIW. I probably misunderstand your comment. --- If

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/813#discussion_r13901535 --- Diff: python/pyspark/join.py --- @@ -79,15 +79,15 @@ def dispatch(seq): return _do_python_join(rdd, other, numPartitions, dispatch)

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/813#issuecomment-46401560 Hey @douglaz, thanks for updating this. One thing missing here is tests in each of the languages -- please add them so that this code will be tested later.

[GitHub] spark pull request: Optimize the schedule procedure in Master

2014-06-17 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-46401368 Another situation is that the workers list changes frequently, which will make drivers relaunch a lot.

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/813#discussion_r13901345 --- Diff: python/pyspark/join.py --- @@ -79,15 +79,15 @@ def dispatch(seq): return _do_python_join(rdd, other, numPartitions, dispatch)

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/813#discussion_r13901329 --- Diff: python/pyspark/rdd.py --- @@ -1324,11 +1324,11 @@ def mapValues(self, f): return self.map(map_values_fn, preservesPartitioning=True)

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/813#discussion_r13901312 --- Diff: python/pyspark/join.py --- @@ -79,15 +79,15 @@ def dispatch(seq): return _do_python_join(rdd, other, numPartitions, dispatch)

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/813#issuecomment-46400984 Merged build triggered.

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/813#issuecomment-46400995 Merged build started.

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/813#issuecomment-46400741 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1729. Make Flume pull data from source, ...

2014-06-17 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/807#discussion_r13900807 --- Diff: external/flume-sink/src/main/scala/org/apache/spark/flume/sink/SparkSink.scala --- @@ -0,0 +1,432 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread rezazadeh
Github user rezazadeh commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13900614 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -220,16 +247,43 @@ class RowMatrix( } /*

[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1109#issuecomment-46399205 Yup, looks like a race condition (in a good way). Thanks a lot for catching this!

[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread gregakespret
Github user gregakespret commented on the pull request: https://github.com/apache/spark/pull/1109#issuecomment-46399053 @rxin Sure, PR closed. At the time I created this PR, the other one wasn't yet merged in I suppose.

[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread gregakespret
Github user gregakespret closed the pull request at: https://github.com/apache/spark/pull/1109

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46398850 Btw, we shouldn't use default parameters in method definition. It is convenient in Scala but it is not Java friendly. Also, this is hard for us to maintain binary compatibi

[GitHub] spark pull request: SPARK-1729. Make Flume pull data from source, ...

2014-06-17 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/807#issuecomment-46398801 @tdas Use the following configuration file to start flume: agent.sources = seqGenSrc agent.channels = memoryChannel agent.sinks = spark # F

[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1109#issuecomment-46398768 @gregakespret since this has been fixed already in master, do you mind closing this PR?

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13900396 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46398707 @vrilleup Thanks for updating the PR! I made a comment on the explicit type checks. I'm a little confused about the new API. If `isDenseSVD` is true, `tol` doesn't mean any

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r1396 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix( } /**

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13899798 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13899792 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13899776 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13899771 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread ash211
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/369#issuecomment-46396473 Took a look through and didn't see any issues with the merge. Thanks to everyone who helped me get this in! On Tue, Jun 17, 2014 at 2:48 PM, Reynold Xin

[GitHub] spark pull request: [STREAMING] SPARK-2009 Key not found exception...

2014-06-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/961

[GitHub] spark pull request: [STREAMING] SPARK-2009 Key not found exception...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/961#issuecomment-46394563 Merged build finished. All automated tests passed.

[GitHub] spark pull request: [STREAMING] SPARK-2009 Key not found exception...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/961#issuecomment-46394564 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15865/

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1104#issuecomment-46393840 I think it's for legacy reasons that there are two different ways to access the API. As far as I know, @mengxr is working on consolidating the interface. He probably can talk about mor

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread BaiGang
Github user BaiGang commented on the pull request: https://github.com/apache/spark/pull/1104#issuecomment-46393686 @dbtsai Thanks for looking into this PR. I've updated the code, removing the unnecessary type specifications for the vars in class LBFGS. Also I have a question.

[GitHub] spark pull request: [STREAMING] SPARK-2009 Key not found exception...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/961#issuecomment-46392896 Merged build started.

[GitHub] spark pull request: [STREAMING] SPARK-2009 Key not found exception...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/961#issuecomment-46392892 Merged build triggered.

[GitHub] spark pull request: [STREAMING] SPARK-2009 Key not found exception...

2014-06-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/961#issuecomment-46392831 Jenkins, retest this please. Okay I think we can merge this purely on the grounds of more defensive coding (it avoids a potential `null` value).

[GitHub] spark pull request: Optimize the schedule procedure in Master

2014-06-17 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-46391510 yes, potentially... and considering the long-term running of the cluster, eventually, the load is well balanced with the current strategy, i.e. this commit o

[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread tmalaska
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/#issuecomment-46391465 Never mind. I had to close the pull request. I thought about it. The ccAccumulator is not accessible from the vprog which was my goal. I'm going to have to use a broadca

[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread tmalaska
Github user tmalaska closed the pull request at: https://github.com/apache/spark/pull/

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1104#discussion_r13897825 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with LocalSpark

[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread tmalaska
Github user tmalaska commented on the pull request: https://github.com/apache/spark/pull/#issuecomment-46391395 Wait, this isn't going to get me what I want, because I can't read the ssAccumulator in the vprog. I think I will have to change to a broadcast. I will

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46391364 Merged build finished. All automated tests passed.

[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/#issuecomment-46391362 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46391365 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15864/

[GitHub] spark pull request: Spark-2173 : Add Master Computer and SuperStep...

2014-06-17 Thread tmalaska
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/ Spark-2173 : Add Master Computer and SuperStep ... Add Master Computer and SuperStep Accumulator to Pregel GraphX Implementation You can merge this pull request into a Git repository by running:

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1104#discussion_r13897737 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -38,10 +38,10 @@ import org.apache.spark.mllib.linalg.{Vectors, Vector}

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46391137 Merged build finished. All automated tests passed.

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46391138 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15863/

[GitHub] spark pull request: Optimize the schedule procedure in Master

2014-06-17 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-46390574 You mean the increased shuffles may lead to bad performance?

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-17 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46389694 @falaki @mengxr Barring the API related issues, everything else should be in the "final" state. commons-math3 added as a Spark dependency (okay'ed by Matei). mvn clean insta

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46389597 Is that the basic strategy we are going to use with AlphaComponents -- merging new APIs at both the minor and maintenance levels? I don't know that I have any objecti

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46389585 Merged build triggered.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46389593 Merged build started.

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46389358 Merged build started.

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46389346 Merged build triggered.

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/999

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-46389121 I added a flag for choosing dense/sparse SVD, and set n = 100 as the threshold for the default behavior. Users can make the choice for their specific application.
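A minimal sketch of the selection rule described in the comment above: an explicit flag wins, otherwise n = 100 decides between the dense path and the ARPACK-based sparse path. The names `choose_svd_mode`, `is_dense_svd` (a stand-in for the PR's `isDenseSVD`) and `DENSE_THRESHOLD` are hypothetical, and whether the boundary is `n < 100` or `n <= 100` is an assumption; this is not the PR's actual code.

```python
from typing import Optional

# Hypothetical constant mirroring the "n = 100" default threshold from the comment.
DENSE_THRESHOLD = 100


def choose_svd_mode(n: int, is_dense_svd: Optional[bool] = None) -> str:
    """Return "dense" or "sparse" for a matrix with n columns.

    If the caller sets is_dense_svd explicitly, that choice is honored.
    Otherwise small matrices (n < DENSE_THRESHOLD, boundary assumed) take the
    dense path and larger ones take the iterative, ARPACK-style sparse path.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if is_dense_svd is not None:
        # Explicit user choice overrides the size-based default.
        return "dense" if is_dense_svd else "sparse"
    return "dense" if n < DENSE_THRESHOLD else "sparse"
```

For example, `choose_svd_mode(50)` picks the dense path, while `choose_svd_mode(50, is_dense_svd=False)` forces the sparse one.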

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46389105 Thanks. I'm merging this in master & branch-1.0.

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46387957 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15862/

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46387956 Merged build finished. All automated tests passed.

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896261 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896269 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix( } /**

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896272 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix( } /**

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896267 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896259 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896273 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix( } /**

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896271 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -201,6 +202,31 @@ class RowMatrix( } /**

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896265 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896229 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896233 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896232 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896236 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896226 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

2014-06-17 Thread vrilleup
Github user vrilleup commented on a diff in the pull request: https://github.com/apache/spark/pull/964#discussion_r13896212 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: SPARK-1868: Users should be allowed to cogroup...

2014-06-17 Thread douglaz
Github user douglaz commented on the pull request: https://github.com/apache/spark/pull/813#issuecomment-46386598 @pwendell, merged with latest master.

[GitHub] spark pull request: [SPARK-1946] Submit stage after (configured ra...

2014-06-17 Thread li-zhihui
Github user li-zhihui commented on a diff in the pull request: https://github.com/apache/spark/pull/900#discussion_r13895415 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -225,6 +232,17 @@ class CoarseGrainedSchedulerBa

[GitHub] spark pull request: [SPARK-1768] History server enhancements.

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/718#issuecomment-46384763 Merged build finished.

[GitHub] spark pull request: [SPARK-1768] History server enhancements.

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/718#issuecomment-46384765 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15861/

[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1109#issuecomment-46384563 This was already fixed earlier today so it's probably hitting a merge conflict.

[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1109#issuecomment-46384591 https://github.com/apache/spark/commit/b2ebf429e24566c29850c570f8d76943151ad78c

[GitHub] spark pull request: [SPARK-1946] Submit stage after (configured ra...

2014-06-17 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/900#discussion_r13895173 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -225,6 +232,17 @@ class CoarseGrainedScheduler

[GitHub] spark pull request: Double check in doGetLocal to avoid read on re...

2014-06-17 Thread colorant
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/1103#issuecomment-46383834 @andrewor14 e.g. you clean old RDDs based on timestamp etc. I know this could be rare, but not entirely wrong. And, if something really goes wrong, I tend to think the ex

[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1109#issuecomment-46383668 Actually the merge script failed for this pull request. @pwendell any idea? ``` > ./merge_spark_pr.py Which pull request would you like to merge? (e.g. 34): 1109

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46383321 Merged build triggered.

[GitHub] spark pull request: [SPARK-2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46383326 Merged build started.

[GitHub] spark pull request: SPARK-2170: Fix for global name 'PIPE' is not ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1109#issuecomment-46383100 Thanks. Merging this in master & branch-1.0.

[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-46382962 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15860/

[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-46382961 Merged build finished. All automated tests passed.

[GitHub] spark pull request: [SPARK-1768] History server enhancements.

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/718#issuecomment-46382733 Merged build started.

[GitHub] spark pull request: [SPARK-1768] History server enhancements.

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/718#issuecomment-46382728 Merged build triggered.

[GitHub] spark pull request: [SPARK-1768] History server enhancements.

2014-06-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/718#issuecomment-46382432 Jenkins, test this please

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46381897 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15858/

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-46381896 Build finished. All automated tests passed.

[GitHub] spark pull request: [WIP][SPARK-1485][MLLIB] Implement Butterfly A...

2014-06-17 Thread mengxr
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/506

[GitHub] spark pull request: [WIP][SPARK-1485][MLLIB] Implement Butterfly A...

2014-06-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/506#issuecomment-46380865 Thanks all for reviewing this PR! I found that the butterfly pattern introduces complex dependencies that slow down the computation. In my tests, a good approach for Spark is tre

[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/999#issuecomment-46380653 This looks good to me overall. Only a few nitpicks. I think we should merge it after you address the couple of comments I had.

[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-46380509 Merged build started.

[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/999#discussion_r13893273 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala --- @@ -0,0 +1,399 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-46380502 Merged build triggered.

[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/999#discussion_r13893257 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala --- @@ -0,0 +1,399 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...

2014-06-17 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1110 [WIP][SPARK-2174][MLLIB] treeReduce and treeAggregate In `reduce` and `aggregate`, the driver node spends time linear in the number of partitions. It becomes a bottleneck when there are many partitio
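For readers following the treeReduce/treeAggregate thread, here is a minimal plain-Python sketch of a multi-level (tree) reduction. It assumes an associative `op` and treats each list element as one partition's partial result. It is not the PR's Scala implementation: `tree_reduce` and `fan_in` are hypothetical names chosen for illustration. Each pass merges groups of `fan_in` partials (work that Spark would push to executors), so the last combine, which would happen on the driver, sees at most `fan_in` values instead of one per partition.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_reduce(partials: List[T], op: Callable[[T, T], T], fan_in: int = 2) -> T:
    """Hypothetical sketch of a tree reduction over per-partition partials.

    Assumes `op` is associative. Each loop iteration is one "level":
    consecutive groups of `fan_in` values are merged, shrinking the list
    until a single result remains. Order of elements is preserved, so
    non-commutative (but associative) ops such as string concatenation
    still give the same answer as a flat left-to-right reduce.
    """
    if not partials:
        raise ValueError("tree_reduce of empty sequence")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        # Merge each group of up to `fan_in` neighbours into one value.
        level = [reduce(op, level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


if __name__ == "__main__":
    # 100 "partitions", each holding the sum of its own 10-element slice.
    partials = [sum(range(i * 10, (i + 1) * 10)) for i in range(100)]
    print(tree_reduce(partials, lambda a, b: a + b))
```

With 100 partials and `fan_in=2`, the loop runs seven passes (100, 50, 25, 13, 7, 4, 2, 1 values), and no single merge step sees more than two inputs; a flat reduce would instead fold all 100 values in one place.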

[GitHub] spark pull request: [Spark 2060][SQL] Querying JSON Datasets with ...

2014-06-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/999#discussion_r13893161 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala --- @@ -342,13 +344,34 @@ class SchemaRDD( def toJavaSchemaRDD: JavaSchemaRDD = new J
