[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-14 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > Thanks @KyleLi1985 this looks like a nice win in the end. Thanks for your investigation. @srowen @HyukjinKwon @mgaido91 Thanks for the review. It was my pleasure.

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-14 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 Thanks @KyleLi1985 this looks like a nice win in the end. Thanks for your investigation. --- - To unsubscribe, e-mail:

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-14 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 Merged to master

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22893 **[Test build #4424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4424/testReport)** for PR 22893 at commit

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22893 **[Test build #4424 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4424/testReport)** for PR 22893 at commit

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-10 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 @SparkQA retest this please

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22893 **[Test build #4423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4423/testReport)** for PR 22893 at commit

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22893 **[Test build #4423 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4423/testReport)** for PR 22893 at commit

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-10 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22893 retest this please

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 @SparkQA test this please

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 There's no merge conflict right now. You can just update the file and push the commit to your branch. If there were a merge conflict, you'd just rebase on apache/master, resolve the conflict, and

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 It seems the related file spark/python/pyspark/ml/clustering.py has changed in the last few days. My latest local commit is still "bfe60fc on 30 Jul". So I need to re-fork spark and open

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 Heh, as a side effect, this made the output of computeCost more accurate in one PySpark test. It prints "2.0" rather than "2.000..." I think you can change the three instances that failed to just
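
For readers unfamiliar with computeCost: in k-means it is commonly defined as the sum of squared distances from each point to its nearest center. The sketch below is a plain-Scala illustration under that assumption. The object name, helper names, and sample data are hypothetical; they are not taken from Spark or from the PySpark test mentioned above.

```scala
// Hypothetical illustration; not Spark's implementation of computeCost.
object ComputeCostSketch {
  // Squared Euclidean distance, accumulated coordinate by coordinate.
  def sqDist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Cost = sum over all points of the squared distance to the nearest center.
  def cost(points: Seq[Array[Double]], centers: Seq[Array[Double]]): Double =
    points.map(p => centers.map(c => sqDist(p, c)).min).sum

  // Made-up sample data: two tight pairs of points and the midpoint of each pair.
  val points: Seq[Array[Double]] =
    Seq(Array(0.0, 0.0), Array(1.0, 1.0), Array(9.0, 8.0), Array(8.0, 9.0))
  val centers: Seq[Array[Double]] =
    Seq(Array(0.5, 0.5), Array(8.5, 8.5))

  def main(args: Array[String]): Unit =
    println(cost(points, centers))
}
```

With this sample data every point lies at squared distance 0.5 from its nearest center, so the direct computation yields a cost of exactly 2.0.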

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22893 **[Test build #4420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4420/testReport)** for PR 22893 at commit

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22893 **[Test build #4420 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4420/testReport)** for PR 22893 at commit

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 @AmplabJenkins test this please

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-08 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 I put together the final test cases for the sparse and dense cases on realistic data to test the new commit [SparkMLlibTest.txt](https://github.com/apache/spark/files/2561442/SparkMLlibTest.txt)

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-03 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > OK, the Spark part doesn't seem relevant. The input might be more realistic here, yes. I was commenting that your test code doesn't show what you're testing, though I understand you manually

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 OK, the Spark part doesn't seem relevant. The input might be more realistic here, yes. I was commenting that your test code doesn't show what you're testing, though I understand you manually

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-03 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > So the pull request right now doesn't reflect what you tested, but you tested the version pasted above. You're saying that the optimization just never helps the dense-dense case, and sqdist is

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-02 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 So the pull request right now doesn't reflect what you tested, but you tested the version pasted above. You're saying that the optimization just never helps the dense-dense case, and sqdist is

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-02 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > Hm, actually that's the best case. You're exercising the case where the code path you prefer is fast. And the case where the precision bound applies is exactly the case where the branch you

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > Hm, actually that's the best case. You're exercising the case where the code path you prefer is fast. And the case where the precision bound applies is exactly the case where the branch you

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > Hm, actually that's the best case. You're exercising the case where the code path you prefer is fast. And the case where the precision bound applies is exactly the case where the branch you

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 Here is my test for the sparse-sparse, dense-dense, and sparse-dense cases: ` import org.apache.spark.{SparkConf, SparkContext} import

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 Hm, actually that's the best case. You're exercising the case where the code path you prefer is fast. And the case where the precision bound applies is exactly the case where the branch you deleted
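
The precision concern raised here can be illustrated with the two usual ways of computing a squared Euclidean distance: the direct sum of squared differences, and the expanded form ||a||^2 + ||b||^2 - 2(a . b). The expanded form is cheap when norms are cached, but it can lose digits to cancellation when the two vectors are close relative to their magnitudes. The following is a minimal sketch under that assumption. The object and function names are hypothetical, and this is not the code from the pull request.

```scala
// Hypothetical illustration of the trade-off; not Spark's MLlib code.
object SqDistPrecisionSketch {
  // Direct form: sum of squared coordinate differences.
  def direct(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Expanded form: ||a||^2 + ||b||^2 - 2 * (a . b).
  // The squared norms are recomputed here for simplicity; a real
  // implementation would typically reuse cached norms.
  def expanded(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var normSqA = 0.0
    var normSqB = 0.0
    var dot = 0.0
    var i = 0
    while (i < a.length) {
      normSqA += a(i) * a(i)
      normSqB += b(i) * b(i)
      dot += a(i) * b(i)
      i += 1
    }
    normSqA + normSqB - 2.0 * dot
  }

  def main(args: Array[String]): Unit = {
    // Two nearby vectors with large magnitude: the true squared distance is 1.
    val a = Array(1e8 + 1)
    val b = Array(1e8)
    println(s"direct   = ${direct(a, b)}")
    println(s"expanded = ${expanded(a, b)}")
  }
}
```

On this input the direct form returns exactly 1.0, while the expanded form does not, because (1e8 + 1)^2 is not exactly representable as a double. That is the kind of situation a precision bound is meant to detect before falling back to the direct computation.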

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > I don't think BLAS matters here as these are all vector-vector operations and f2jblas is used directly (i.e. stays in the JVM). > > Are all the vectors dense? I suppose I'm still

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > then I think you have to try with native BLAS installed, otherwise the results are not valid IMHO. This part only uses f2j for the calculation, as I said in my last comment, so the performance is not

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > I don't think BLAS matters here as these are all vector-vector operations and f2jblas is used directly (i.e. stays in the JVM). > > Are all the vectors dense? I suppose I'm still

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-10-31 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 I don't think BLAS matters here as these are all vector-vector operations and f2jblas is used directly (i.e. stays in the JVM). Are all the vectors dense? I suppose I'm still surprised if

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-10-31 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > then I think you have to try with native BLAS installed, otherwise the results are not valid IMHO. OK, for a fair result, I will try it.

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-10-31 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22893 then I think you have to try with native BLAS installed, otherwise the results are not valid IMHO.

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-10-31 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > @KyleLi1985 do you have native BLAS installed? As the code says: `// For level-1 routines, we use Java implementation.`

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-10-31 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22893 @KyleLi1985 do you have native BLAS installed?

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-10-31 Thread KyleLi1985
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 End-to-end test setup: use the code below to test ` test("kmeanproblem") { val rdd = sc .textFile("/Users/liliang/Desktop/inputdata.txt") .map(f =>