GitHub user KyleLi1985 opened a pull request: https://github.com/apache/spark/pull/22893
One part of Spark MLlib Kmean Logic Performance problem

[SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic Performance problem

## What changes were proposed in this pull request?

Reduce the low-performance logic in the function fastSquaredDistance.

## How was this patch tested?

./dev/run-tests passes.

In addition, fastSquaredDistance was called 100,000,000 times on the following two vectors:

1 2 3 4 3 4 5 6 7 8 9 0 1 3 4 6 7 4 2 2 5 7 8 9 3 2 3 5 7 8 9 3 3 2 1 1 2 2 9 3 3 4 5
4 5 2 1 5 6 3 2 1 3 4 6 7 8 9 0 3 2 1 2 3 4 5 6 7 8 5 3 2 1 4 5 6 7 8 4 3 2 4 6 7 8 9

With the patch applied, the elapsed time dropped from 8395 ms to 5448 ms.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KyleLi1985/spark updatekmeanpatch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22893.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22893

----
commit 701223b39a0c473de865de30b0017af4883441f3
Author: æ亮 <liang.li.work@...>
Date:   2018-10-30T11:03:02Z

    upgrade kmean performance

----
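For context on what a function like fastSquaredDistance trades off: k-means evaluates many point-to-center distances, so a common shortcut is to cache each vector's L2 norm and use the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a . b). It falls back to direct computation when cancellation could make that result inaccurate. The Python sketch below illustrates that general idea only. It is not the code changed by this patch or the code in Spark's MLUtils. The function names, the `precision` default, and the error-bound formula are assumptions chosen for illustration.

```python
import math
import sys
import time

EPS = sys.float_info.epsilon  # float64 machine epsilon (about 2.22e-16)


def l2_norm(v):
    """Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def sqdist_direct(a, b):
    """Exact squared Euclidean distance: sum of squared component differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fast_squared_distance(a, norm_a, b, norm_b, precision=1e-6):
    """Squared distance using precomputed norms (hypothetical sketch).

    Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b when a rough relative-error
    estimate is below `precision`; otherwise falls back to sqdist_direct.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    sum_sq = norm_a * norm_a + norm_b * norm_b
    norm_diff = norm_a - norm_b
    # Cancellation is worst when the two norms are close, so this estimate
    # grows as norm_diff shrinks (assumed bound, for illustration only).
    error_bound = 2.0 * EPS * sum_sq / (norm_diff * norm_diff + EPS)
    if error_bound < precision:
        dot = sum(x * y for x, y in zip(a, b))
        return max(sum_sq - 2.0 * dot, 0.0)  # clamp tiny negative round-off
    return sqdist_direct(a, b)


def time_calls(fn, iterations, *args):
    """Call fn(*args) `iterations` times and return elapsed milliseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(*args)
    return (time.perf_counter() - start) * 1000.0
```

In use, each vector's norm is computed once with `l2_norm` and reused across many `fast_squared_distance` calls. `time_calls(fast_squared_distance, n, a, na, b, nb)` mirrors the repeated-call measurement described in the PR, but Python timings say nothing about the JVM numbers reported there.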