GitHub user HuJiayin closed the pull request at:
https://github.com/apache/spark/pull/8546
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
GitHub user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-141855212
On the other hand, newCenters will cause a sudden increase in memory
usage; even though clear is called immediately, I think it still waits for the
GC to reclaim it. newCenters will still
GitHub user mengxr commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-141347838
@HuJiayin This basically reverts the behavior back to 1.2. The change we
made in 1.3 was to avoid recomputing distances between old centers and input
points during
GitHub user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-140959166
@mengxr I reduced the storage for centers and deleted the fallback and
duplicated code. I tested the functionality and performance locally, and it
works. Could you give me a
GitHub user mengxr commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-140283221
@HuJiayin Could you add a description to the PR title? The JIRA number
doesn't describe the content. For the implementation, maybe we can avoid
duplicating code. How
GitHub user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-138047275
The user doesn't need to enlarge memory to run 1.5 KMeans after applying
this fix. They can still use the 1.2 configuration and have a stable run
experience at the same time.
GitHub user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-136908702
KMeans|| is a better algorithm for finding the centers; if the user has
sufficient memory, the performance is better. But sometimes, because of KMeans
parameters like K,
GitHub user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-136911191
Users can manually adjust memory when they hit a failure, but it may cost
them some time to find the root cause. There are many ways to implement KMeans
and they work well
GitHub user srowen commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-136652392
I personally think it's brittle to arbitrarily change behavior based on one
executor's memory size. It seems like this suggests you can only use the
current
GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/8546#discussion_r38401804
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -246,10 +248,23 @@ class KMeans private (
if
GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/8546#discussion_r38401811
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -246,10 +248,23 @@ class KMeans private (
if
GitHub user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-136566441
cc @mengxr
GitHub user HuJiayin opened a pull request:
https://github.com/apache/spark/pull/8546
SPARK-10329
KMeans|| is better at finding centers more efficiently, based on stochastic
processes, et al. But some users with small memory will have difficulty running
it. The patch will fall back to
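The fallback described above can be sketched in plain Scala. This is a hypothetical, self-contained illustration, not the PR's actual code: random initialization materializes only k candidate points as centers, which is why it is the low-memory alternative to k-means|| oversampling (`RandomInit` and `initRandom` are names invented here):

```scala
import scala.util.Random

// Hypothetical sketch: uniform random selection of k initial centers,
// the low-memory init strategy a patch like this would fall back to.
object RandomInit {
  // Pick k data points, chosen uniformly at random, as initial cluster centers.
  def initRandom(points: Seq[Array[Double]], k: Int, seed: Long): Seq[Array[Double]] =
    new Random(seed).shuffle(points).take(k)

  def main(args: Array[String]): Unit = {
    // Toy 2-D dataset; only the k chosen points are retained as centers.
    val data = Seq.tabulate(100)(i => Array(i.toDouble, (i % 10).toDouble))
    val centers = initRandom(data, k = 3, seed = 42L)
    println(centers.length) // prints 3
  }
}
```

Note that MLlib already lets a user with tight memory request this behavior explicitly via `new KMeans().setInitializationMode(KMeans.RANDOM)` rather than relying on an automatic fallback.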
GitHub user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-136561196
Can one of the admins verify this patch?