Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15450
merged to master
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or i
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67756/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67756 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67756/consoleFull)**
for PR 15450 at commit
[`f870fe9`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67756 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67756/consoleFull)**
for PR 15450 at commit
[`f870fe9`](https://github.com/apache/spark/commit/f
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67520/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67520 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67520/consoleFull)**
for PR 15450 at commit
[`79c84ad`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67520 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67520/consoleFull)**
for PR 15450 at commit
[`79c84ad`](https://github.com/apache/spark/commit/7
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15450
@sethah done. I also removed references to the runs parameter, which has no
effect (and was triggering deprecation warnings). I should have done that last
time.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67512/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67512 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67512/consoleFull)**
for PR 15450 at commit
[`d1004d9`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67512 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67512/consoleFull)**
for PR 15450 at commit
[`d1004d9`](https://github.com/apache/spark/commit/d
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15450
@sethah let me know how you feel about it at this stage
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67335 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67335/consoleFull)**
for PR 15450 at commit
[`793e4d5`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67335 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67335/consoleFull)**
for PR 15450 at commit
[`793e4d5`](https://github.com/apache/spark/commit/7
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15450
Also, if we're going to make this change, we should document in the ML
estimator that the algorithm can return fewer than `k` centers.
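
As a rough illustration of how fewer than `k` centers can come out, here is a minimal sketch, assuming initial centers are drawn by sampling with replacement and identical centers are then collapsed. It is not Spark's implementation; `DuplicateCenters`, `sampleWithReplacement`, and `distinctCenters` are hypothetical names used only for this example.

```scala
import scala.util.Random

object DuplicateCenters {
  // Hypothetical helper: pick k initial centers by sampling WITH replacement,
  // so the same input point can be chosen more than once.
  def sampleWithReplacement(
      points: Vector[Vector[Double]],
      k: Int,
      rng: Random): Vector[Vector[Double]] =
    Vector.fill(k)(points(rng.nextInt(points.length)))

  // Hypothetical helper: collapse identical centers. Vector equality is
  // element-wise, so the result may contain fewer than k entries.
  def distinctCenters(centers: Vector[Vector[Double]]): Vector[Vector[Double]] =
    centers.distinct

  def main(args: Array[String]): Unit = {
    // Only two distinct points, but three centers requested.
    val points = Vector(Vector(0.0, 0.0), Vector(1.0, 1.0))
    val k = 3
    val initial = sampleWithReplacement(points, k, new Random(42L))
    val centers = distinctCenters(initial)
    println(s"requested k = $k, distinct centers = ${centers.length}")
  }
}
```

With two distinct input points and `k = 3`, at least two of the sampled centers must coincide, so the deduplicated result always has fewer than `k` entries.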
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67256 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67256/consoleFull)**
for PR 15450 at commit
[`ebebcb9`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67256/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67256 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67256/consoleFull)**
for PR 15450 at commit
[`ebebcb9`](https://github.com/apache/spark/commit/e
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67188/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67188 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67188/consoleFull)**
for PR 15450 at commit
[`85c9857`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67188 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67188/consoleFull)**
for PR 15450 at commit
[`85c9857`](https://github.com/apache/spark/commit/8
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15450
_k_ is a parameter to the model building process, and I don't think it
should change based on the model that comes out. It's the requested or maximum
number of centroids, if you like. Or, weigh that
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15450
I don't feel strongly either way, but I don't like the potential of this:
scala
model.getK
scala> 3
model.clusterCenters.length
scala> 1
Should we conside
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15450
Heh, I believe the PC term is 'corner cases'. I agree. There's not much
point in clustering data to k centroids when there are <= k distinct points. I
think that's all the more reason to not make th
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15450
Aren't all these cases sort of nonsensical anyway? What good is performing
clustering on a dataset where the result has (approximately) the same number of
clusters as unique data points?
Th
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15450
@sethah I should say I am not trying to handle cases where clusters start
separate and converge to nearly the same point. I don't think that's something we
should even try to do.
To elaborate, he
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15450
The cases you enumerated are the ones I was thinking of. The changes
introduced here would alleviate those problems, I agree. What I'm wondering is
if this problem still exists in other cases. If Der
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15450
@sethah I agree that when there are lots of unique points (>> k) then this
is almost certain to not happen, and that's most real-world use cases, but the
question indeed is what should happen when th
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15450
@srowen I'm not against the change per se, I was just hoping to understand
how duplicate centers arise. In the case of `initRandom`, sampling with
replacement makes it possible to select the same init
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67009 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67009/consoleFull)**
for PR 15450 at commit
[`ab486c1`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67009/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #67009 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67009/consoleFull)**
for PR 15450 at commit
[`ab486c1`](https://github.com/apache/spark/commit/a
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15450
@sethah I wanted to check how strongly against this kind of change you
might be, and continue the discussion here.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15450
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66816/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #66816 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66816/consoleFull)**
for PR 15450 at commit
[`42279b8`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15450
**[Test build #66816 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66816/consoleFull)**
for PR 15450 at commit
[`42279b8`](https://github.com/apache/spark/commit/4