GitHub user oliverpierson commented on the pull request:
https://github.com/apache/spark/pull/11553#issuecomment-193584221
Putting this up for review now. Tests are passing on my machine. Using
`approxQuantile` in DataFrame stats reduces the amount of code required by a
good bit.
GitHub user oliverpierson commented on the pull request:
https://github.com/apache/spark/pull/11553#issuecomment-193073200
This is still a work in progress; I just wanted to get the PR up so it's on
the radar. Still need to:
- [ ] add an external Parameter (with default value)
GitHub user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11553#issuecomment-193072808
Can one of the admins verify this patch?
GitHub user oliverpierson opened a pull request:
https://github.com/apache/spark/pull/11553
[SPARK-13600] [MLlib] [WIP] Incorrect number of buckets in
QuantileDiscretizer
## What changes were proposed in this pull request?
QuantileDiscretizer can return an unexpected number of