Hi everyone, I've implemented a version of distributed streaming quantile estimation for PySpark. It uses a count-min sketch approach. You can find the code here:
https://github.com/laserson/dsq

Thought it might be of interest...

Uri

--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
laser...@cloudera.com
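For anyone unfamiliar with how a count-min sketch can answer quantile queries, here is a minimal single-machine illustration of the standard dyadic-range construction. It is not the dsq code or its API. The class names, the assumption that inputs are integers in `[0, 2**log_universe)`, and the width/depth/seed defaults are all hypothetical choices for illustration. The idea: keep one count-min sketch per dyadic level, answer "how many values are below x" by summing one block estimate per set bit of x, and binary-search that rank to get a quantile. Because sketches built with the same parameters merge by adding their counter tables, partial sketches from different workers can be combined.

```python
import random


class CountMinSketch:
    """Count-min sketch over non-negative integer keys (illustrative only)."""

    _PRIME = 2**61 - 1  # Mersenne prime for the universal hash family

    def __init__(self, width, depth, seed=0):
        self.width, self.depth = width, depth
        rng = random.Random(seed)
        # One (a, b) pair per row: h(k) = ((a*k + b) mod p) mod width
        self.hashes = [(rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
                       for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self._PRIME) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the row minimum never underestimates.
        return min(self.table[row][self._bucket(row, key)] for row in range(self.depth))

    def merge(self, other):
        # Valid only if both sketches share width, depth and seed (identical hashes).
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        return self


class DyadicQuantileSketch:
    """One count-min sketch per dyadic level over the assumed universe [0, 2**L)."""

    def __init__(self, log_universe=10, width=4096, depth=5, seed=0):
        self.log_universe = log_universe
        self.levels = [CountMinSketch(width, depth, seed + lvl)
                       for lvl in range(log_universe + 1)]
        self.total = 0

    def add(self, x):
        if not 0 <= x < 2**self.log_universe:
            raise ValueError("value outside the assumed integer universe")
        # At level lvl, x belongs to the dyadic block with index x >> lvl.
        for lvl, cms in enumerate(self.levels):
            cms.add(x >> lvl)
        self.total += 1

    def rank(self, x):
        """Estimated count of inserted values strictly less than x (0 <= x <= 2**L)."""
        r = 0
        # [0, x) splits into one dyadic block per set bit of x.
        for lvl, cms in enumerate(self.levels):
            if (x >> lvl) & 1:
                r += cms.estimate((x >> lvl) - 1)
        return r

    def quantile(self, q):
        """Smallest v whose estimated count of values <= v reaches q * total."""
        target = q * self.total
        lo, hi = 0, 2**self.log_universe - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(mid + 1) >= target:
                hi = mid
            else:
                lo = mid + 1
        return lo

    def merge(self, other):
        for mine, theirs in zip(self.levels, other.levels):
            mine.merge(theirs)
        self.total += other.total
        return self


# Two "workers" each sketch half the data; the merged sketch answers global queries.
a, b = DyadicQuantileSketch(), DyadicQuantileSketch()
for v in range(0, 1000, 2):
    a.add(v)
for v in range(1, 1000, 2):
    b.add(v)
merged = a.merge(b)
print(merged.total, merged.quantile(0.5))
```

In PySpark, one plausible way to distribute this (an assumption, not necessarily what dsq does) is to build one sketch per partition with `RDD.mapPartitions` and combine the partial sketches with `RDD.reduce` or `RDD.treeReduce`, since merging is just elementwise addition of the counter tables.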