Hello All,

This is the core Java component of the DataSketches library, which includes all the sketch algorithms in production-ready packages. These sketches can be called directly from this component or used in conjunction with adaptor components such as Hadoop Pig and Hadoop Hive, or with the aggregator adaptors built into Apache Druid.
Major new features and enhancements:

- Quantile Sketches
  - A new Example Partitioner Tool can partition medium-sized data sets of up to about 1E9 items. The same algorithm could also be used in a parallel environment to partition data sets many orders of magnitude larger. The partitioner can produce thousands of partitions with very little variation in their size.
  - Lots of internal cleanup, plus a few API improvements (for example, for consistency across the different quantile sketches). Although these API changes are relatively minor, they are the reason for moving to a major release.
  - Fixed an integer overflow bug, caught by Karan Kumar (via Druid), that caused the classic quantiles DoublesSketch::getPartitionBoundaries() to fail when partitioning very large datasets.

Thank you to the PMC members and community for taking the time to review this release!

Lee