[ANNOUNCE] DataSketches Java 5.0.0 Released!

Lee Rhodes Sat, 09 Dec 2023 11:06:29 -0800

Hello All,

This is the core Java component of the DataSketches library that includes
all the sketch algorithms in production-ready packages. These sketches can
be called directly from this component or used in conjunction with the
adaptor components such as Hadoop Pig, Hadoop Hive, or the aggregator
adaptors built into Apache Druid.


Major new features and enhancements:

- Quantile Sketches
    - A new Example Partitioner Tool can be used for partitioning
medium-sized data sets of up to about 1E9 items. The same algorithm could
also be used in a parallel environment to partition data sets many orders
of magnitude larger. This partitioner can produce thousands of partitions
with very small variation in their size.
    - Lots of internal cleanup and a few API improvements for consistency
across the different quantile sketches. These API changes, although
relatively minor, were the reason to move to a major release.
    - Fixed an integer overflow bug, caught by Karan Kumar (via Druid),
where the classic quantiles DoublesSketch::getPartitionBoundaries() would
fail when partitioning very large datasets.
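
To illustrate the two quantile items above, here is a minimal sketch of
equal-size partitioning by rank, and of how 32-bit arithmetic can silently
overflow at around 1E9 items. It is not the library's code and not the
actual fix: it uses only the Java standard library and an exactly sorted
array in place of a sketch. The class PartitionExample and the methods
boundaryRank, boundaryRankInt, and boundaries are hypothetical names
chosen for this example.

```java
import java.util.Arrays;

public class PartitionExample {

    // Rank (1-based count of items) at the upper edge of partition i,
    // for numPartitions equal-size partitions over n items.
    // The multiplication is done in long, so i * n cannot wrap around.
    static long boundaryRank(long n, int numPartitions, int i) {
        return (i * n) / numPartitions;
    }

    // The same formula in int arithmetic, kept only to show the hazard:
    // i * n wraps around silently once it exceeds Integer.MAX_VALUE.
    static int boundaryRankInt(int n, int numPartitions, int i) {
        return (i * n) / numPartitions;
    }

    // Boundary values for an already sorted array (a stand-in for a
    // quantile sketch). Assumes 1 <= numPartitions <= sorted.length.
    static double[] boundaries(double[] sorted, int numPartitions) {
        double[] out = new double[numPartitions + 1];
        out[0] = sorted[0]; // lower edge of the first partition
        for (int i = 1; i <= numPartitions; i++) {
            int idx = (int) boundaryRank(sorted.length, numPartitions, i) - 1;
            out[i] = sorted[idx];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = new double[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1; // values 1..100, already sorted
        }
        System.out.println(Arrays.toString(boundaries(data, 4)));

        // Midpoint rank of 1000 partitions over 1E9 items.
        System.out.println(boundaryRank(1_000_000_000L, 1000, 500));
        // Same inputs in int arithmetic: the product overflows first.
        System.out.println(boundaryRankInt(1_000_000_000, 1000, 500));
    }
}
```

With the long version the midpoint rank is 500,000,000 as expected; the int
version returns a different, incorrect value without raising any error.
Where an exception is preferable to silent wraparound, Math.multiplyExact
is one standard-library option.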

Thank you to the PMC members and community for taking the time to review
this release!

Lee
