nishantmonu51 opened a new pull request #5958: Part 2 of changes for SQL Compatible Null Handling URL: https://github.com/apache/incubator-druid/pull/5958 Part 2 of changes for SQL Compatible Null Handling - This PR contains changes to improve handling of null values in druid by treating nulls as missing values which don’t actually exist. This will make it more compatible to SQL standard and help with integration with other existing BI systems which support ODBC/JDBC. (Detailed description in proposal - https://github.com/druid-io/druid/issues/4349) ## Nullability for Aggregators/Metrics As per the current implementation, most aggregators are coded to replace null values with default e.g. sum treats them as zeroes, min treats them as positive infinity, etc. To match the new semantics and make aggregators nullable, where if an aggregator encounters only the null values the result will be null value following changes are made - * Aggrgator/BufferAggregator - Added a new method boolean isNull() which returns false by default. aggregators that support nullability can choose to override this and return true if the aggregated result is null. * Added a NullableAggregator/NullableBufferAggregator which is be used as a decorator for existing aggregator implementations to make them nullable. This will return null values if all the values aggregated are null, If any of the aggregated value is non-null, it will have same behavior as the delegate aggregator. * If new null behavior is enabled, aggregator factories will decorate the aggregators with NullableAggregator/NullableBufferAggregator. * Added a new abstract class NullableAggregatorFactory which helps in reducing duplicate code and decide to switch between nullable and non-nullable aggregators. * Cardinality and HyperUnique aggregators ignore null values and only count non-null values, If all the encountered values are null, the result will be 0. This is different from current behavior where null values are also counted. * Count aggregator - since count aggregator is not associated with any column and just a count of the number of rows, it has the same behavior as before. ## Math Expressions Null Handling * For general calculations like sum, full expression is be considered as null if any of the components is null. * StringLiteral now supports null and and expressions containing non-quoted null is parsed as StringLiteral with null * Specifying a default value for null is supported by the use of NVL or IF clause to assign default values at query time. ## Filtering on Null values * SelectorDimFilter currently specifies filtering on null values but the implementation assumes null and empty strings as equivalent. The implementation is changed to consider null and empty string differently. Generation of cache key for selectorDimFilter is also modified to support null. * InFilter/ExpressionFilter/LikeFilter are also modified to not treat nulls as empty strings. ## Changes to Druid build-in SQL layer * NULL and empty strings are treated differently. * IS NULL and IS NOT NULL filters against null values instead of empty strings. * NVL, IFNULL and COALESCE work as per SQL standard. * count now skips null values and works as per SQL standard. ## Backwards Compatibility * **_"druid.generic.useDefaultValueForNull"_** is added as a new config. The default value is set to true for backwards compatibility. * code places in query/ingestion where we converted emptyToNull or nullToEmpty now use NullHandlingHelper which only does the conversion if backwards compatibility is enabled. * Storage layer changes are backwards compatible for GenericIndexed/String columns. * For Numeric columns new V2 serde implementations are introduced which also supports serde of nullRowsBitmaps. Previous implementations are kept unchanged to read any existing segments. ## Testing * Most of the unit tests that tested nulls are modified to have two branches of expected results to test new/old behavior both. * For CI have added new runs to test matrix to test new behavior for nulls. * To run the tests with new behavior run **_"-Ddruid.generic.useDefaultValueForNull=false"_**
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org For additional commands, e-mail: dev-h...@druid.apache.org