nishantmonu51 opened a new pull request #5958: Part 2 of changes for SQL 
Compatible Null Handling
URL: https://github.com/apache/incubator-druid/pull/5958
 
 
   Part 2 of changes for SQL Compatible Null Handling - 
   This PR contains changes to improve handling of null values in druid by 
treating nulls as missing values which don’t actually exist. This will make it 
more compatible to SQL standard and help with integration with other existing 
BI systems which support ODBC/JDBC. (Detailed description in proposal - 
https://github.com/druid-io/druid/issues/4349)
   
   ## Nullability for Aggregators/Metrics
   
   As per the current implementation, most aggregators are coded to replace 
null values with default e.g. sum treats them as zeroes, min treats them as 
positive infinity, etc.
   To match the new semantics and make aggregators nullable, where if an 
aggregator encounters only the null values the result will be null value 
following changes are made -
   
   * Aggrgator/BufferAggregator - Added a new method boolean isNull() which 
returns false by default. aggregators that support nullability can choose to 
override this and return true if the aggregated result is null. 
   * Added a NullableAggregator/NullableBufferAggregator which is be used as a 
decorator for existing aggregator implementations to make them nullable. This 
will return null values if all the values aggregated are null, If any of the 
aggregated value is non-null, it will have same behavior as the delegate 
aggregator. 
   * If new null behavior is enabled, aggregator factories will decorate the 
aggregators with NullableAggregator/NullableBufferAggregator. 
   * Added a new abstract class NullableAggregatorFactory which helps in 
reducing duplicate code and decide to switch between nullable and non-nullable 
aggregators. 
   * Cardinality and HyperUnique  aggregators ignore null values and only count 
non-null values, If all the encountered values are null, the result will be 0. 
This is different from current behavior where null values are also counted.
   * Count aggregator - since count aggregator is not associated with any 
column and just a count of the number of rows, it has the same behavior as 
before. 
   
   ## Math Expressions Null Handling
   * For general calculations like sum, full expression is be considered as 
null if any of the components is null. 
   * StringLiteral now supports null and and expressions containing non-quoted 
null is parsed as StringLiteral with null 
   * Specifying a default value for null is supported by the use of NVL or IF 
clause to assign default values at query time.
   
   ## Filtering on Null values
   * SelectorDimFilter currently specifies filtering on null values but the 
implementation assumes null and empty strings as equivalent. The implementation 
is changed to consider null and empty string differently. Generation of cache 
key for selectorDimFilter is also modified to support null.
   * InFilter/ExpressionFilter/LikeFilter are also modified to not treat nulls 
as empty strings. 
   
   ## Changes to Druid build-in SQL layer
   * NULL and empty strings are treated differently. 
   * IS NULL and IS NOT NULL filters against null values instead of empty 
strings. 
   * NVL, IFNULL and COALESCE work as per SQL standard. 
   * count now skips null values and works as per SQL standard. 
   
   ## Backwards Compatibility 
   * **_"druid.generic.useDefaultValueForNull"_** is added as a new config. The 
default value is set to true for backwards compatibility. 
   *  code places in query/ingestion where we converted emptyToNull or 
nullToEmpty now use NullHandlingHelper which only does the conversion if 
backwards compatibility is enabled. 
   * Storage layer changes are backwards compatible for GenericIndexed/String 
columns. 
   * For Numeric columns new V2 serde implementations are introduced which also 
supports serde of nullRowsBitmaps. Previous implementations are kept unchanged 
to read any existing segments. 
   
   ## Testing 
   * Most of the unit tests that tested nulls are modified to have two branches 
of expected results to test new/old behavior both. 
   * For CI have added new runs to test matrix to test new behavior for nulls. 
   * To run the tests with new behavior run 
**_"-Ddruid.generic.useDefaultValueForNull=false"_**
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org

Reply via email to