@clint - Those configs are already in SQL compatible mode by default since 28. Aren't they?
To the original question, I am ok to deprecate these configs given that we have enough releases for folks to migrate over their queries. On Fri, Jan 26, 2024 at 4:55 AM Clint Wylie <cwy...@apache.org> wrote: > >Are there any performance implications in making those the defaults? > For numeric columns without null values there should be no impact. For > numeric columns with null values, I believe there is currently a very > minor performance impact for scans/aggregations since the null value > bitmap is iterated to check which rows are null. String performance > should not be changed afaik. I will try to find some time to do some > benchmarking to ensure this is true and measure the stuff to confirm > this, and look for opportunities to improve things where possible in > order to make this happen, since performance is important. Longer > term, I have a hunch that if we actually delete all of the default > value mode code there might also be some performance improvements due > to smaller function bytecode sizes and the lack of 'if sql compatible > mode do this, else do this' being littered everywhere across the > codebase. > > >Would this break compatibility for any existing native queries? > There are a handful of query shapes which might have some differences, > the most common are probably with strings since in default value mode > the empty string and null are treated interchangeably, so these kinds > of queries will need to be modified. > > There are also the possibility for some subtle differences in some > numeric expressions due to the 'pretend nulls do not exist' behavior > of the default value mode config, in default value mode you can write > arguably nonsense expressions that end up evaluating stuff like "3 + > 'hello'" and it will evaluate to 3, while in SQL compatible mode the > 'hello' will be unable to be implicitly cast to a number and become > zero, resulting in null. > > Strict boolean mode also has some subtle differences if selecting the > outputs, since it homogenizes all true/false values to 1/0 instead of > the javascript style behavior when it is off, where the truthy values > are just passed through. > > https://druid.apache.org/docs/latest/querying/math-expr#logical-operator-modes > > The other main change in behavior is due to the SQL three-value logic, > where a filter like "x != 'hello'" today will in default value mode > match rows with null values, but in SQL compatible mode will not match > nulls. These queries will need to be rewritten into the form "x != > 'hello' OR x is null", or there is also some native filters added > recently which could be used to wrap to make the SQL equivalent of "(x > = 'hello') is not true", https://github.com/apache/druid/pull/15182, > or "not(istrue(eq(x, 'hello'))". I'm actually open to the idea of > keeping the toggle config for 2vl/3vl since it would need checked in > very few places to control the behavior (basically in not filter), my > main goal here is to eventually get rid of > druid.generic.useDefaultValueForNull. > > I plan to dig into all of this in greater depth in the migration guide > mentioned in the first email to make the transition as painless as > possible before we remove it. > > On Mon, Jan 22, 2024 at 9:54 AM Xavier Léauté > <xav...@confluent.io.invalid> wrote: > > > > Two questions: > > - Are there any performance implications in making those the defaults? > > - Would this break compatibility for any existing native queries? > > > > On Wed, Jan 17, 2024 at 5:53 PM Clint Wylie <cwy...@apache.org> wrote: > > > > > Hi all, > > > > > > I wanted to discuss the deprecation and removal of the Druid configs > > > related to SQL compatibility, with the eventual goal of always running > > > in SQL compatible mode. As of Druid 28, these are the default, but in > > > the interest of dramatically reducing complexity and removing a ton of > > > code, and also cutting a lot of our CI time in half, I would > > > eventually like to remove the related configs and the code that > > > handles the now non-default behaviors completely. > > > > > > The related configs are: > > > > > > runtime.properties: > > > druid.generic.useDefaultValueForNull - (will become always false) > > > druid.expressions.useStrictBooleans - (will become always true) > > > druid.generic.useThreeValueLogicForNativeFilters - (will become always > > > true) > > > druid.generic.ignoreNullsForStringCardinality - (irrelevant if > > > druid.generic.useDefaultValueForNull=false) > > > > > > query context: > > > sqlUseBoundAndSelectors - (this is moderately related, and defaults to > > > value of druid.generic.useDefaultValueForNull, but enhancements to > > > expressions and sql planning for lookups make this totally unnecessary > > > to keep around) > > > > > > other things to dump while we are at it: > > > druid.expressions.allowNestedArrays - (will become always true) > > > > > > There might be additional configs which can also be removed, we can > > > add to this thread if we can think of them. > > > > > > I would like to get the official deprecation process started now with > > > Druid 29, and imagine actually removing them sometime towards the end > > > of the year, so maybe Druid 32 or so? > > > > > > Before completely removing them I think I would like to get a more in > > > depth migration guide in place, to help any hold-outs that are > > > overriding the now default SQL compatible configs so that things still > > > run in the legacy mode. > > > > > > Thoughts? Concerns? > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org > > > For additional commands, e-mail: dev-h...@druid.apache.org > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org > For additional commands, e-mail: dev-h...@druid.apache.org > >