On Tue, Feb 18, 2020 at 2:01 AM Micah Kornfield wrote:
>
>
>> * evaluating an expression like SUM(ISNULL($field)) is more
>> semantically ambiguous (you have to check more things) when $field is
>> a dictionary-encoded type and the values of the dictionary could be
>> null
>
> It is this type of t
> * evaluating an expression like SUM(ISNULL($field)) is more
> semantically ambiguous (you have to check more things) when $field is
> a dictionary-encoded type and the values of the dictionary could be
> null
It is this type of thing that I'm worried about (Parquet just happens to be
where I'm w
hi Micah,
It seems like the null and nested issues really come up when trying to
translate from one dictionary encoding scheme to another. That we are
able to directly write dictionary-encoded data to Parquet format is
beneficial, but it doesn't seem like we should let the constraints of
Parquet's
Hi Wes and Brian,
Thanks for the feedback. My intent in raising these issues is that they
make the spec harder to work with/implement (i.e. we have existing bugs,
etc). I'm wondering if we should take the opportunity to simplify before
things are set in stone. If we think things are already set,
On Sun, Feb 9, 2020 at 12:53 AM Micah Kornfield
wrote:
>
> I'd like to understand if anyone is making use of the following features
> and if we should revisit them before 1.0.
>
> 1. Dictionaries can encode null values.
> - This becomes error prone for things like Parquet. We seem to be
> calcula
> It seems we should potentially disallow dictionaries to contain null
> values?
+1 - I've always thought it was odd you could encode null values in two
different places for dictionary encoded columns.
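
To make the "two places" point concrete, here is a minimal sketch in plain Python (not the Arrow API). It models a dictionary-encoded column as a list of indices plus a list of dictionary values, with None standing in for a null in either place. The function names and the sample data are hypothetical, chosen only for illustration.

```python
from typing import List, Optional


def index_only_null_count(indices: List[Optional[int]]) -> int:
    """Count nulls by looking only at the indices (the validity-bitmap view)."""
    return sum(1 for idx in indices if idx is None)


def logical_null_count(
    indices: List[Optional[int]], dictionary: List[Optional[str]]
) -> int:
    """Count slots that are null in the indices OR point at a null dictionary value."""
    count = 0
    for idx in indices:
        if idx is None:
            # Null recorded on the index itself.
            count += 1
        elif dictionary[idx] is None:
            # Valid index, but the dictionary entry it refers to is null.
            count += 1
    return count


# Hypothetical column: dictionary entry 1 is null, and one index is null.
dictionary = ["a", None, "b"]
indices = [0, 1, None, 2, 1]

print(index_only_null_count(indices))           # nulls seen via the indices alone
print(logical_null_count(indices, dictionary))  # nulls seen after dereferencing
```

The two counts differ whenever the dictionary itself holds a null, which is the extra check that something like SUM(ISNULL($field)) would have to make.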
You could argue it's more efficient to encode the nulls in the dictionary,
but I think if we're goi
I'd like to understand if anyone is making use of the following features
and if we should revisit them before 1.0.
1. Dictionaries can encode null values.
- This becomes error prone for things like Parquet. We seem to be
calculating the definition level solely based on the null bitmap.
I might h