Re: statistics null count in nested types

2021-07-16 Thread Micah Kornfield
I agree this is non-intuitive based on the field names, but it seems consistent with the text noted below (15 values are present and only 11 are written). Another way of defining the value for this field would seem to be the number of definition levels written that aren't less than the max definition level?
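Here is a minimal sketch of the two counts being contrasted. It assumes the definition levels of a single leaf column are already available as a plain Python list; the helper name `count_by_max_level` and the sample levels are hypothetical. A level equal to the max definition level means a leaf value was actually written. A level below it means something on the path (the leaf or an ancestor) is null or, for lists, empty.

```python
from typing import Sequence, Tuple


def count_by_max_level(def_levels: Sequence[int], max_def_level: int) -> Tuple[int, int]:
    """Return (at_max, below_max) for one leaf column's definition levels.

    at_max    -- levels equal to max_def_level: a leaf value was actually written.
    below_max -- levels less than max_def_level: the leaf or one of its
                 ancestors is null (or, for lists, empty), so no value was written.
    """
    for d in def_levels:
        if not 0 <= d <= max_def_level:
            raise ValueError(f"definition level {d} outside [0, {max_def_level}]")
    at_max = sum(1 for d in def_levels if d == max_def_level)
    return at_max, len(def_levels) - at_max
```

For the hypothetical levels `[3, 2, 3, 0, 1, 3]` with a max definition level of 3, this returns `(3, 3)`: six levels were written, but only three values.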

2021-07-16 Thread Gabor Szadovszky
Hi Jorge, Spark (like other JVM-based implementations) is most probably using parquet-mr. parquet-mr counts null values regardless of their level in the structure. An additional twist here is that we cannot store empty lists but null lists (when the list itself is null) if it is optional. T
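The sketch below makes the levels concrete. It assumes a hypothetical optional list of optional int32 stored with the standard three-level LIST layout; the schema, function names, and sample rows are illustrative and not taken from Spark or parquet-mr. It derives the definition levels for a few rows and tallies where in the structure each one stops.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical column for this sketch: an optional list of optional int32,
# using the three-level LIST layout:
#   optional group my_list (LIST) {
#     repeated group list {
#       optional int32 element;
#     }
#   }
# Each optional or repeated field on the path adds one, so the max level is 3.
MAX_DEF_LEVEL = 3
LEVEL_NAMES = {0: "null list", 1: "empty list", 2: "null element", 3: "value written"}


def definition_levels(records: List[Optional[List[Optional[int]]]]) -> List[int]:
    """Definition levels written for the leaf `element` column, one record per row."""
    levels: List[int] = []
    for rec in records:
        if rec is None:
            levels.append(0)  # the list itself is null
        elif not rec:
            levels.append(1)  # the list is present but has no elements
        else:
            for elem in rec:
                # 2: the element slot exists but is null; 3 (max): a value is written
                levels.append(2 if elem is None else MAX_DEF_LEVEL)
    return levels


def level_histogram(levels: List[int]) -> Counter:
    """Count how many levels fall into each case named in LEVEL_NAMES."""
    return Counter(LEVEL_NAMES[d] for d in levels)
```

For the rows `[[1, None, 2], None, [], [3]]`, `definition_levels` gives `[3, 2, 3, 0, 1, 3]`. The null list and the empty list land on different levels (0 and 1), but both are below the max of 3. A writer that counts every level below the max as a null would therefore include the empty list too, unless it special-cases it.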