I agree this is non-intuitive given the field names, but it seems consistent
with the text noted below (15 values are present and only 11 are written).
It seems another way of defining the value of this field would be: the number
of definition levels written that are not less than the max definition level?
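To illustrate that reading, here is a small sketch (not parquet-mr itself; the helper name and the sample levels are hypothetical) counting how many definition levels reach the max level, i.e. how many values are physically written:

```python
# Hypothetical helper: a definition level equal to the max level means a
# concrete value is stored; any lower level encodes a null or a missing/empty
# ancestor, for which no value is written.
def values_written(def_levels, max_def_level):
    return sum(1 for d in def_levels if d >= max_def_level)

# Example: 15 definition levels, max definition level 3.
def_levels = [3, 3, 0, 3, 2, 3, 3, 3, 3, 1, 3, 3, 3, 0, 3]
print(values_written(def_levels, 3))  # -> 11
```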
Hi Jorge,
Spark (like other JVM-based implementations) is most probably using
parquet-mr. parquet-mr counts null values independently of their level in
the structure. An additional twist here is that we can store not only empty
lists but also null lists (when the list itself is null) if the list is optional.
T
(Branching from the previous discussion, as Micah pointed out another
interesting aspect)
Consider the list
[[0, 1], None, [2, None, 3], [4, 5, 6], [], [7, 8, 9], None, [10]]
for the schema
optional group column1 (LIST) {
  repeated group list {
    optional int32 element;
  }
}
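To make the level bookkeeping concrete, here is a sketch (my own illustration, not parquet-mr code) that derives the repetition/definition level pairs for this column. With this schema the max definition level is 3 (optional column1, repeated list, optional element) and the max repetition level is 1:

```python
# Sketch of level derivation for: optional column1 / repeated list /
# optional element. One (rep, def) pair is emitted per stored level entry;
# a value is only physically written when def == 3.
def levels(lists):
    out = []  # (repetition_level, definition_level, value_or_None)
    for lst in lists:
        if lst is None:
            out.append((0, 0, None))      # the list itself is null
        elif len(lst) == 0:
            out.append((0, 1, None))      # empty list
        else:
            for i, v in enumerate(lst):
                rep = 0 if i == 0 else 1  # 0 starts a new list entry
                if v is None:
                    out.append((rep, 2, None))  # null element
                else:
                    out.append((rep, 3, v))     # element present
    return out

data = [[0, 1], None, [2, None, 3], [4, 5, 6], [], [7, 8, 9], None, [10]]
lvls = levels(data)
print(len(lvls))                            # -> 15 level pairs
print(sum(1 for _, d, _ in lvls if d == 3)) # -> 11 values written
```

This reproduces the counts discussed above: 15 definition levels are present, but only 11 values are actually written.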
When looking a