Makes sense, I buy that :-). Thanks.

On Sun, Jul 26, 2020 at 10:38 AM Jacques Nadeau <[email protected]> wrote:
> I think your first question is: can I skip the validity buffer if I know
> all values are defined?
>
> In the Java library, you cannot. This was a design choice to simplify
> implementations. The memory consumption difference is relatively small and
> collapsing the concepts was done to clean up code.
>
> Fun fact: This was done in the second design iteration of the Java library
> (the first one included support for this). We identified that many sources
> of data are actually all annotated as nullable but are mostly or are all
> non-null. Part of this is user laziness, part is due to tools, since they
> frequently don't support generating both types of data (writers of Parquet
> frequently do this, for example). As such, we found that wordwise
> operations against validity vectors that adapt processing code based on
> continuous sequences of nullable and non-nullable values were actually
> substantially more beneficial to generalized real-world workloads (while
> also simplifying the codebase).
>
> On Sun, Jul 26, 2020 at 7:00 AM Chris Nuernberger <[email protected]>
> wrote:
>
>> Hi, I have a question about the actual file format and how it is
>> reflected in the Java API.
>>
>> 1. Are validity masks necessary if nullable is false?
>> 2. Does the Java system reflect the implications of #1? Can I create a
>> vector with a null validity mask?
>>
>> Thanks again (and again and again) for your help :-).
>>
>> Chris
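
For anyone reading this thread later, here is a rough sketch of what "wordwise operations against validity vectors" could look like. It is not the Arrow Java implementation. It assumes plain `long[]` arrays standing in for the value and validity buffers, with the validity bits read 64 at a time, LSB-first (bit i of the bitmap marks value i as defined). The class and method names (`ValidityWords`, `sumValid`) are made up for illustration.

```java
// Illustrative only: plain arrays stand in for Arrow buffers.
final class ValidityWords {

    /**
     * Sums the defined entries of values. Bit (i % 64) of validity[i / 64]
     * is 1 when values[i] is defined (LSB-first numbering, assumed here).
     */
    static long sumValid(long[] values, long[] validity) {
        long sum = 0;
        for (int start = 0, w = 0; start < values.length; start += 64, w++) {
            int n = Math.min(64, values.length - start);
            // Ignore padding bits in a final, partially filled word.
            long mask = (n == 64) ? -1L : (1L << n) - 1;
            long word = validity[w] & mask;

            if (word == mask) {
                // All values in this word are defined: no per-value null check.
                for (int i = start; i < start + n; i++) {
                    sum += values[i];
                }
            } else if (word != 0) {
                // Mixed word: visit only the set bits.
                while (word != 0) {
                    int bit = Long.numberOfTrailingZeros(word);
                    sum += values[start + bit];
                    word &= word - 1; // clear the lowest set bit
                }
            }
            // word == 0: every value is null, skip the whole word.
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] values = new long[130];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        // Word 0: all defined. Word 1: all null. Word 2: only index 129 defined.
        long[] validity = { -1L, 0L, 0b10L };
        System.out.println(sumValid(values, validity));
    }
}
```

The point is the one Jacques makes: when data is declared nullable but is mostly non-null, most words take the first branch, so the cost of always having a validity buffer is one comparison per 64 values.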
