https://issues.apache.org/jira/browse/ARROW-12679
On Fri, May 7, 2021 at 8:54 AM Joris Peeters <[email protected]> wrote:

> Fair enough.
> I have this data moving through a few different servers and clients, in
> IPC streaming format, consumed on various platforms/languages. The
> nullability in the schema is often used in "language-friendly" clients,
> e.g. to build a `std::vector<bool>` or `std::vector<std::optional<bool>>`
> depending on whether the bit column is nullable, so preserving this
> information is quite important, even if locally in Java it makes little
> difference.
>
> I've worked around it for now by fudging the VectorSchemaRoot's schema
> myself, but I'll open a JIRA to track, and I'll assign it to myself and
> provide a fix.
>
> Cheers!
> -Joris.
>
>
> On Fri, May 7, 2021 at 3:22 AM Fan Liya <[email protected]> wrote:
>
>> Hi Joris,
>>
>> I think you are right.
>>
>> We only use the nullability information in the consumers, because it
>> makes a difference in performance.
>>
>> The nullability information in the schema is not accurate, as you have
>> observed.
>> However, such information is not well-used in the Java implementation
>> (IMHO). For example, the validity buffer is allocated even if the vector is
>> non-nullable.
>>
>> That said, I think it would be better to keep the nullability information
>> in sync.
>> So maybe we can open a JIRA to track it?
>>
>> Best,
>> Liya Fan
>>
>>
>> On Thu, May 6, 2021 at 3:09 PM Joris Peeters <[email protected]>
>> wrote:
>>
>>> Hello Fan,
>>>
>>> Yes, but it seems that code path only affects the consumers, and whether
>>> they set a value in the vector or not, see e.g.
>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/consumer/DoubleConsumer.java#L57
>>> However, the VectorSchemaRoot's schema, defined I believe at
>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L59,
>>> does not appear to use this info, and just sets every column's nullability
>>> to true (as per the link in my original email).
>>>
>>> Note that we are indeed using the ArrowVectorIterator, and it's when
>>> iterating over the iterator and inspecting the schema of the elements
>>> (VectorSchemaRoot) that I notice this.
>>> Maybe all this needs is a `isColumnNullable(i, ..)` instead of `true` in
>>> `final FieldType fieldType = new FieldType(true, arrowType, /* dictionary
>>> encoding */ null, metadata);`.
>>>
>>> Cheers,
>>> -J
>>>
>>> On Thu, May 6, 2021 at 5:53 AM Fan Liya <[email protected]> wrote:
>>>
>>>> Hi Joris,
>>>>
>>>> Thanks for reporting the problem.
>>>>
>>>> We make use of the nullable information
>>>> in ArrowVectorIterator#initialize. (Details can be found in
>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L73
>>>> )
>>>>
>>>> Please note that the ArrowVectorIterator is our encouraged way of
>>>> using the JDBC adapter.
>>>>
>>>> Best,
>>>> Liya Fan
>>>>
>>>>
>>>> On Wed, May 5, 2021 at 1:42 PM Micah Kornfield <[email protected]>
>>>> wrote:
>>>>
>>>>> I would need to look further, but I thought we handled null vs not
>>>>> null. At least I thought we had specialized conversion code to avoid
>>>>> branches. If this isn't the case it seems reasonable to contribute a
>>>>> path.
>>>>>
>>>>> On Tue, May 4, 2021 at 3:39 AM Joris Peeters <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I'm looking to use the Java JDBC adapter for loading tables from SQL
>>>>>> Server into Arrow record batches.
>>>>>>
>>>>>> At first glance the Arrow JDBC adapter seems to work well but, unless
>>>>>> I'm mistaken, it simply makes every vector nullable, irrespective of
>>>>>> whether the corresponding SQL column is nullable or not.
>>>>>>
>>>>>> I think the line
>>>>>>
>>>>>> final FieldType fieldType = new FieldType(true, arrowType, /*
>>>>>> dictionary encoding */ null, metadata);
>>>>>>
>>>>>> in
>>>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L158
>>>>>> might be the cause here.
>>>>>>
>>>>>> Is my interpretation correct, or am I missing a setting of sorts? If
>>>>>> indeed correct, is there a fundamental reason the NULL-ness is not
>>>>>> transferred, or is this something I could contribute in a PR? (which I'd
>>>>>> be happy to) I guess it's just a matter of inspecting the result metadata.
>>>>>>
>>>>>> Cheers,
>>>>>> -J
>>>>>>
>>>>>
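
For reference, the change suggested in the thread (a per-column check in place of the hard-coded `true` passed to `FieldType`) might look roughly like the sketch below. It relies only on `java.sql.ResultSetMetaData`. The class `NullabilitySketch` and both helper methods are hypothetical names used for illustration, not the adapter's actual code, and treating `columnNullableUnknown` as nullable is an assumption (the conservative choice when the driver cannot tell).

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

/** Hypothetical helper for illustration; not part of the Arrow JDBC adapter. */
final class NullabilitySketch {
  private NullabilitySketch() {}

  /**
   * Maps a JDBC nullability flag (as returned by ResultSetMetaData#isNullable)
   * to a boolean. Assumption: "unknown" is treated as nullable, so only an
   * explicit columnNoNulls yields false.
   */
  static boolean isNullable(int jdbcNullability) {
    return jdbcNullability != ResultSetMetaData.columnNoNulls;
  }

  /** Reads the flag for a 1-based column index from the result-set metadata. */
  static boolean isColumnNullable(ResultSetMetaData metadata, int column)
      throws SQLException {
    return isNullable(metadata.isNullable(column));
  }
}
```

The boolean would then replace `true` as the first argument of the `FieldType` constructor quoted above, e.g. `new FieldType(NullabilitySketch.isColumnNullable(rsmd, i), arrowType, null, metadata)`, where `rsmd` and `i` are placeholder names for the metadata object and the column index.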
