Hi Joris,

I think you are right.
We only use the nullability information in the consumers, because it makes a difference in performance. The nullability information in the schema is not accurate, as you have observed.

However, such information is not well-used in the Java implementation (IMHO). For example, the validity buffer is allocated even if the vector is non-nullable.

That said, I think it would be better to keep the nullability information in sync. So maybe we can open a JIRA to track it?

Best,
Liya Fan

On Thu, May 6, 2021 at 3:09 PM Joris Peeters <[email protected]> wrote:

> Hello Fan,
>
> Yes, but it seems that code path only affects the consumers, and whether
> they set a value in the vector or not, see e.g.
> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/consumer/DoubleConsumer.java#L57
> However, the VectorSchemaRoot's schema, defined I believe at
> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L59,
> does not appear to use this info, and just sets every column's nullability
> to true (as per the link in my original email).
>
> Note that we are indeed using the ArrowVectorIterator, and it's when
> iterating over the iterator and inspecting the schema of the elements
> (VectorSchemaRoot) that I notice this.
> Maybe all this needs is a `isColumnNullable(i, ..)` instead of `true` in
> `final FieldType fieldType = new FieldType(true, arrowType, /* dictionary encoding */ null, metadata);`.
>
> Cheers,
> -J
>
> On Thu, May 6, 2021 at 5:53 AM Fan Liya <[email protected]> wrote:
>
>> Hi Joris,
>>
>> Thanks for reporting the problem.
>>
>> We make use of the nullable information
>> in ArrowVectorIterator#initialize.
>> (Details can be found in
>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L73
>> )
>>
>> Please note that the ArrowVectorIterator is our encouraged way of using
>> the JDBC adapter.
>>
>> Best,
>> Liya Fan
>>
>>
>> On Wed, May 5, 2021 at 1:42 PM Micah Kornfield <[email protected]>
>> wrote:
>>
>>> I would need to look further, but I thought we handled null vs not
>>> null. At least I thought we had specialized conversion code to avoid
>>> branches. If this isn't the case it seems reasonable to contribute a patch.
>>>
>>> On Tue, May 4, 2021 at 3:39 AM Joris Peeters <[email protected]>
>>> wrote:
>>>
>>>> I'm looking to use the Java JDBC adapter for loading tables from SQL
>>>> Server into Arrow record batches.
>>>>
>>>> At first glance the Arrow JDBC adapter seems to work well but, unless
>>>> I'm mistaken, it simply makes every vector nullable, irrespective of
>>>> whether the corresponding SQL column is nullable or not.
>>>>
>>>> I think the line
>>>>
>>>> final FieldType fieldType = new FieldType(true, arrowType, /* dictionary encoding */ null, metadata);
>>>>
>>>> in
>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L158
>>>> might be the cause here.
>>>>
>>>> Is my interpretation correct, or am I missing a setting of sorts? If
>>>> indeed correct, is there a fundamental reason the NULL-ness is not
>>>> transferred, or is this something I could contribute in a PR? (which I'd be
>>>> happy to) I guess it's just a matter of inspecting the result metadata.
>>>>
>>>> Cheers,
>>>> -J
>>>>
>>>
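
For reference, the idea raised in the thread (derive each column's nullability from the JDBC result metadata instead of hard-coding `true`) could look roughly like the sketch below. It uses only `java.sql`, so it does not touch Arrow itself. The class name `NullabilitySketch`, the exact shape of `isColumnNullable`, and the `fakeMetaData` stand-in are hypothetical and are not the adapter's actual code. Treating `columnNullableUnknown` as nullable is also an assumption, chosen as the conservative default.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class NullabilitySketch {

    /**
     * Hypothetical helper: returns false only when the driver reports the
     * column as NOT NULL. Unknown nullability is treated as nullable
     * (an assumption; it is the safe choice for a schema flag).
     */
    public static boolean isColumnNullable(ResultSetMetaData metaData, int column)
            throws SQLException {
        return metaData.isNullable(column) != ResultSetMetaData.columnNoNulls;
    }

    /**
     * Stand-in for a real driver's metadata so the sketch runs without a
     * database: isNullable(i) returns codes[i - 1] (JDBC columns are 1-based).
     */
    public static ResultSetMetaData fakeMetaData(int... codes) {
        return (ResultSetMetaData) Proxy.newProxyInstance(
                NullabilitySketch.class.getClassLoader(),
                new Class<?>[] {ResultSetMetaData.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "isNullable":
                            return codes[((Integer) args[0]) - 1];
                        case "getColumnCount":
                            return codes.length;
                        default:
                            throw new UnsupportedOperationException(method.getName());
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        ResultSetMetaData md = fakeMetaData(
                ResultSetMetaData.columnNoNulls,
                ResultSetMetaData.columnNullable,
                ResultSetMetaData.columnNullableUnknown);
        for (int i = 1; i <= md.getColumnCount(); i++) {
            // In the adapter, this boolean would take the place of the literal
            // `true` passed as the first argument of `new FieldType(...)`.
            System.out.println("column " + i + " nullable: " + isColumnNullable(md, i));
        }
    }
}
```

With a real connection, the `ResultSetMetaData` would come from `ResultSet#getMetaData()` rather than from the proxy used here.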
