If I read the Iceberg vectorized reader code right, it does not support nested types (the same limitation as Spark's built-in vectorized Parquet reader). Is that correct? Also, does the C++ Parquet-to-Arrow reader have any such limitations?
On Wed, Aug 19, 2020 at 9:37 AM Jacques Nadeau <jacq...@apache.org> wrote:

> I believe there is code in the iceberg project to do this in pure Java
> [1]. Right now, there isn't a pure java implementation in the Arrow project.
>
> [1]
> https://github.com/apache/iceberg/tree/master/arrow/src/main/java/org/apache/iceberg/arrow/vectorized
>
> On Wed, Aug 19, 2020 at 5:18 AM Chris Nuernberger <ch...@techascent.com>
> wrote:
>
>> Also, javacpp has prepackaged C++ bindings to arrow for multiple OS's:
>>
>> http://bytedeco.org/javacpp-presets/arrow/apidocs/
>>
>> We have had success with javacpp
>> <https://github.com/techascent/tech.opencv> in the past and it is much
>> better now that their preprocess is based on Clang.
>>
>> On Tue, Aug 18, 2020 at 4:16 PM Chris Nuernberger <ch...@techascent.com>
>> wrote:
>>
>>> Thanks, that is helpful.
>>>
>>> Chris
>>>
>>> On Tue, Aug 18, 2020 at 10:24 AM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chris,
>>>> There is an open PR to support this through C++'s Dataset functionality
>>>> [1]. There was also a prior attempt that went stale and I can't find at the
>>>> moment.
>>>>
>>>> IIUC the main missing component at this point before the PR gets merged
>>>> is integration to honor "-XX:MaxDirectMemorySize" settings.
>>>>
>>>> -Micah
>>>>
>>>> [1] https://github.com/apache/arrow/pull/7030
>>>>
>>>> On Tue, Aug 18, 2020 at 6:48 AM Chris Nuernberger <ch...@techascent.com>
>>>> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> We were wondering what the best way to convert a parquet file to an
>>>>> arrow file would be via a java pathway. I notice that the c++ layer
>>>>> appears to have this conversion.
>>>>>
>>>>> The best hint I have seen so far is this gist:
>>>>> https://gist.github.com/animeshtrivedi/76de64f9dab1453958e1d4f8eca1605f
>>>>>
>>>>> I also found this jni pathway for ORC files:
>>>>> https://github.com/apache/arrow/tree/master/cpp/src/jni
>>>>>
>>>>> Another thought I had was to use the JNA or JNR and bind to the C glib
>>>>> pathway.
>>>>>
>>>>> Thanks for any help,
>>>>>
>>>>> Chris