Thanks, Micah. Is there a Jira issue or pull request I could follow for the C++ work on arbitrary nesting? How about maps?
On Tue, Aug 25, 2020 at 9:10 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Also does the C++ Parquet to Arrow reader have any such limitations?
>
>
> The C++ implementation can currently either read nested structs or nested
> lists but not a combination of the two. It is actively being worked on to
> be able to handle arbitrary nesting.
>
> On Tue, Aug 25, 2020 at 1:15 AM Anoop Johnson <anoop.k.john...@gmail.com>
> wrote:
>
>> If I read the Iceberg vectorized reader code right, it does not support
>> nested types (same limitation as Spark's built-in vectorized parquet
>> reader). Is that correct? Also does the C++ Parquet to Arrow reader have
>> any such limitations?
>>
>> On Wed, Aug 19, 2020 at 9:37 AM Jacques Nadeau <jacq...@apache.org>
>> wrote:
>>
>>> I believe there is code in the iceberg project to do this in pure Java
>>> [1]. Right now, there isn't a pure java implementation in the Arrow project.
>>>
>>> [1]
>>> https://github.com/apache/iceberg/tree/master/arrow/src/main/java/org/apache/iceberg/arrow/vectorized
>>>
>>> On Wed, Aug 19, 2020 at 5:18 AM Chris Nuernberger <ch...@techascent.com>
>>> wrote:
>>>
>>>> Also, javacpp has prepackaged C++ bindings to arrow for multiple OS's:
>>>>
>>>> http://bytedeco.org/javacpp-presets/arrow/apidocs/
>>>>
>>>> We have had success with javacpp
>>>> <https://github.com/techascent/tech.opencv> in the past and it is much
>>>> better now that their preprocess is based on Clang.
>>>>
>>>> On Tue, Aug 18, 2020 at 4:16 PM Chris Nuernberger <ch...@techascent.com>
>>>> wrote:
>>>>
>>>>> Thanks, that is helpful.
>>>>>
>>>>> Chris
>>>>>
>>>>> On Tue, Aug 18, 2020 at 10:24 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>
>>>>>> Hi Chris,
>>>>>> There is an open PR to support this through C++'s Dataset
>>>>>> functionality [1]. There was also a prior attempt that went stale and I
>>>>>> can't find at the moment.
>>>>>>
>>>>>> IIUC the main missing component at this point before the PR gets
>>>>>> merged is integration to honor "-XX:MaxDirectMemorySize" settings.
>>>>>>
>>>>>> -Micah
>>>>>>
>>>>>> [1] https://github.com/apache/arrow/pull/7030
>>>>>>
>>>>>> On Tue, Aug 18, 2020 at 6:48 AM Chris Nuernberger <ch...@techascent.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> We were wondering what the best way to convert a parquet file to an
>>>>>>> arrow file would be via a java pathway. I notice that the c++ layer
>>>>>>> appears to have this conversion.
>>>>>>>
>>>>>>> The best hint I have see so far is this gist:
>>>>>>>
>>>>>>> https://gist.github.com/animeshtrivedi/76de64f9dab1453958e1d4f8eca1605f
>>>>>>>
>>>>>>> I also found this jni pathway for ORC files:
>>>>>>> https://github.com/apache/arrow/tree/master/cpp/src/jni
>>>>>>>
>>>>>>> Another thought I had was to use the JNA or JNR and bind to the C
>>>>>>> glib pathway.
>>>>>>>
>>>>>>> Thanks for any help,
>>>>>>>
>>>>>>> Chris