This isn't directly related to the question, but I was reading about the newly released JDK 16 today. It includes initial support for explicit vectorized operations (the incubating Vector API, JEP 338), which might be interesting to explore for anyone considering building a Java DataFrame implementation.
https://openjdk.java.net/jeps/338

On Tue, Mar 16, 2021 at 5:43 PM Andrew Melo <[email protected]> wrote:

> I can't speak to how complete it is, but I looked earlier for
> something similar and ran across
> https://github.com/deeplearning4j/nd4j .. it's probably not an exact
> fit, but it does appear to be able to consume arrow buffers and expose
> them to java.
>
> Cheers
> Andrew
>
> On Tue, Mar 16, 2021 at 6:36 PM Wes McKinney <[email protected]> wrote:
> >
> > This has been asked several times in the past but I'm not aware of
> > anything "dataframe-like" in Java that's build against Arrow (or
> > otherwise) that fills the kind of need that pandas does. There was a
> > Scala project some years ago Saddle [1] (not Arrow-based) built
> > initially by one of the early pandas developers but I don't think it's
> > still being actively developed. To build a higher-level Java API on
> > top of the Arrow Java libraries would be incredibly useful to the
> > community I'm sure.
> >
> > [1]: https://github.com/saddle/saddle
> >
> > On Tue, Mar 16, 2021 at 5:06 PM Paul Whalen <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I've been using Arrow for some time now, mostly in the context of
> > > Arrow Flight between Java and Python. While it's quite easy to
> > > convert Arrow data in Python to a pandas dataframe and manipulate
> > > it, I'm struggling to find an obvious analogue on the Java side.
> > > VectorSchemaRoot is useful for loading/unloading/moving data, but
> > > clumsy for doing higher level operations, especially
> > > joins/aggregations/etc across "tables".
> > >
> > > In other words, if I wanted to load non Arrow formatted data from
> > > somewhere into Java, manipulate it with a dataframe like API, and
> > > then send the result somewhere via Flight, what library would be
> > > the best/simplest way to accomplish that? I see lots of progress
> > > in other languages, but I'm wondering what would be recommended
> > > for Java.
> > >
> > > I'm currently looking at Spark SQL just in-application, but that
> > > seems a touch heavyweight, and I'm not sure it would do exactly
> > > what I've described (nor am I terribly familiar with Spark in the
> > > first place).
> > >
> > > If the premise of this question is flawed, please feel free to
> > > correct me.
> > >
> > > Thanks!
> > > Paul
