Hi,

I've been using Arrow for some time now, mostly in the context of Arrow
Flight between Java and Python.  While it's quite easy to convert Arrow
data in Python to a pandas dataframe and manipulate it, I'm struggling to
find an obvious analogue on the Java side.  VectorSchemaRoot is useful for
loading/unloading/moving data, but clumsy for doing higher-level
operations, especially joins/aggregations/etc. across "tables".

In other words, if I wanted to load non-Arrow-formatted data from somewhere
into Java, manipulate it with a dataframe-like API, and then send the
result somewhere via Flight, what library would be the best/simplest way to
accomplish that?  I see lots of progress in other languages, but I'm
wondering what would be recommended for Java.

I'm currently looking at running Spark SQL embedded in the application, but
that seems a touch heavyweight, and I'm not sure it would do exactly what
I've described (nor am I terribly familiar with Spark in the first place).

If the premise of this question is flawed, please feel free to correct me.

Thanks!
Paul
