Hello Arrow folks,

I've been skimming through the Arrow docs and code, trying to figure out how one might model nested data structures where the nested portions themselves might be massive (i.e., larger than available memory). AFAICT, the nesting constructs in Arrow appear to assume that you can always fit an entire single record in memory. Am I right?

Regardless, is there a recommended way of handling this use case? The thing we want to be able to model is essentially (in Java): Iterator<Pair<T1, Iterator<T2>>>. So each row is a single T1 value associated with an arbitrarily large list of T2 values.

I could imagine perhaps flattening the hierarchy down into a schema that's essentially Pair<T1, T2>, especially if the T1 and T2 values can be optional. So say I had two rows with T1 values of A and B and T2 lists of [1, 2] and [3] respectively (i.e., [<A, [1, 2]>, <B, [3]>]), then you could just have rows and columns like this:

T1     | T2
---------------
A      | <null>
<null> | 1
<null> | 2
B      | <null>
<null> | 3

And then you'd presumably need to write wrapper code on top of Arrow to marshal all of this under an appropriate set of interfaces.

Is there a good way to handle this use case in Arrow as it exists today? If not, do you have a sense for how hard it would be to add support for something like this more natively?

-Tyler
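
P.S. To make the flattening idea concrete, here is a minimal sketch in plain Java. It uses no Arrow APIs at all; the names FlattenSketch, FlatRow and flatten are made up for illustration, and Map.Entry stands in for Pair. It only shows how the nested iterator could be streamed out as (T1, T2) rows without ever holding a whole T2 list in memory. It also assumes that a null T1 never occurs as real data, since null is what marks a "child" row here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class FlattenSketch {

    /** One flattened row (hypothetical type): exactly one of t1 / t2 is non-null. */
    public static final class FlatRow<T1, T2> {
        public final T1 t1;
        public final T2 t2;

        FlatRow(T1 t1, T2 t2) {
            this.t1 = t1;
            this.t2 = t2;
        }

        @Override
        public String toString() {
            return (t1 == null ? "<null>" : t1) + " | " + (t2 == null ? "<null>" : t2);
        }
    }

    /**
     * Lazily turns Iterator<Entry<T1, Iterator<T2>>> into a flat row stream:
     * one (t1, null) row per parent, followed by one (null, t2) row per child.
     * Only the current inner iterator is held, never a full T2 list.
     */
    public static <T1, T2> Iterator<FlatRow<T1, T2>> flatten(
            final Iterator<Map.Entry<T1, Iterator<T2>>> nested) {
        return new Iterator<FlatRow<T1, T2>>() {
            // Children of the most recently emitted parent row.
            private Iterator<T2> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                return inner.hasNext() || nested.hasNext();
            }

            @Override
            public FlatRow<T1, T2> next() {
                if (inner.hasNext()) {
                    return new FlatRow<>(null, inner.next());
                }
                if (!nested.hasNext()) {
                    throw new NoSuchElementException();
                }
                Map.Entry<T1, Iterator<T2>> parent = nested.next();
                inner = parent.getValue();
                return new FlatRow<>(parent.getKey(), null);
            }
        };
    }

    public static void main(String[] args) {
        // The example from above: [<A, [1, 2]>, <B, [3]>]
        List<Map.Entry<String, Iterator<Integer>>> rows = new ArrayList<>();
        rows.add(new SimpleImmutableEntry<>("A", Arrays.asList(1, 2).iterator()));
        rows.add(new SimpleImmutableEntry<>("B", Arrays.asList(3).iterator()));

        Iterator<FlatRow<String, Integer>> flat = flatten(rows.iterator());
        while (flat.hasNext()) {
            System.out.println(flat.next());
        }
    }
}
```

Each FlatRow would then presumably be appended to a pair of nullable columns and cut into batches of whatever size fits in memory; that part, and the reverse regrouping on the read side, is the wrapper code I was alluding to.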