hey Jacques,
I think in the case of dictionary encoding, any algorithm should be
operating with the dictionary for a particular field in a record batch
already in hand. Certain algorithms optimized for dictionary-encoded
data (like hash aggregations) may have to branch at fragment merge
steps (whe
I think we need to start separating out dataset behavior from base IPC
behavior. Having worked with this kind of structure in both Drill (where
things were entirely late-bound and dynamic) and Dremio (where we start with
schema and restart if we identify schema change), I strongly recommend that
"datas
@Wes McKinney, I see your comments. Thank you so much.
I agree with you that the schema and dictionary should be separated.
However, according to the current Java implementation, the dictionary is
attached to the schema, so a refactoring is required.
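
A minimal sketch of what separating the schema from the dictionary could look like, written in plain Python rather than against the Arrow Java API. Every name here (`Field`, `DictionaryProvider`, `decode`) is hypothetical and chosen only for illustration: the field carries just a dictionary id, and the dictionary values live in a separate provider that can change without touching the schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical schema field: holds a dictionary id, not the dictionary."""
    name: str
    dictionary_id: Optional[int] = None  # None means "not dictionary-encoded"


@dataclass
class DictionaryProvider:
    """Hypothetical holder mapping dictionary ids to their values."""
    dictionaries: Dict[int, List[str]]

    def lookup(self, dictionary_id: int) -> List[str]:
        return self.dictionaries[dictionary_id]


def decode(field: Field, indices: List[int], provider: DictionaryProvider) -> List[str]:
    """Resolve encoded indices to values using a dictionary supplied separately."""
    if field.dictionary_id is None:
        raise ValueError(f"field {field.name!r} is not dictionary-encoded")
    values = provider.lookup(field.dictionary_id)
    return [values[i] for i in indices]


# The schema (field) is fixed; only the provider's contents vary.
field = Field("city", dictionary_id=0)
encoded = [0, 1, 1, 0]

provider = DictionaryProvider({0: ["Paris", "Tokyo"]})
decoded = decode(field, encoded, provider)

# A different dictionary for the same, unchanged field.
other_provider = DictionaryProvider({0: ["Lyon", "Osaka"]})
decoded_other = decode(field, encoded, other_provider)
```

Both the encoded indices and the decoded values stay accessible here, and replacing the dictionary does not require rebuilding the field.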
BTW, a somewhat related problem is that the da
hi Liya,
I left a couple of comments in the document. You might look at what we
have developed in C++ and JavaScript, which is more mature and widely
used in those languages than what currently exists in Java.
In particular, I strongly encourage you to avoid creating a coupling
between the Schema (
@Micah Kornfield Thanks a lot for your comments.
In the doc, we identify three problems with the current dictionary encoding
use case (there may be more, so please share your suggestions):
1. there should be a convenient way to provide access to both
encoded/decoded data.
2. the constructor f
Hi Liya Fan,
Thank you for doing this. I need to take a closer look at the PR in
question and the dictionary encoding code but this seems like it is on the
right track.
Could other Java contributors with more familiarity in the space look over
the document to make sure it makes sense to them?
T
Hi all,
This concerns issue ARROW-3396.
I have summarized the problem (please see if my understanding is correct),
and proposed some solutions to it. Please give your valuable feedback.
For details, please see:
https://docs.google.com/document/d/1Y2E6RbZkUj3SwuEJrlEjaeIPmCA1SIsi9wmbJmVlB2I/