Re: [Discuss][Java][Typical use cases for dictionary encoding string vectors]

2019-06-29 Thread Wes McKinney
hey Jacques, I think in the case of dictionary encoding, any algorithm should be operating with the dictionary for a particular field in a record batch already in hand. Certain algorithms optimized for dictionary-encoded data (like hash aggregations) may have to branch at fragment merge steps (whe

Re: [Discuss][Java][Typical use cases for dictionary encoding string vectors]

2019-06-28 Thread Jacques Nadeau
I think we need to start separating out dataset behavior from base IPC behavior. Having worked with this kind of structure in both Drill (where things were entirely late bound dynamic) and Dremio (where we start with schema and restart if we identify schema change), I strongly recommend that "datas

Re: [Discuss][Java][Typical use cases for dictionary encoding string vectors]

2019-06-27 Thread Fan Liya
@Wes McKinney, I see your comments. Thank you so much. I agree with you that the schema and dictionary should be separated. However, according to the current Java implementation, the dictionary is attached to the schema, so a refactoring is required. BTW, a somewhat related problem is that the da

Re: [Discuss][Java][Typical use cases for dictionary encoding string vectors]

2019-06-27 Thread Wes McKinney
hi Liya, I left a couple of comments in the document. You might look at what we have developed in C++ and JavaScript, which is more mature and widely used in those languages than what currently exists in Java. In particular, I strongly encourage you to avoid creating a coupling between the Schema (

Re: [Discuss][Java][Typical use cases for dictionary encoding string vectors]

2019-06-12 Thread Fan Liya
@Micah Kornfield Thanks a lot for your comments. In the doc, we identify 3 problems for the current dictionary encoding use case (there can be more, so please give your valuable suggestions): 1. there should be a convenient way to provide access to both encoded/decoded data. 2. the constructor f

Re: [Discuss][Java][Typical use cases for dictionary encoding string vectors]

2019-06-12 Thread Micah Kornfield
Hi Liya Fan, Thank you for doing this. I need to take a closer look at the PR in question and the dictionary encoding code, but this seems like it is on the right track. Could other Java contributors with more familiarity in the space look over the document to make sure it makes sense to them? T

[Discuss][Java][Typical use cases for dictionary encoding string vectors]

2019-06-10 Thread Fan Liya
Hi all, This is concerning issue ARROW-3396. I have summarized the problem (please see if my understanding is correct), and proposed some solutions to it. Please give your valuable feedback. For details, please see: https://docs.google.com/document/d/1Y2E6RbZkUj3SwuEJrlEjaeIPmCA1SIsi9wmbJmVlB2I/