Liya Fan created ARROW-5862:
-------------------------------

             Summary: [Java] Provide dictionary builder
                 Key: ARROW-5862
                 URL: https://issues.apache.org/jira/browse/ARROW-5862
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Java
            Reporter: Liya Fan
            Assignee: Liya Fan

The dictionary builder serves the following scenario, which is frequently encountered in practice when dictionary encoding is involved: the dictionary values are not known a priori, so they are determined dynamically as new data arrive continually.

In particular, when a new value arrives, it is checked against the dictionary. If it is already present, it is simply ignored; otherwise, it is added to the dictionary.

When all values have been evaluated, the dictionary can be considered complete, so encoding can start afterward.

The code snippet using a dictionary builder should look like this:

{{DictionaryBuilder<IntVector> dictionaryBuilder = ...}}
{{dictionaryBuilder.startBuild();}}
{{...}}
{{dictionaryBuilder.addValue(newValue);}}
{{...}}
{{dictionaryBuilder.endBuild();}}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
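
The build workflow described in this issue (start, add values while skipping duplicates, end) can be illustrated with a minimal, self-contained sketch. It is not the proposed Arrow implementation: the class name SimpleDictionaryBuilder, the boolean return value of addValue, the indexOf helper, and the use of plain Java collections instead of Arrow vectors are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the proposed builder; not Arrow's actual API. */
class SimpleDictionaryBuilder<T> {
  // Maps each distinct value to the index it was assigned on first sight.
  private final Map<T, Integer> indexByValue = new HashMap<>();
  // Distinct values in first-seen order; this is the dictionary itself.
  private final List<T> values = new ArrayList<>();
  private boolean building = false;

  /** Begins a new build, discarding any previously collected values. */
  void startBuild() {
    indexByValue.clear();
    values.clear();
    building = true;
  }

  /** Adds the value if it is new; returns true only when the dictionary grew. */
  boolean addValue(T value) {
    if (!building) {
      throw new IllegalStateException("call startBuild() first");
    }
    if (indexByValue.containsKey(value)) {
      return false; // already in the dictionary, so it is ignored
    }
    indexByValue.put(value, values.size());
    values.add(value);
    return true;
  }

  /** Ends the build and returns the distinct values in first-seen order. */
  List<T> endBuild() {
    building = false;
    return Collections.unmodifiableList(new ArrayList<>(values));
  }

  /** Returns the dictionary index of a value, or -1 if it is absent. */
  int indexOf(T value) {
    Integer index = indexByValue.get(value);
    return index == null ? -1 : index;
  }
}
```

In this sketch, each call to addValue costs one hash lookup, so the dictionary can be grown incrementally as data arrive. Once endBuild has been called, encoding a value reduces to looking up its index with indexOf. A real implementation would presumably store the values in an Arrow vector rather than a Java list.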