I guess that since the keys are only additive, you just build the master
dictionary before allowing random access to the data.
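
A minimal pure-Python model of that idea follows. It assumes each dictionary message for a field is either the initial dictionary or a delta that only appends new values. The helpers `build_master_dictionary` and `decode_batch` are hypothetical and are not part of pyarrow. Because deltas never reorder or remove entries, concatenating them in file order gives one master dictionary that every batch's indices stay valid against. Any batch can then be decoded without replaying the batches before it.

```python
from typing import List, Sequence


def build_master_dictionary(dictionary_messages: Sequence[Sequence[str]]) -> List[str]:
    """Concatenate the initial dictionary and all deltas, in file order.

    Deltas only append values, so once the master dictionary is assembled,
    index i refers to the same value in every record batch.
    """
    master: List[str] = []
    for message in dictionary_messages:
        master.extend(message)
    return master


def decode_batch(indices: Sequence[int], master: Sequence[str]) -> List[str]:
    """Decode one batch's dictionary indices against the master dictionary."""
    return [master[i] for i in indices]


# Hypothetical data: an initial dictionary followed by two deltas.
messages = [["a", "b"], ["c"], ["d", "e"]]
master = build_master_dictionary(messages)

# This batch uses indices introduced by the last delta, yet it can be
# decoded directly, without reading any earlier batch first.
print(decode_batch([4, 0, 3], master))  # ['e', 'a', 'd']
```

This only models the concept; it says nothing about how pyarrow stores or reads dictionaries internally.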

On Tue, Feb 22, 2022 at 11:08 AM Chris Nuernberger wrote:

> OK, thanks, I will work with delta dictionaries.
>
> How do delta dictionaries solve the random access issue?
>
> On Tue, Feb 22, 2022 at 9:51 AM Micah Kornfield wrote:
>
>> Dictionary replacement isn't supported in the file format because the
>> metadata makes it difficult to associate a particular dictionary with a
>> record batch for random access.
>>
>> Delta dictionaries are supported, but there was a long-standing bug that
>> prevented their use in Python (
>> https://issues.apache.org/jira/browse/ARROW-13467). If you are still
>> seeing issues in pyarrow 7.0, please open a bug.
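
To make the delta idea concrete, here is a hedged pure-Python sketch of what a writer conceptually does per record batch, assuming string values. The helper `encode_with_deltas` is hypothetical and is not the pyarrow API. Values already seen keep their index, unseen values are appended to the cumulative dictionary, and only the newly appended values are emitted as that batch's delta.

```python
from typing import Dict, List, Sequence, Tuple


def encode_with_deltas(
    batches: Sequence[Sequence[str]],
) -> List[Tuple[List[str], List[int]]]:
    """Return (delta, indices) per batch; each delta holds only new values.

    The first batch's delta plays the role of the initial dictionary.
    """
    index_of: Dict[str, int] = {}  # cumulative dictionary: value -> index
    encoded: List[Tuple[List[str], List[int]]] = []
    for batch in batches:
        delta: List[str] = []
        indices: List[int] = []
        for value in batch:
            if value not in index_of:
                # Unseen value: append it, so existing indices never change.
                index_of[value] = len(index_of)
                delta.append(value)
            indices.append(index_of[value])
        encoded.append((delta, indices))
    return encoded


for delta, indices in encode_with_deltas([["x", "y", "x"], ["y", "z"]]):
    print(delta, indices)
# ['x', 'y'] [0, 1, 0]
# ['z'] [1, 2]
```

A replacement, by contrast, would let index 1 mean different values in different batches, which is exactly what makes associating a dictionary with a batch hard for random access.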
>>
>> As for the usefulness of the file format without these features, that
>> really depends on the use case.
>>
>> Cheers,
>> Micah
>>
>> On Tuesday, February 22, 2022, Chris Nuernberger wrote:
>>
>>> How are dictionaries intended to be used in a file with multiple record
>>> batches?
>>>
>>> I tried saving record-batch-specific dictionaries and got this error
>>> from Python:
>>>
>>>  > pyarrow.lib.ArrowInvalid: Unsupported dictionary replacement or
>>> dictionary delta in IPC file
>>>
>>> This seems to defeat the purpose of having multiple record batches in a
>>> single Arrow file; the workaround appears to be to either preprocess the
>>> entire sequence of datasets to unify the dictionaries or save multiple
>>> Arrow files.
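
The "unify the dictionaries" workaround can be sketched in pure Python as follows, assuming each batch arrives as a batch-local dictionary plus indices into it. The helper `unify_dictionaries` here is hypothetical. It merges the local dictionaries in first-seen order and remaps every batch's indices, so all batches can share one dictionary in a single file. If the data is already a `pyarrow.Table`, recent pyarrow versions provide `Table.unify_dictionaries()` for this; check the documentation for your version.

```python
from typing import Dict, List, Sequence, Tuple

Batch = Tuple[List[str], List[int]]  # (batch-local dictionary, indices)


def unify_dictionaries(batches: Sequence[Batch]) -> Tuple[List[str], List[List[int]]]:
    """Merge batch-local dictionaries and remap every batch's indices."""
    unified: List[str] = []
    position: Dict[str, int] = {}  # value -> index in the unified dictionary
    remapped: List[List[int]] = []
    for local_dict, indices in batches:
        # Translation table: local index -> unified index.
        local_to_unified: List[int] = []
        for value in local_dict:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            local_to_unified.append(position[value])
        remapped.append([local_to_unified[i] for i in indices])
    return unified, remapped


unified, remapped = unify_dictionaries([
    (["red", "blue"], [0, 1, 1]),
    (["green", "red"], [1, 0]),
])
print(unified)   # ['red', 'blue', 'green']
print(remapped)  # [[0, 1, 1], [0, 2]]
```

Because the unified dictionary is built in first-seen order, it is itself additive: it could be written up front as one dictionary, or incrementally as deltas.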
>>>
>>
