I guess that since delta dictionaries only ever add keys, you can just build the master dictionary up front, before allowing random access to the data.
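
A rough sketch of that idea in plain Python, to check my understanding. This is not the pyarrow API: lists stand in for dictionary batches and index arrays, and the names `build_master_dictionary` and `decode_batch` are made up for illustration. It assumes every delta only appends values and that all deltas can be read before any record batch is accessed.

```python
# Conceptual sketch only: plain Python lists stand in for Arrow dictionary
# batches and index arrays. Nothing here is a pyarrow call.

def build_master_dictionary(initial, deltas):
    """Concatenate the initial dictionary with every delta, in file order."""
    master = list(initial)
    for delta in deltas:
        # A delta only appends values, so indices handed out earlier stay valid.
        master.extend(delta)
    return master


def decode_batch(indices, master):
    """Decode one record batch's dictionary indices against the master."""
    return [master[i] for i in indices]


initial = ["a", "b"]                        # first dictionary batch
deltas = [["c"], ["d", "e"]]                # later delta dictionary batches
batches = [[0, 1, 0], [2, 0], [4, 3, 1]]    # per-batch dictionary indices

master = build_master_dictionary(initial, deltas)

# Random access: batch 2 is decoded without reading batches 0 and 1.
decoded = decode_batch(batches[2], master)
print(decoded)
```

Because nothing is ever replaced, an index written in an early batch means the same thing against the full master dictionary, so any batch can be decoded on its own once the master is built.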

On Tue, Feb 22, 2022 at 11:08 AM Chris Nuernberger <[email protected]> wrote:

> OK, thanks, I will work with delta dictionaries.
>
> How do delta dictionaries solve the random access issue?
>
> On Tue, Feb 22, 2022 at 9:51 AM Micah Kornfield <[email protected]>
> wrote:
>
>> Dictionary replacement isn't supported in the file format because the
>> metadata makes it difficult to associate a particular dictionary with a
>> record batch for random access.
>>
>> Delta dictionaries are supported but there was a long-standing bug that
>> prevented their use in Python
>> (https://issues.apache.org/jira/browse/ARROW-13467). If you are still
>> seeing issues in pyarrow 7.0 please open a bug.
>>
>> In regards to the usefulness of the file format without these features
>> that is really use case dependent.
>>
>> Cheers,
>> Micah
>>
>> On Tuesday, February 22, 2022, Chris Nuernberger <[email protected]>
>> wrote:
>>
>>> How are dictionaries intended to be used in a file with multiple record
>>> batches?
>>>
>>> I tried saving record-batch-specific dictionaries and got this error
>>> from python:
>>>
>>> > pyarrow.lib.ArrowInvalid: Unsupported dictionary replacement or
>>> dictionary delta in IPC file
>>>
>>> This seems to defeat the purpose of having multiple record batches in a
>>> single arrow file; the work around appears to be to either preprocess the
>>> entire sequence of datasets to unify the dictionaries or save multiple
>>> arrow files.
