It isn't possible with the current API, but all of the library machinery needed to do it already exists, so you could get there without extraordinary pain (speaking as one of the people who worked on implementing direct read/write of arrow::DictionaryArray). You would need to do some work in the C++ library to expose reading just the dictionary page.
On Thu, Jun 3, 2021 at 2:55 PM Juan Galvez <[email protected]> wrote:
>
> Hello,
>
> I have a large parquet file written by pandas with categorical columns (which
> are read into Arrow as DictionaryArray). I want to get the value of the
> categories in Python (called "dictionary" values in Arrow) without having to
> read any other data from the file into memory other than metadata. Is this
> possible?
>
> Thank you,
> -Juan
>
