It isn't possible with the current API, but all of the library
machinery exists for you to obtain this without extraordinary pain
(speaking as one of the people who worked on the implementation of
direct read/write of arrow::DictionaryArray). You would need to do
some work in the C++ library to expose reading just the dictionary
page.

On Thu, Jun 3, 2021 at 2:55 PM Juan Galvez wrote:
>
> Hello,
>
> I have a large Parquet file written by pandas with categorical columns (which
> are read into Arrow as DictionaryArray). I want to get the values of the
> categories in Python (called "dictionary" values in Arrow) without reading
> any data from the file into memory other than the metadata. Is this
> possible?
>
> Thank you,
> -Juan
>
