Kevin Yang created ARROW-18288:
----------------------------------

             Summary: [GO]: pqarrow 
(github.com/apache/arrow/go/v9/parquet/pqarrow) cannot handle arrow's 
DICTIONARY field
                 Key: ARROW-18288
                 URL: https://issues.apache.org/jira/browse/ARROW-18288
             Project: Apache Arrow
          Issue Type: Bug
          Components: Go
    Affects Versions: 10.0.0, 9.0.0
            Reporter: Kevin Yang


Hey, Arrow Go Dev:
 
I was trying to save some Arrow tables out to Parquet files with the help of 
the github.com/apache/arrow/go/v9/parquet/pqarrow package. By the way, Arrow 
is generally a great design, and the Go implementation is great as well. 

 
However, one issue sticks out: in my original arrow Table I have some 
DICTIONARY fields, which pqarrow does NOT currently support.
 
I would assume supporting them is quite straightforward: just "denormalize" 
each DICTIONARY value into its corresponding value (string, Timestamp, etc.) 
and leave it to Parquet to choose a suitable encoding (DICTIONARY encoding, etc.). 
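
To make the "denormalize" idea concrete, here is a minimal sketch in plain Go. It uses ordinary slices instead of Arrow arrays, and decodeDictionary with its parameters is a hypothetical helper for illustration only, not part of pqarrow or the Arrow Go API. It assumes each index points into the dictionary and that an optional validity slice marks null slots.

```go
package main

import "fmt"

// decodeDictionary expands dictionary-encoded data into dense values.
// indices[i] is a position in dict; valid[i] == false marks a null slot
// (a nil valid slice means "no nulls"). Null slots keep T's zero value.
// Hypothetical helper on plain slices, not an Arrow Go API.
func decodeDictionary[T any](indices []int32, valid []bool, dict []T) ([]T, error) {
	if valid != nil && len(valid) != len(indices) {
		return nil, fmt.Errorf("validity length %d != indices length %d",
			len(valid), len(indices))
	}
	out := make([]T, len(indices))
	for i, idx := range indices {
		if valid != nil && !valid[i] {
			continue // null slot: leave the zero value in place
		}
		if idx < 0 || int(idx) >= len(dict) {
			return nil, fmt.Errorf("index %d at position %d is outside dictionary of size %d",
				idx, i, len(dict))
		}
		out[i] = dict[idx]
	}
	return out, nil
}
```

For example, indices {0, 1, 0, 2} over the dictionary {"a", "b", "c"} expand to the dense column {"a", "b", "a", "c"}, which a Parquet writer could then encode however it sees fit.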
 
I could do this conversion on the fly myself by converting each DICTIONARY 
field into its underlying values. However, the Arrow table schema is dynamic 
and outside my control, so I would need to iterate through all fields 
(possibly nested inside structs) to locate the DICTIONARY ones. It would be 
much better if pqarrow supported this natively. 
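
The schema walk mentioned above might look roughly like the following sketch. The field and kind types here are a simplified, hypothetical stand-in for a real Arrow schema (they are not Arrow Go types); the point is only to show the recursion into struct children needed to find every DICTIONARY field.

```go
package main

// kind is a hypothetical, simplified stand-in for an Arrow type ID.
type kind int

const (
	kindPrimitive kind = iota
	kindDictionary
	kindStruct
)

// field is a hypothetical, simplified stand-in for a schema field.
type field struct {
	Name     string
	Kind     kind
	Children []field // only meaningful when Kind == kindStruct
}

// findDictionaryFields returns the dotted paths of all dictionary fields,
// descending into struct fields recursively. prefix is the path of the
// enclosing struct ("" at the top level).
func findDictionaryFields(fields []field, prefix string) []string {
	var paths []string
	for _, f := range fields {
		path := f.Name
		if prefix != "" {
			path = prefix + "." + f.Name
		}
		switch f.Kind {
		case kindDictionary:
			paths = append(paths, path)
		case kindStruct:
			paths = append(paths, findDictionaryFields(f.Children, path)...)
		}
	}
	return paths
}
```

A caller would then have to decode each located field before handing the table to the writer, which is the per-schema bookkeeping this issue asks pqarrow to take over.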
 
Can anyone help? thanks!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
