Hello,
It seems that the ORC reader/writer has only limited support for attributes 
(called metadata in Arrow). The writer does not write Arrow metadata at all, 
neither for the table nor for its fields. The reader fills the Arrow schema's 
metadata with the ORC file metadata but does nothing for the fields' metadata, 
as far as I can tell from the code.

In ORC, what they call "attributes" seems to serve a similar purpose to Arrow 
metadata. See 
https://github.com/apache/orc/blob/ff6093c98bf38c06c906dde3207040e1b5b55753/c%2B%2B/include/orc/Type.hh#L50-L69
Since a "Type" object can represent both the table and an individual field, I 
think it could be used to carry the metadata.
Is my understanding of the current state of the ORC adapter correct, and is 
there anything that would prevent doing this?
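
To make the idea concrete, here is a rough sketch of the mapping I have in 
mind. It does not use the real Arrow or ORC classes: FakeOrcType and KeyValue 
are hypothetical stand-ins I made up for illustration, and only the 
setAttribute / getAttributeKeys / getAttributeValue names are modelled on the 
orc::Type interface linked above. The helper names are also mine, not part of 
the adapter.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the attribute part of orc::Type.
// This is NOT the real class; it only mimics the three accessors used below.
struct FakeOrcType {
  std::map<std::string, std::string> attributes;

  FakeOrcType& setAttribute(const std::string& key, const std::string& value) {
    attributes[key] = value;
    return *this;
  }

  std::vector<std::string> getAttributeKeys() const {
    std::vector<std::string> keys;
    for (const auto& entry : attributes) keys.push_back(entry.first);
    return keys;
  }

  std::string getAttributeValue(const std::string& key) const {
    return attributes.at(key);
  }
};

// Hypothetical stand-in for Arrow key/value metadata (schema or field level).
using KeyValue = std::vector<std::pair<std::string, std::string>>;

// Writer side: copy every metadata pair onto the type as an attribute.
void CopyMetadataToAttributes(const KeyValue& metadata, FakeOrcType* type) {
  for (const auto& pair : metadata) {
    type->setAttribute(pair.first, pair.second);
  }
}

// Reader side: rebuild the metadata pairs from the type's attributes.
KeyValue ReadAttributesAsMetadata(const FakeOrcType& type) {
  KeyValue metadata;
  for (const auto& key : type.getAttributeKeys()) {
    metadata.emplace_back(key, type.getAttributeValue(key));
  }
  return metadata;
}
```

The same two helpers would be applied once to the root type for the table 
metadata and once per child type for each field's metadata. Note that in this 
sketch the keys come back in sorted order and duplicate keys collapse, so the 
round trip is not guaranteed to preserve the original ordering.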

Regards