GitHub user thisisnic added a comment to the discussion: [C++] Supporting 
compute functions on ExtensionTypes

A few notes from points raised in the Arrow dev meeting:
* there is some interest in a "fast path" for implicitly casting extension 
types to their physical types, which would be useful for things like Arrow's 
variant type
* in principle, you can remove the extension metadata and treat the column as 
its physical type (e.g. a JSON extension type just becomes a string). This can 
be done in place without copying data. However, problems arise when kernels or 
functions check types using the metadata and don’t recognize the mapping
* in R, suggestions like mutating a dataset before processing might not work 
as expected, because of query optimisation and execution order on the R side. 
This could possibly be fixed at the R wrapper level if we were to build 
lower-level R bindings that mutate the schema metadata directly as a 
workaround.
* the extension type feature is incomplete across the ecosystem - not just 
compute - so there could be some value in consolidating known issues and 
prioritizing C++ fixes to benefit R and Python
* some cases are particularly complex, and this has been seen in other 
Arrow-compatible ecosystems, e.g. DuckDB now preserves extension metadata when 
reading Arrow tables, but modifying data (e.g. appending strings) inevitably 
strips extension types, so recasting is needed after such transformations.
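
To make the "strip the metadata, compute on the physical type, then recast" idea from the notes above concrete, here is a rough sketch in plain Python. It is a toy model, not the Arrow or pyarrow API: `Field`, `Column`, `strip_extension`, `recast_extension` and `utf8_length` are all hypothetical names made up for illustration. The only things taken from Arrow itself are the two field-metadata keys that mark an extension type and the canonical `arrow.json` extension name.

```python
from dataclasses import dataclass, field, replace

# Field-metadata keys the Arrow format uses to mark an extension type.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Hypothetical stand-in for an Arrow field."""
    name: str
    storage_type: str                      # the physical type, e.g. "string"
    metadata: dict = field(default_factory=dict)

    @property
    def extension_name(self):
        return self.metadata.get(EXT_NAME_KEY)


@dataclass
class Column:
    """Hypothetical stand-in for a column: a field plus a data buffer."""
    schema_field: Field
    data: list


def strip_extension(col):
    """Drop the extension keys and reuse the same buffer (no copy)."""
    kept = {k: v for k, v in col.schema_field.metadata.items()
            if k not in (EXT_NAME_KEY, EXT_META_KEY)}
    return Column(replace(col.schema_field, metadata=kept), col.data)


def recast_extension(col, ext_name, ext_meta=""):
    """Re-attach the extension keys after a transformation dropped them."""
    meta = {**col.schema_field.metadata,
            EXT_NAME_KEY: ext_name, EXT_META_KEY: ext_meta}
    return Column(replace(col.schema_field, metadata=meta), col.data)


def utf8_length(col):
    """Toy kernel that checks the type and does not know any extension."""
    ext = col.schema_field.extension_name
    if ext is not None:
        raise TypeError(f"no kernel for extension type {ext!r}")
    return [len(v) for v in col.data]


json_col = Column(
    Field("payload", "string", {EXT_NAME_KEY: "arrow.json", EXT_META_KEY: ""}),
    ['{"a": 1}', "[]"],
)

plain_col = strip_extension(json_col)      # same buffer, plain string field
lengths = utf8_length(plain_col)           # the kernel now accepts the column
restored = recast_extension(plain_col, "arrow.json")
```

Calling `utf8_length(json_col)` directly raises `TypeError` in this model, which mirrors the failure mode where a kernel checks the type via its metadata and does not recognize the mapping. The explicit `recast_extension` step corresponds to the recasting that is needed once a transformation has stripped the extension type.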


GitHub link: 
https://github.com/apache/arrow/discussions/46671#discussioncomment-13370462
