GitHub user thisisnic added a comment to the discussion: [C++] Supporting compute functions on ExtensionTypes
A few notes from points raised in the Arrow dev meeting:

* There is some interest in a "fast path" for implicitly casting extension types to their physical types, which would be useful for things like Arrow's variant type.
* In principle, you can remove the metadata and treat the column as its physical type (e.g. a JSON extension type just becomes a string). This can be done in place without copying data. However, problems arise when kernels or functions check types using the metadata and don't recognize the mapping.
* In R, suggestions like mutating a dataset before processing might not work as expected, because the query optimisation done in R can change the execution order. This could possibly be fixed at the R wrapper level if we were to build lower-level R bindings to mutate the schema metadata directly as a workaround.
* The extension type feature is incomplete across the ecosystem, not just in compute, so there could be some value in consolidating known issues and prioritizing C++ fixes to benefit R and Python.
* Some cases are particularly complex, and this has been seen in other Arrow-compatible ecosystems. For example, DuckDB now preserves extension metadata when reading Arrow tables, but modifying data (e.g. appending strings) inevitably strips extension types, so recasting is needed after such transformations.

GitHub link: https://github.com/apache/arrow/discussions/46671#discussioncomment-13370462

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
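To illustrate the note about removing the metadata and treating a column as its physical type, here is a minimal toy model in plain Python. It does not use the Arrow libraries: `Column`, `strip_extension` and `utf8_length` are hypothetical names invented for this sketch. The only thing taken from Arrow is the pair of field-metadata keys (`ARROW:extension:name` and `ARROW:extension:metadata`) that the columnar format uses to mark an extension type, with `arrow.json` as an illustrative extension name.

```python
from dataclasses import dataclass, field, replace

# Field-metadata keys the Arrow columnar format uses to mark an extension type.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for an Arrow column (not a real Arrow class)."""
    storage_type: str
    values: list
    metadata: dict = field(default_factory=dict)


def strip_extension(col: Column) -> Column:
    """Drop the extension keys; the `values` list is shared, not copied."""
    kept = {k: v for k, v in col.metadata.items()
            if k not in (EXT_NAME_KEY, EXT_META_KEY)}
    return replace(col, metadata=kept)


def utf8_length(col: Column) -> list:
    """Toy kernel that checks metadata and rejects extension-typed columns."""
    if EXT_NAME_KEY in col.metadata:
        raise TypeError(
            f"no kernel for extension type {col.metadata[EXT_NAME_KEY]!r}")
    if col.storage_type != "string":
        raise TypeError(f"no kernel for {col.storage_type!r}")
    return [len(v) for v in col.values]


json_col = Column(
    storage_type="string",
    values=['{"a": 1}', "[]"],
    metadata={EXT_NAME_KEY: "arrow.json", EXT_META_KEY: ""},
)

# The kernel does not recognize the extension type, so dispatch fails.
try:
    utf8_length(json_col)
    rejected = False
except TypeError:
    rejected = True

# After stripping the metadata the same data is accepted as a plain string column.
plain_col = strip_extension(json_col)
lengths = utf8_length(plain_col)
```

In this model the stripped column shares its `values` with the original, which mirrors the "in place, no copy" point. The extension type is lost on the result, though, so it would need to be re-attached afterwards if it has to survive the operation.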
