[ https://issues.apache.org/jira/browse/ARROW-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408456#comment-17408456 ]

Clark Zinzow edited comment on ARROW-5890 at 9/2/21, 1:08 AM:
--------------------------------------------------------------

[~apitrou] I'm working on a tensor column extension type similar to
[this one|https://github.com/CODAIT/text-extensions-for-pandas/blob/dc03278689fe1c5f131573658ae19815ba25f33e/text_extensions_for_pandas/array/arrow_conversion.py]
and was hoping to allow users to interpret Parquet columns containing bytes
blobs (e.g. images) as tensors by having them provide a schema for those
columns, where each such column's dtype is a tensor array extension type
instantiated with the requisite data (shape, dtype, etc.) to cast that column
to a tensor array. Since there isn't a static conversion between the bytes
blobs and the underlying extension array dtype (both the shape and the
underlying element dtype are parameterizable), it'd be nice if an extension
type could register a cast function, so that we could use the shape and dtype
context to properly interpret those bytes blobs.
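
To make the request concrete: a bytes blob carries no shape or element type of its own, so the same bytes decode to different tensors depending on the extension type's parameters. The pure-Python sketch below does not use the Arrow API at all; it only shows the decoding step that such a parameterized cast would have to perform. The function name `decode_tensor_blob`, the dtype-to-`struct` code mapping, and the little-endian, row-major layout are assumptions made for illustration.

```python
import math
import struct
from typing import Any, Sequence

# Hypothetical mapping from element dtype names to struct format codes.
# The "<" prefix used below means little-endian with no alignment padding.
_STRUCT_CODES = {
    "uint8": "B",
    "int32": "i",
    "int64": "q",
    "float32": "f",
    "float64": "d",
}


def _reshape(flat: Sequence[Any], shape: Sequence[int]) -> Any:
    """Nest a flat, row-major sequence into lists following `shape`."""
    if not shape:
        return flat[0]  # zero-dimensional tensor: a single scalar
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0] if shape[0] else 0
    return [_reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def decode_tensor_blob(blob: bytes, shape: Sequence[int], dtype: str) -> Any:
    """Interpret one raw bytes blob as a tensor with the given shape and element dtype."""
    code = _STRUCT_CODES[dtype]
    count = math.prod(shape)
    expected = count * struct.calcsize("<" + code)
    if len(blob) != expected:
        raise ValueError(
            f"blob has {len(blob)} bytes, but shape {tuple(shape)} of {dtype} needs {expected}"
        )
    flat = struct.unpack(f"<{count}{code}", blob)
    return _reshape(flat, shape)


# Usage: the same 24 bytes decode differently depending on the parameters.
blob = struct.pack("<6i", 1, 2, 3, 4, 5, 6)
print(decode_tensor_blob(blob, (2, 3), "int32"))  # [[1, 2, 3], [4, 5, 6]]
print(decode_tensor_blob(blob, (3, 2), "int32"))  # [[1, 2], [3, 4], [5, 6]]
print(decode_tensor_blob(blob, (3,), "int64"))    # [8589934593, 17179869187, 25769803781]
```

Because the output depends on `shape` and `dtype`, a single storage-level cast rule from binary cannot express this; the conversion needs access to the target type's parameters.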


was (Author: clarkzinzow):
[~apitrou] I'm working on a tensor column extension type similar to
[this one|https://github.com/CODAIT/text-extensions-for-pandas/blob/dc03278689fe1c5f131573658ae19815ba25f33e/text_extensions_for_pandas/array/arrow_conversion.py]
and was hoping to allow users to interpret Parquet columns containing bytes
blobs (e.g. images) as tensors by having them provide a schema for those
columns containing a tensor array extension type instantiated with the
requisite data (shape, dtype, etc.) to cast that column to a tensor array.
Since there isn't a static conversion between the bytes blobs and the
underlying extension array dtype (both the shape and the underlying element
dtype are parameterizable), it'd be nice if an extension type could register a
cast function, so that we could use the shape and dtype context to properly
interpret those bytes blobs.

> [C++][Python] Support ExtensionType arrays in more kernels
> ----------------------------------------------------------
>
>                 Key: ARROW-5890
>                 URL: https://issues.apache.org/jira/browse/ARROW-5890
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> From a quick test (through Python), it seems that {{slice}} and {{take}}
> work, but the following do not:
> - {{cast}}: it could rely on the casting rules for the storage type. Or do we
> want to require users to explicitly take the storage array before casting?
> - {{dictionary_encode}} / {{unique}}
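
The first option raised above, letting kernels fall back to the storage type, can be illustrated with a toy model in plain Python. `ToyExtensionArray` and its methods are hypothetical stand-ins, not PyArrow classes, and nulls are not modeled. The sketch only shows the delegation idea: `cast` applies a storage-level conversion and returns plain storage values, while `unique` and `dictionary_encode` compute on the storage values and keep the extension type on the resulting dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ToyExtensionArray:
    """Hypothetical stand-in for an extension array: a type name plus storage values."""

    extension_name: str
    storage: Tuple[Any, ...]

    def cast(self, convert: Callable[[Any], Any]) -> List[Any]:
        # Delegate to a storage-level conversion; the result is plain storage,
        # so the extension wrapper is dropped.
        return [convert(v) for v in self.storage]

    def unique(self) -> "ToyExtensionArray":
        # Distinct storage values in first-seen order (dicts preserve insertion
        # order), re-wrapped in the same extension type.
        seen: Dict[Any, None] = dict.fromkeys(self.storage)
        return ToyExtensionArray(self.extension_name, tuple(seen))

    def dictionary_encode(self) -> Tuple[List[int], "ToyExtensionArray"]:
        # Indices into a dictionary of distinct values; the dictionary keeps
        # the extension type, while the indices are plain integers.
        dictionary = self.unique()
        position = {v: i for i, v in enumerate(dictionary.storage)}
        return [position[v] for v in self.storage], dictionary


# Usage with a hypothetical "example.label" type backed by integer storage.
arr = ToyExtensionArray("example.label", (3, 1, 3, 2))
print(arr.cast(float))             # [3.0, 1.0, 3.0, 2.0]
print(arr.unique().storage)        # (3, 1, 2)
print(arr.dictionary_encode()[0])  # [0, 1, 0, 2]
```

The alternative raised in the {{cast}} bullet, requiring callers to unwrap the storage themselves, would correspond to calling the conversion on `arr.storage` directly and never defining `cast` on the extension array.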


