Hey y'all, thanks in advance for the discussion.

I'm creating Arrow extensions for computer vision and I'm running into
issues in two scenarios. I couldn't find the answers in the archive so I
thought I'd post here.

Example:
I made an extension type called "Label" whose storage type is
"dictionary<int8, string>". This is an object detection dataset, so each row
represents an image and has multiple detected objects that need to be
labeled. So there's a "label" column of type "list<label>":

Example table schema:
*image_id: int*
*uri: string*
*label: list<label>   # list<dictionary<int8, string>>  storage type*


Problems:
1. `to_numpy` does not seem to work with a nested column. e.g., if I try to
call `to_numpy` on the `label` column, then I get "Not implemented type for
Arrow list to pandas: extension<label<LabelType>>"
2. If I'm querying this dataset using duckdb, running "select * from
dataset where label='person'" results in: "Function 'equal' has no kernel
matching input types (extension<label<LabelType>>, string)"

Am I missing an alternate path to make this work with extension types?
Would implementing this in Arrow consist of checking whether something is an
extension type and, if so, using the storage type instead? Is this something
that's already on the roadmap at all?

Thanks!

Chang She
