Hey y'all, thanks in advance for the discussion. I'm creating Arrow extension types for computer vision and I'm running into issues in two scenarios. I couldn't find the answers in the archive, so I thought I'd post here.
Example: I made an extension type called "Label" that has storage type "dictionary<int8, string>". This is an object detection dataset, so each row represents an image containing multiple detected objects that need to be labeled. So there's a `label` column of type "list<label>".

Example table schema:
*image_id: int*
*uri: string*
*label: list<label> # list<dictionary<int8, string>> storage type*

Problems:
1. `to_numpy` does not seem to work with a nested column. For example, if I call `to_numpy` on the `label` column, I get "Not implemented type for Arrow list to pandas: extension<label<LabelType>>".
2. If I query this dataset using DuckDB, running "select * from dataset where label='person'" results in: "Function 'equal' has no kernel matching input types (extension<label<LabelType>>, string)".

Am I missing an alternate path to make this work with extension types? Would implementing this in Arrow consist of checking whether something is an extension type and, if so, using the storage type instead? Is this something that's already on the roadmap at all?

Thanks!
Chang She
