oops, I did misread it. After seeing Weston's response, I realized you're casting to a duckdb_arrow_schema, which I had read as ArrowSchema (I thought it was a weird cast).
You're calling `duckdb_query_arrow_schema` and `duckdb_query_arrow_array` correctly, as it's apparently a double pointer ([1] and [2], respectively). But, looking at how it's used in duckdb ([3]), I think you're not releasing memory correctly; namely, you're missing calls to `release` ([4]). I would expect it to be a memory leak rather than a segfault, so there may be some other related issue lurking in surrounding code outside of your excerpt. Or possibly you're getting a weird situation where `delete` is deallocating top-level objects, and the call to `new` is allocating directly over it (or nearby memory) and your memory accesses are spanning regions that it shouldn't. Anyways, Weston's code sample is what I was thinking and it should be straightforward to adapt it for the double pointer. If helpful, [5] seems to be where the `*duckdb_arrow_schema` and `*duckdb_arrow_array` types are defined. As a matter of fact, I find it suspicious that the `duckdb_query_arrow_schema` function just dereferences to access the first element of the struct, but I don't know enough low-level details to know if that and not calling `release` are combining to be the root cause of your segfaults. [1]: https://github.com/duckdb/duckdb/blob/master/src/main/capi/arrow-c.cpp#L22-L30 [2]: https://github.com/duckdb/duckdb/blob/master/src/main/capi/arrow-c.cpp#L32-L46 [3]: https://github.com/duckdb/duckdb/blob/master/test/api/capi/test_capi_arrow.cpp#L33-L38 [4]: https://github.com/duckdb/duckdb/blob/master/test/api/capi/test_capi_arrow.cpp#L37 [5]: https://github.com/duckdb/duckdb/blob/master/src/include/duckdb.h#L268-L273 Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Mar 8, 2023 at 8:12 AM Weston Pace <[email protected]> wrote: > I agree with Aldrin. `arrow_schema` has type ArrowSchema*. You are > passing in `&arrow_schema` which has type `ArrowSchema**` but you are > casting it to `duckdb_arrow_schema*` so it at least compiles. You can also > simplify things by getting rid of some of this allocation. > > ``` > ArrowSchema arrow_schema; > duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema > *)&arrow_schema); > auto schema = arrow::ImportSchema(&arrow_schema); > auto output_stream = arrow::io::BufferOutputStream::Create(); > auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, > *schema); > while (true) > { > ArrowArray arrow_array; > duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array > *)&arrow_array); > if (arrow_array.length > 0) > { > (*batch_writer)->WriteRecordBatch((const arrow::RecordBatch > &)**arrow::ImportRecordBatch(&arrow_array, *schema)); > } > else > { > arrow_array.release(&arrow_array); > break; > } > } > auto buffer = (*output_stream)->Finish(); > ``` > > > On Tue, Mar 7, 2023 at 11:26 AM Aldrin <[email protected]> wrote: > >> I might be misreading, but do you need to get the address of your object >> pointers, `*arrow_schema` and `*arrow_array`? If they're already pointers, >> I'm not sure you need to get their address and cast that to a pointer. >> otherwise, I don't see anything else that stands out as problematic to me. >> >> You could try to check the Result objects so that you get better error >> information before the segfault occurs; as it stands now you're blindly >> dereferencing. The other thing that might be helpful is that I think you >> don't have to call `new` every time. In all of these instances you are >> allocating memory for the objects, but you could just create them directly >> and then pass references to them. For example: >> >> ArrowSchema arrow_schema; >> duckdb_query_arrow_schema(arrow_result, &arrow_schema); >> ... >> >> Since ArrowSchema seems to be default constructible, I believe the >> declaration alone will do the default construction (`ArrowSchema >> arrow_schema`). But, my cpp knowledge comes and goes, so apologies if those >> suggestions aren't accurate. >> >> Aldrin Montana >> Computer Science PhD Student >> UC Santa Cruz >> >> >> On Sun, Mar 5, 2023 at 8:56 AM Kimmo Linna <[email protected]> wrote: >> >>> Hi, >>> >>> I noticed that my previous code worked only for one ArrowArray from >>> DuckDB. I change a code this but this is still working only for one >>> ArrowArray. The second “loop” causes a segmentation fault. Could someone >>> guide me a bit? What should I change to get this work? >>> >>> >>> ArrowSchema *arrow_schema = new ArrowSchema(); >>> duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema *)& >>> arrow_schema); >>> auto schema = arrow::ImportSchema(arrow_schema); >>> auto output_stream = arrow::io::BufferOutputStream::Create(); >>> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, *schema >>> ); >>> while (true) { >>> ArrowArray *arrow_array = new ArrowArray(); >>> duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array *)& >>> arrow_array); >>> if( arrow_array->length > 0) { >>> (*batch_writer)->WriteRecordBatch((const arrow::RecordBatch&) **arrow:: >>> ImportRecordBatch(arrow_array,*schema)); >>> delete arrow_array; >>> }else{ >>> break; >>> } >>> } >>> auto buffer = (*output_stream)->Finish(); >>> Best regards, >>> >>> Kimmo >>> >>> -- >>> Kimmo Linna >>> Nihtisalontie 3 as 1 >>> 02630 ESPOO >>> [email protected] >>> +358 40 590 1074 >>> >>> >>> >>>
