Thanks guys. I tested a bit more and I will get a seg fault when I'm reading ArrayBuffer from memory in Bun. If the problem is releasing a memory then I should get a seg fault also for the first chunk or shouldn't I? Now I'll get it only if there is more than one chunk. I thought that the buffer would be one memory block but maybe it isn't.
Kimmo ke 8. maalisk. 2023 klo 21.22 Aldrin <[email protected]> kirjoitti: > oops, I did misread it. After seeing Weston's response, I realized you're > casting to a duckdb_arrow_schema, which I had read as ArrowSchema (I > thought it was a weird cast). > > You're calling `duckdb_query_arrow_schema` and `duckdb_query_arrow_array` > correctly, as it's apparently a double pointer ([1] and [2], respectively). > But, looking at how it's used in duckdb ([3]), I think you're not releasing > memory correctly; namely, you're missing calls to `release` ([4]). I would > expect it to be a memory leak rather than a segfault, so there may be some > other related issue lurking in surrounding code outside of your excerpt. Or > possibly you're getting a weird situation where `delete` is deallocating > top-level objects, and the call to `new` is allocating directly over it (or > nearby memory) and your memory accesses are spanning regions that it > shouldn't. > > Anyways, Weston's code sample is what I was thinking and it should be > straightforward to adapt it for the double pointer. If helpful, [5] seems > to be where the `*duckdb_arrow_schema` and `*duckdb_arrow_array` types are > defined. As a matter of fact, I find it suspicious that the > `duckdb_query_arrow_schema` function just dereferences to access the first > element of the struct, but I don't know enough low-level details to know if > that and not calling `release` are combining to be the root cause of your > segfaults. > > > [1]: > https://github.com/duckdb/duckdb/blob/master/src/main/capi/arrow-c.cpp#L22-L30 > [2]: > https://github.com/duckdb/duckdb/blob/master/src/main/capi/arrow-c.cpp#L32-L46 > [3]: > https://github.com/duckdb/duckdb/blob/master/test/api/capi/test_capi_arrow.cpp#L33-L38 > [4]: > https://github.com/duckdb/duckdb/blob/master/test/api/capi/test_capi_arrow.cpp#L37 > [5]: > https://github.com/duckdb/duckdb/blob/master/src/include/duckdb.h#L268-L273 > > Aldrin Montana > Computer Science PhD Student > UC Santa Cruz > > > On Wed, Mar 8, 2023 at 8:12 AM Weston Pace <[email protected]> wrote: > >> I agree with Aldrin. `arrow_schema` has type ArrowSchema*. You are >> passing in `&arrow_schema` which has type `ArrowSchema**` but you are >> casting it to `duckdb_arrow_schema*` so it at least compiles. You can also >> simplify things by getting rid of some of this allocation. >> >> ``` >> ArrowSchema arrow_schema; >> duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema >> *)&arrow_schema); >> auto schema = arrow::ImportSchema(&arrow_schema); >> auto output_stream = arrow::io::BufferOutputStream::Create(); >> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, >> *schema); >> while (true) >> { >> ArrowArray arrow_array; >> duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array >> *)&arrow_array); >> if (arrow_array.length > 0) >> { >> (*batch_writer)->WriteRecordBatch((const arrow::RecordBatch >> &)**arrow::ImportRecordBatch(&arrow_array, *schema)); >> } >> else >> { >> arrow_array.release(&arrow_array); >> break; >> } >> } >> auto buffer = (*output_stream)->Finish(); >> ``` >> >> >> On Tue, Mar 7, 2023 at 11:26 AM Aldrin <[email protected]> wrote: >> >>> I might be misreading, but do you need to get the address of your object >>> pointers, `*arrow_schema` and `*arrow_array`? If they're already pointers, >>> I'm not sure you need to get their address and cast that to a pointer. >>> otherwise, I don't see anything else that stands out as problematic to me. >>> >>> You could try to check the Result objects so that you get better error >>> information before the segfault occurs; as it stands now you're blindly >>> dereferencing. The other thing that might be helpful is that I think you >>> don't have to call `new` every time. In all of these instances you are >>> allocating memory for the objects, but you could just create them directly >>> and then pass references to them. For example: >>> >>> ArrowSchema arrow_schema; >>> duckdb_query_arrow_schema(arrow_result, &arrow_schema); >>> ... >>> >>> Since ArrowSchema seems to be default constructible, I believe the >>> declaration alone will do the default construction (`ArrowSchema >>> arrow_schema`). But, my cpp knowledge comes and goes, so apologies if those >>> suggestions aren't accurate. >>> >>> Aldrin Montana >>> Computer Science PhD Student >>> UC Santa Cruz >>> >>> >>> On Sun, Mar 5, 2023 at 8:56 AM Kimmo Linna <[email protected]> >>> wrote: >>> >>>> Hi, >>>> >>>> I noticed that my previous code worked only for one ArrowArray from >>>> DuckDB. I change a code this but this is still working only for one >>>> ArrowArray. The second “loop” causes a segmentation fault. Could someone >>>> guide me a bit? What should I change to get this work? >>>> >>>> >>>> ArrowSchema *arrow_schema = new ArrowSchema(); >>>> duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema *)& >>>> arrow_schema); >>>> auto schema = arrow::ImportSchema(arrow_schema); >>>> auto output_stream = arrow::io::BufferOutputStream::Create(); >>>> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, * >>>> schema); >>>> while (true) { >>>> ArrowArray *arrow_array = new ArrowArray(); >>>> duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array *)& >>>> arrow_array); >>>> if( arrow_array->length > 0) { >>>> (*batch_writer)->WriteRecordBatch((const arrow::RecordBatch&) **arrow:: >>>> ImportRecordBatch(arrow_array,*schema)); >>>> delete arrow_array; >>>> }else{ >>>> break; >>>> } >>>> } >>>> auto buffer = (*output_stream)->Finish(); >>>> Best regards, >>>> >>>> Kimmo >>>> >>>> -- >>>> Kimmo Linna >>>> Nihtisalontie 3 as 1 >>>> 02630 ESPOO >>>> [email protected] >>>> +358 40 590 1074 >>>> >>>> >>>> >>>>
