Re: [C++][C Data][Table]ChunkedArray and Table

Kimmo Linna Wed, 08 Mar 2023 13:06:51 -0800

Thanks guys. I tested a bit more and I will get a seg fault when I'm
reading ArrayBuffer from memory in Bun. If the problem is releasing a
memory then I should get a seg fault also for the first chunk or shouldn't
I? Now I'll get it only if there is more than one chunk. I thought that the
buffer would be one memory block but maybe it isn't.


Kimmo

ke 8. maalisk. 2023 klo 21.22 Aldrin <[email protected]> kirjoitti:

> oops, I did misread it. After seeing Weston's response, I realized you're
> casting to a duckdb_arrow_schema, which I had read as ArrowSchema (I
> thought it was a weird cast).
>
> You're calling `duckdb_query_arrow_schema` and `duckdb_query_arrow_array`
> correctly, as it's apparently a double pointer ([1] and [2], respectively).
> But, looking at how it's used in duckdb ([3]), I think you're not releasing
> memory correctly; namely, you're missing calls to `release` ([4]). I would
> expect it to be a memory leak rather than a segfault, so there may be some
> other related issue lurking in surrounding code outside of your excerpt. Or
> possibly you're getting a weird situation where `delete` is deallocating
> top-level objects, and the call to `new` is allocating directly over it (or
> nearby memory) and your memory accesses are spanning regions that it
> shouldn't.
>
> Anyways, Weston's code sample is what I was thinking and it should be
> straightforward to adapt it for the double pointer. If helpful, [5] seems
> to be where the `*duckdb_arrow_schema` and `*duckdb_arrow_array` types are
> defined. As a matter of fact, I find it suspicious that the
> `duckdb_query_arrow_schema` function just dereferences to access the first
> element of the struct, but I don't know enough low-level details to know if
> that and not calling `release` are combining to be the root cause of your
> segfaults.
>
>
> [1]:
> https://github.com/duckdb/duckdb/blob/master/src/main/capi/arrow-c.cpp#L22-L30
> [2]:
> https://github.com/duckdb/duckdb/blob/master/src/main/capi/arrow-c.cpp#L32-L46
> [3]:
> https://github.com/duckdb/duckdb/blob/master/test/api/capi/test_capi_arrow.cpp#L33-L38
> [4]:
> https://github.com/duckdb/duckdb/blob/master/test/api/capi/test_capi_arrow.cpp#L37
> [5]:
> https://github.com/duckdb/duckdb/blob/master/src/include/duckdb.h#L268-L273
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Wed, Mar 8, 2023 at 8:12 AM Weston Pace <[email protected]> wrote:
>
>> I agree with Aldrin.  `arrow_schema` has type ArrowSchema*. You are
>> passing in `&arrow_schema` which has type `ArrowSchema**` but you are
>> casting it to `duckdb_arrow_schema*` so it at least compiles.  You can also
>> simplify things by getting rid of some of this allocation.
>>
>> ```
>>     ArrowSchema arrow_schema;
>>     duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema
>> *)&arrow_schema);
>>     auto schema = arrow::ImportSchema(&arrow_schema);
>>     auto output_stream = arrow::io::BufferOutputStream::Create();
>>     auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream,
>> *schema);
>>     while (true)
>>     {
>>       ArrowArray arrow_array;
>>       duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array
>> *)&arrow_array);
>>       if (arrow_array.length > 0)
>>       {
>>         (*batch_writer)->WriteRecordBatch((const arrow::RecordBatch
>> &)**arrow::ImportRecordBatch(&arrow_array, *schema));
>>       }
>>       else
>>       {
>>         arrow_array.release(&arrow_array);
>>         break;
>>       }
>>     }
>>     auto buffer = (*output_stream)->Finish();
>> ```
>>
>>
>> On Tue, Mar 7, 2023 at 11:26 AM Aldrin <[email protected]> wrote:
>>
>>> I might be misreading, but do you need to get the address of your object
>>> pointers, `*arrow_schema` and `*arrow_array`? If they're already pointers,
>>> I'm not sure you need to get their address and cast that to a pointer.
>>> otherwise, I don't see anything else that stands out as problematic to me.
>>>
>>> You could try to check the Result objects so that you get better error
>>> information before the segfault occurs; as it stands now you're blindly
>>> dereferencing. The other thing that might be helpful is that I think you
>>> don't have to call `new` every time. In all of these instances you are
>>> allocating memory for the objects, but you could just create them directly
>>> and then pass references to them. For example:
>>>
>>>    ArrowSchema arrow_schema;
>>>    duckdb_query_arrow_schema(arrow_result, &arrow_schema);
>>>    ...
>>>
>>> Since ArrowSchema seems to be default constructible, I believe the
>>> declaration alone will do the default construction (`ArrowSchema
>>> arrow_schema`). But, my cpp knowledge comes and goes, so apologies if those
>>> suggestions aren't accurate.
>>>
>>> Aldrin Montana
>>> Computer Science PhD Student
>>> UC Santa Cruz
>>>
>>>
>>> On Sun, Mar 5, 2023 at 8:56 AM Kimmo Linna <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I noticed that my previous code worked only for one ArrowArray from
>>>> DuckDB. I change a code this but this is still working only for one
>>>> ArrowArray. The second “loop” causes a segmentation fault. Could someone
>>>> guide me a bit? What should I change to get this work?
>>>>
>>>>
>>>> ArrowSchema *arrow_schema = new ArrowSchema();
>>>> duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema *)&
>>>> arrow_schema);
>>>> auto schema = arrow::ImportSchema(arrow_schema);
>>>> auto output_stream = arrow::io::BufferOutputStream::Create();
>>>> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, *
>>>> schema);
>>>> while (true) {
>>>> ArrowArray *arrow_array = new ArrowArray();
>>>> duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array *)&
>>>> arrow_array);
>>>> if( arrow_array->length > 0) {
>>>> (*batch_writer)->WriteRecordBatch((const arrow::RecordBatch&) **arrow::
>>>> ImportRecordBatch(arrow_array,*schema));
>>>> delete arrow_array;
>>>> }else{
>>>> break;
>>>> }
>>>> }
>>>> auto buffer = (*output_stream)->Finish();
>>>> Best regards,
>>>>
>>>> Kimmo
>>>>
>>>> --
>>>> Kimmo Linna
>>>> Nihtisalontie 3 as 1
>>>> 02630  ESPOO
>>>> [email protected]
>>>> +358 40 590 1074
>>>>
>>>>
>>>>
>>>>

Re: [C++][C Data][Table]ChunkedArray and Table

Reply via email to