> I think you're not releasing memory correctly; namely, you're missing
calls to `release`

If you call `arrow::ImportRecordBatch`, it takes ownership of the ArrowArray
and will call `release` on it, so you should not call `release` yourself.
There is a variant of this method that receives an `ArrowSchema*` instead of
a `std::shared_ptr<Schema>`, and that variant will also take care of calling
`release` on the schema.  So the only place that `release()` needs to be
called is in the else branch.

The ArrowSchema is unconditionally passed to `arrow::ImportSchema`, which
will take care of calling `release()`, so you should not call `release` on
the ArrowSchema yourself.

> I have to have it like this:
> Otherwise there will be a segfault in the C++ code. Maybe it's a
> requirement for DuckDB.

From DuckDB's perspective it shouldn't matter (and it should be impossible
for DuckDB to know) how you are allocating the ArrowSchema / ArrowArray.
So this is very odd.

Is this code something runnable that you can share?
Can you run the code in gdb and show the stack trace of the segmentation
fault?

On Wed, Mar 8, 2023 at 10:57 PM Kimmo Linna <[email protected]> wrote:

> Hi,
>
> I have to have it like this:
> ArrowSchema *arrow_schema = new ArrowSchema;
> ArrowArray *arrow_array = new ArrowArray;
>
> Otherwise there will be a segfault in the C++ code. Maybe it's a
> requirement for DuckDB.
>
> Best regards,
>
> Kimmo
> --
> Kimmo Linna
> Nihtisalontie 3 as 1
> 02630  ESPOO
> [email protected]
> +358 40 590 1074
>
>
>
> On 8. Mar 2023, at 18.10, Weston Pace <[email protected]> wrote:
>
> I agree with Aldrin.  `arrow_schema` has type `ArrowSchema*`. You are
> passing in `&arrow_schema`, which has type `ArrowSchema**`, but you are
> casting it to `duckdb_arrow_schema *` so it at least compiles. You can also
> simplify things by getting rid of some of this allocation.
>
> ```
>     ArrowSchema arrow_schema;
>     duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema *)&arrow_schema);
>     auto schema = arrow::ImportSchema(&arrow_schema);
>     auto output_stream = arrow::io::BufferOutputStream::Create();
>     auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, *schema);
>     while (true)
>     {
>       ArrowArray arrow_array;
>       duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array *)&arrow_array);
>       if (arrow_array.length > 0)
>       {
>         (*batch_writer)->WriteRecordBatch(
>             (const arrow::RecordBatch &)**arrow::ImportRecordBatch(&arrow_array, *schema));
>       }
>       else
>       {
>         arrow_array.release(&arrow_array);
>         break;
>       }
>     }
>     auto buffer = (*output_stream)->Finish();
> ```
>
>
> On Tue, Mar 7, 2023 at 11:26 AM Aldrin <[email protected]> wrote:
>
>> I might be misreading, but do you need to take the address of your object
>> pointers, `*arrow_schema` and `*arrow_array`? If they're already pointers,
>> I'm not sure you need to take their address and cast that to a pointer.
>> Otherwise, I don't see anything else that stands out as problematic to me.
>>
>> You could check the Result objects so that you get better error
>> information before the segfault occurs; as it stands, you're blindly
>> dereferencing them. The other thing that might help: I don't think you
>> have to call `new` every time. In all of these instances you are
>> allocating memory for the objects, but you could just create them directly
>> and then pass references to them. For example:
>>
>>    ArrowSchema arrow_schema;
>>    duckdb_query_arrow_schema(arrow_result, &arrow_schema);
>>    ...
>>
>> Since ArrowSchema seems to be default constructible, I believe the
>> declaration alone will do the default construction (`ArrowSchema
>> arrow_schema`). But my C++ knowledge comes and goes, so apologies if those
>> suggestions aren't accurate.
>>
>> Aldrin Montana
>> Computer Science PhD Student
>> UC Santa Cruz
>>
>>
>> On Sun, Mar 5, 2023 at 8:56 AM Kimmo Linna <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I noticed that my previous code worked only for one ArrowArray from
>>> DuckDB. I changed the code to this, but it still works only for one
>>> ArrowArray. The second “loop” causes a segmentation fault. Could someone
>>> guide me a bit? What should I change to get this to work?
>>>
>>>
>>> ```
>>> ArrowSchema *arrow_schema = new ArrowSchema();
>>> duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema *)&arrow_schema);
>>> auto schema = arrow::ImportSchema(arrow_schema);
>>> auto output_stream = arrow::io::BufferOutputStream::Create();
>>> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, *schema);
>>> while (true) {
>>>   ArrowArray *arrow_array = new ArrowArray();
>>>   duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array *)&arrow_array);
>>>   if (arrow_array->length > 0) {
>>>     (*batch_writer)->WriteRecordBatch(
>>>         (const arrow::RecordBatch &)**arrow::ImportRecordBatch(arrow_array, *schema));
>>>     delete arrow_array;
>>>   } else {
>>>     break;
>>>   }
>>> }
>>> auto buffer = (*output_stream)->Finish();
>>> ```
>>> Best regards,
>>>
>>> Kimmo
>>>
>>>
>>>
>>>
>>>
>
