Here is a bit of overview that may help, in order of abstraction
(high to low):

-   Datum is a wrapper that provides union semantics, in the C sense. It
holds an Array, a ChunkedArray, a Table, etc., but exactly one of them at a
time.

-   Array is like an interface; it stores its data in an ArrayData.

-   ArrayData is a container that owns the data (it is responsible for
releasing it) and provides functions to interact with that data.

-   Buffer is how the data is actually stored. Buffers hold the values, the
offsets (pointers into the values), and the bitmap that indicates which
values are null (I did not list these in any particular order).
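
To make the Buffer item a bit more concrete, here is a rough, Arrow-free
sketch of how a string (utf8) array lays out its data, following Felipe's
description quoted below: a validity bitmap, then offsets, then one contiguous
run of characters. Utf8Layout, PackStrings, and ValueAt are names I made up for
illustration; they are not Arrow APIs, and plain std containers stand in for
arrow::Buffer. The buffer order and the 32-bit offsets reflect my understanding
of the utf8 type, so treat this as a sketch rather than a reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the three buffers behind a utf8 array.
// These are plain std containers, not arrow::Buffer objects.
struct Utf8Layout {
  std::vector<uint8_t> validity;  // buffers[0]: one bit per value, 1 = not null
  std::vector<int32_t> offsets;   // buffers[1]: n + 1 offsets into `data`
  std::string data;               // buffers[2]: all characters, back to back
};

// Copy every string into one contiguous run and record where each one ends.
Utf8Layout PackStrings(const std::vector<std::string>& values) {
  Utf8Layout out;
  out.validity.assign((values.size() + 7) / 8, 0);
  out.offsets.reserve(values.size() + 1);
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    // Mark value i as non-null (least-significant bit first within a byte).
    out.validity[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    out.data += values[i];
    out.offsets.push_back(static_cast<int32_t>(out.data.size()));
  }
  return out;
}

// Value i is the slice of `data` between offsets[i] and offsets[i + 1].
std::string ValueAt(const Utf8Layout& layout, std::size_t i) {
  assert(i + 1 < layout.offsets.size());
  const auto begin = static_cast<std::size_t>(layout.offsets[i]);
  const auto end = static_cast<std::size_t>(layout.offsets[i + 1]);
  return layout.data.substr(begin, end - begin);
}
```

For {"foo", "", "quux"} this gives data "fooquux" and offsets 0, 3, 3, 7. A
std::vector<std::string> has none of this structure in memory, which is why
wrapping it directly in a Buffer does not give you a valid string array, while
StringBuilder builds these buffers for you.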
    



I didn't find a good spot in the documentation that covers this, but [1]
shows the types that you can/should put into a Datum. Compute functions
typically expect Arrays (or something else that can be wrapped in a Datum);
ArrayData is a lower level of abstraction than they expect.


[1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/datum.h#L54




# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene



------- Original Message -------
On Thursday, May 4th, 2023 at 11:09, Felipe Oliveira Carvalho 
<[email protected]> wrote:


> std::vector<std::string>::data() returns a buffer containing pointers to the 
> individual string buffers and Arrow needs a buffer with contiguous 
> variable-length character data.
> And that is buffers[2]. buffers[1] contains the offsets for beginning and end 
> of the strings in buffers[2].
> So yes, use the StringBuilder.
> 

> --
> Felipe
> 

> On Thu, May 4, 2023 at 2:28 PM Surya Kiran Gullapalli 
> <[email protected]> wrote:
> 

> > Hello,
> > I'm trying to use an std::vector (of strings) in CallFunction ('is_in').
> > The arrow::compute::SetLookupOptions takes in a datum (array of strings,
> > in my case to search).
> > 

> > I tried this
> > 

> > std::vector<std::string> vec;
> > auto buffer = arrow::Buffer::Wrap(vec);
> > auto arrayData = arrow::ArrayData::Make (arrow::utf8(), vec.size(), 
> > {nullptr, buffer});
> > auto options = arrow::compute::SetLookupOptions(arrayData);
> > auto res = arrow::compute::CallFunction ("is_in", {arrowArray}, &options);
> > 

> > This is resulting in a crash.
> > 

> > I tried calling arrow::MakeArray(arrayData), and that is also failing.
> > 

> > But if I convert the std::vector to arrow::Array (using StringBuilder) then 
> > there's no crash and I'm getting expected results.
> > 

> > Am I using the arrow::Buffer/arrow::ArrayData/arrow::Datum correctly, or
> > am I missing something?
> > 

> > Thanks,
> > Surya
