To give you a bit of overview that you may be missing, in order of abstraction (high to low):
- Datum is like a wrapper that provides union semantics, in the C sense. For
example, it contains an Array or a ChunkedArray or a Table, etc. but one and
only one of them.
- Array is like an interface; it stores its data in an ArrayData.
- ArrayData is like a container that owns data (it is responsible for
releasing the data) and provides functions to interact with that data.
- Buffer is how the data is actually stored. Buffers are used for the values,
for pointers (offsets) into the values, and for a bitmap that indicates which
values are null (I did not describe these in any particular order).
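To make the Buffer level concrete, here is a minimal sketch in plain C++17
(no Arrow headers) of the three buffers behind a string array: a validity
bitmap, n + 1 offsets, and one contiguous block of characters. FlatStrings,
Flatten, IsNull and GetView are hypothetical names made up for illustration;
they only mimic the layout and are not part of the Arrow API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the three buffers behind a utf8 array.
struct FlatStrings {
  std::vector<std::uint8_t> validity;  // 1 bit per value, LSB first; 1 = not null
  std::vector<std::int32_t> offsets;   // length + 1 entries pointing into `data`
  std::string data;                    // all characters, stored back to back
  std::size_t length = 0;              // number of logical values
};

// Flatten a vector of optional strings into the three-buffer layout.
FlatStrings Flatten(const std::vector<std::optional<std::string>>& values) {
  FlatStrings out;
  out.length = values.size();
  out.validity.assign((values.size() + 7) / 8, 0);
  out.offsets.reserve(values.size() + 1);
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i].has_value()) {
      out.validity[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
      out.data += *values[i];
    }
    // A null slot contributes no characters, so its offset simply repeats.
    out.offsets.push_back(static_cast<std::int32_t>(out.data.size()));
  }
  return out;
}

// True if slot i is null according to the validity bitmap.
bool IsNull(const FlatStrings& s, std::size_t i) {
  return ((s.validity[i / 8] >> (i % 8)) & 1u) == 0;
}

// View of the characters in slot i: data[offsets[i], offsets[i + 1]).
std::string_view GetView(const FlatStrings& s, std::size_t i) {
  const auto begin = static_cast<std::size_t>(s.offsets[i]);
  const auto end = static_cast<std::size_t>(s.offsets[i + 1]);
  return std::string_view(s.data).substr(begin, end - begin);
}
```

For the input {"ab", null, "cde"} this gives data "abcde" and offsets
{0, 2, 2, 5}. A std::vector<std::string> has none of this layout in memory,
which is why wrapping it in a Buffer directly does not produce a valid array.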
I didn't find a good spot in the documentation that mentions this, but [1]
shows the types that you can/should put into a Datum. Compute functions
typically expect Arrays (or something else that can be wrapped in a Datum). A
Datum can also be built directly from an ArrayData, but then you are
responsible for getting its buffers exactly right, so ArrayData is a lower
level of abstraction than you usually want to work at.
[1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/datum.h#L54
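As a rough illustration of the "union semantics" I mentioned for Datum, here
is a small sketch using std::variant (C++17). MiniArray, MiniChunkedArray,
MiniTable, MiniDatum and KindName are hypothetical stand-ins invented for this
example; the real Datum accepts more kinds and has a much richer interface.

```cpp
#include <memory>
#include <string>
#include <variant>

// Hypothetical stand-ins; the real Arrow classes are far richer.
struct MiniArray { long length; };
struct MiniChunkedArray { int num_chunks; };
struct MiniTable { int num_columns; };

// Exactly one alternative is active at a time (std::monostate means "empty").
using MiniDatum = std::variant<std::monostate,
                               std::shared_ptr<MiniArray>,
                               std::shared_ptr<MiniChunkedArray>,
                               std::shared_ptr<MiniTable>>;

// Report which alternative is currently held.
std::string KindName(const MiniDatum& d) {
  struct Visitor {
    std::string operator()(std::monostate) const { return "none"; }
    std::string operator()(const std::shared_ptr<MiniArray>&) const {
      return "array";
    }
    std::string operator()(const std::shared_ptr<MiniChunkedArray>&) const {
      return "chunked_array";
    }
    std::string operator()(const std::shared_ptr<MiniTable>&) const {
      return "table";
    }
  };
  return std::visit(Visitor{}, d);
}
```

Assigning a table to a MiniDatum that held an array replaces the array; the
wrapper never holds both at once, which is the "one and only one" part.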
# ------------------------------
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
------- Original Message -------
On Thursday, May 4th, 2023 at 11:09, Felipe Oliveira Carvalho
<[email protected]> wrote:
> std::vector<std::string>::data() returns a buffer containing pointers to the
> individual string buffers and Arrow needs a buffer with contiguous
> variable-length character data.
> And that is buffers[2]. buffers[1] contains the offsets for beginning and end
> of the strings in buffers[2].
> So yes, use the StringBuilder.
>
> --
> Felipe
>
> On Thu, May 4, 2023 at 2:28 PM Surya Kiran Gullapalli
> <[email protected]> wrote:
>
> > Hello,
> > I'm trying to use an std::vector (of strings) in CallFunction ('is_in').
> > The arrow::compute::SetLookupOptions takes in a datum (array of strings,
> > in my case to search).
> >
> > I tried this
> >
> > std::vector<std::string> vec;
> > auto buffer = arrow::Buffer::Wrap(vec);
> > auto arrayData = arrow::ArrayData::Make (arrow::utf8(), vec.size(),
> > {nullptr, buffer});
> > auto options = arrow::compute::SetLookupOptions(arrayData);
> > auto res = arrow::compute::CallFunction ("is_in", {arrowArray}, &options);
> >
> > This is resulting in a crash.
> >
> > I tried calling arrow::MakeArray(arrayData), and that is also failing.
> >
> > But if I convert the std::vector to arrow::Array (using StringBuilder) then
> > there's no crash and I'm getting expected results.
> >
> > Am I using the arrow::Buffer/arrow::ArrayData/arrow::Datum correctly, or
> > am I missing something?
> >
> > Thanks,
> > Surya
