If you control the function that produces the vector<string>, you can avoid
all these fragmented allocations by re-using the same std::string in a loop
and reserving buffers upfront in the builder:

RETURN_NOT_OK(string_builder.Reserve(number_of_strings));
RETURN_NOT_OK(string_builder.ReserveData(sum_of_lengths_of_all_strings_or_an_estimate_of_that));

std::string s;
for (...) {
  s.clear();  // this doesn't deallocate s's internal buffer
  // ... populate the string s. This avoids a new memory allocation if the
  // new contents are no longer than the longest string so far.
  RETURN_NOT_OK(string_builder.Append(s));
}
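
To make the pattern concrete without depending on Arrow, here is a minimal
sketch using only the standard library. ToyStringBuilder and BuildItems are
hypothetical names invented for this illustration; they are not Arrow's
StringBuilder, and they only model the layout (one contiguous character
buffer plus offsets). The "item<i>" contents and the 8-bytes-per-string
estimate are likewise assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a string builder; NOT Arrow's StringBuilder.
// It models the layout only: contiguous character data plus offsets.
struct ToyStringBuilder {
  std::vector<int32_t> offsets{0};  // offsets[i]..offsets[i+1] delimit string i
  std::string data;                 // all characters, stored back to back

  void Reserve(std::size_t n) { offsets.reserve(n + 1); }
  void ReserveData(std::size_t bytes) { data.reserve(bytes); }

  void Append(const std::string& s) {
    data.append(s);
    assert(data.size() <= static_cast<std::size_t>(INT32_MAX));
    offsets.push_back(static_cast<int32_t>(data.size()));
  }
};

// Builds "item0", "item1", ... while reusing a single scratch string.
ToyStringBuilder BuildItems(int count) {
  ToyStringBuilder builder;
  builder.Reserve(static_cast<std::size_t>(count));
  builder.ReserveData(static_cast<std::size_t>(count) * 8);  // rough estimate

  std::string s;
  for (int i = 0; i < count; ++i) {
    s.clear();  // in practice keeps the buffer s already owns
    s += "item";
    s += std::to_string(i);
    builder.Append(s);  // copies the characters into the contiguous buffer
  }
  return builder;
}
```

Calling BuildItems(3) leaves the characters of all three strings in one
buffer, with the offsets marking where each string begins and ends. The
scratch string s is the only per-string storage, so the loop does not create
one heap allocation per element the way a vector<string> does.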

--
Felipe

On Thu, May 4, 2023 at 3:09 PM Felipe Oliveira Carvalho <[email protected]>
wrote:

> std::vector<std::string>::data() returns a buffer containing pointers to
> the individual string buffers and Arrow needs a buffer with contiguous
> variable-length character data.
>
> And that is buffers[2]. buffers[1] contains the offsets for beginning and
> end of the strings in buffers[2].
>
> So yes, use the StringBuilder.
>
> --
> Felipe
>
> On Thu, May 4, 2023 at 2:28 PM Surya Kiran Gullapalli <
> [email protected]> wrote:
>
>> Hello,
>> I'm trying to use an std::vector (of strings) in CallFunction ('is_in').
>> The arrow::compute::SetLookupOptions takes in a datum (array of
>> strings, in my case to search).
>>
>> I tried this
>>
>> std::vector<std::string> vec;
>> auto buffer = arrow::Buffer::Wrap(vec);
>> auto arrayData = arrow::ArrayData::Make (arrow::utf8(), vec.size(),
>> {nullptr, buffer});
>> auto options = arrow::compute::SetLookupOptions(arrayData);
>> auto res = arrow::compute::CallFunction ("is_in", {arrowArray}, &options);
>>
>> This is resulting in a crash.
>>
>> I tried calling arrow::MakeArray(arrayData), and that is also failing.
>>
>> But if I convert the std::vector to arrow::Array (using StringBuilder)
>> then there's no crash and I'm getting expected results.
>>
>> Am I using the arrow::Buffer/arrow::ArrayData/arrow::Datum correctly, or
>> am I missing something?
>>
>> Thanks,
>> Surya
>>
>
