Hi, I need help integrating Arrow C++ into my current project. I have already built the C++ library and can call its API.
What I need is this: I have a C++ project that reads data in chunks and then uses an erasure code to rebuild the original data. The rebuild is also done in chunks, so at each iteration I can access a buffer of rebuilt data. I need to pass this data as a stream to Arrow for processing and then send out the processed stream.

For example, suppose my original file is a CSV and I would like to filter it and save the first column.

file:

    col1,col2, col3, col3
    a1,b1,c1,d1
    an,bn,cn,dn

It is split into 6 chunks of equal size:

chunk1:

    a1,b1,c1,d1
    ak,bk

chunk2:

    ck,dk
    ...
    am,bm,cm,dm

and so on.

My question is: which StreamReader in Arrow is the right one to use, and how does it deal with incomplete records (lines) at the beginning and end of each chunk?

Here is a snippet of the code I use:

    buffer_type_t res = fut.get0();
    BOOST_LOG_TRIVIAL(trace) << "RawxBackendReader: Got result with buffer size: " << res.size();
    std::shared_ptr<arrow::io::InputStream> input;
    std::shared_ptr<arrow::io::BufferReader> buffer(new arrow::io::BufferReader(
        reinterpret_cast<const uint8_t*>(res.get()), res.size()));
    input = buffer;
    BOOST_LOG_TRIVIAL(trace) << "laa type input" << input.get();
    ArrowFilter arrow_filter = ArrowFilter(input);
    arrow_filter.ToCsv();
    result.push_back(std::move(res));

Thank you
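To make the chunk-boundary problem concrete, here is a minimal sketch of the manual fallback: hold back the incomplete last record of each chunk and prepend it to the next one, so that only whole lines are handed on. This is plain standard C++, not Arrow API. `LineAligner`, `Feed` and `Finish` are hypothetical names invented for illustration, and the sketch assumes records end with `\n` and that no quoted field contains an embedded newline.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper (not part of Arrow): accumulates raw chunks and
// releases only the bytes that form complete, newline-terminated records.
// Assumption: a '\n' always ends a record (no newlines inside quoted fields).
class LineAligner {
public:
    // Appends one chunk and returns all complete lines available so far,
    // including their trailing '\n'. A partial last record is kept back
    // and joined with the beginning of the next chunk.
    std::string Feed(std::string_view chunk) {
        pending_.append(chunk.data(), chunk.size());
        const std::size_t last_newline = pending_.rfind('\n');
        if (last_newline == std::string::npos) {
            return {};  // no complete record yet, keep everything pending
        }
        std::string complete = pending_.substr(0, last_newline + 1);
        pending_.erase(0, last_newline + 1);
        return complete;
    }

    // Call once after the last chunk: returns whatever is still pending
    // (a final record without a trailing newline, or an empty string).
    std::string Finish() {
        std::string rest = std::move(pending_);
        pending_.clear();
        return rest;
    }

private:
    std::string pending_;  // bytes of the record cut off by the chunk boundary
};
```

Each string returned by `Feed` contains only whole records, so it could presumably be wrapped in an `arrow::io::BufferReader` the same way as `res` in the snippet above. Only the first chunk carries the header line, so later pieces would need the column names supplied some other way.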
