No problem! Hopefully you found the issue.

On Wed, Mar 26, 2025 at 07:30, Jack Wimberley via user <[email protected]> wrote:

Hi Aldrin,

My apologies, I never saw this message from you, but thanks belatedly! Your guess was more or less on the mark.

Best,
Jack
On Tue, Mar 4, 2025 at 11:27 PM Aldrin <[email protected]> wrote:

Hi Jack!

I tried the code you provided and it worked for me. I put my code in a gist [1] for you to compare against your own. I don't use THROW_NOT_OK, since I assume that's either your own macro or something you can easily swap in for `ARROW_ASSIGN_OR_RAISE`. I provide one version that uses the typical Arrow macros (`ARROW_ASSIGN_OR_RAISE`) and another that manually gets the Result/Status and checks ok(). I also used the Buffer::data() and Buffer::size() functions to replicate your use of that Buffer constructor (`Buffer::Buffer(const uint8_t* data, int64_t size)`). I didn't see any issues.

So I suspect that, because of the Buffer constructor you're using, you may not be keeping the Buffer's underlying data alive. Perhaps most of the data is actually still there, but some garbage has been written over part of it. I'm not totally sure, but it seems a reasonable culprit.

[1]: https://gist.github.com/drin/0386e326a0e1e5d8b079c20e1f81bb0f

Aldrin
https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene

On Tuesday, March 4th, 2025 at 13:34, Jack Wimberley via user <[email protected]> wrote:

Hello all,

I am attempting to serialize and then deserialize individual RecordBatch objects using the C++ library. However, I'm getting an "Invalid" result on the deserialization end.
On the serialization end, with the help of a helper macro THROW_NOT_OK that throws on a non-OK Status or Result (and, in the latter case, returns the inner Value), I'm serializing using

    // batch is a valid std::shared_ptr<RecordBatch>
    auto bufferStream = THROW_NOT_OK(io::BufferOutputStream::Create(1024));
    auto batchWriter = THROW_NOT_OK(ipc::MakeStreamWriter(bufferStream, batch->schema()));
    THROW_NOT_OK(batchWriter->WriteRecordBatch(*batch));
    THROW_NOT_OK(batchWriter->Close());
    auto batchBuffer = THROW_NOT_OK(bufferStream->Finish());
    // pass this data along

The size of the buffer thus created is 1800 bytes. On the other end of the channel, I try to deserialize an in-memory copy of this IPC data using

    // bufferPtr is a const uint8_t* location in memory and bufferSize a number of bytes
    auto arrowBuffer = std::make_shared<Buffer>(bufferPtr, bufferSize); // no-copy wrap
    auto bufferReader = std::make_shared<io::BufferReader>(arrowBuffer);
    auto batchReader = THROW_NOT_OK(ipc::RecordBatchStreamReader::Open(bufferReader));

But the last step fails with a non-OK result whose message is

    Invalid: Expected to read 165847040 metadata bytes, but only read 1796

The metadata byte count is way off, given that the serialized RecordBatch was 1800 bytes to begin with. The number of bytes read looks about right, modulo that difference of 4.

I saw some similar questions in the archives and online, but in those the issue tended to be a missing Close() step. Another suggestion was a mismatch between the reader and writer formats; I am using what look to me like appropriately paired IPC stream I/O objects. Does some sort of header need to be written to the stream before the RecordBatch? Also, I did not use the second overloaded WriteRecordBatch method, which takes a metadata object as its second argument, and the error message mentions metadata bytes; is that relevant?

Best,
Jack Wimberley
