No problem! Hopefully you found the issue.
On Wed, Mar 26, 2025 at 07:30, Jack Wimberley via user <[email protected]> wrote:

Hi Aldrin,

My apologies, I never saw this message from you, but thanks belatedly! Your guess was more or less on the mark.

Best,
Jack
On Tue, Mar 4, 2025 at 11:27 PM Aldrin <[email protected]> wrote:
  
Hi Jack!

I tried the code you provided and it seemed to work for me. I put my code in a gist [1] for you to compare against your own. I don't use THROW_NOT_OK, simply because I figured it wasn't necessary to replicate it (I assume that's either your own macro or something you can easily swap in for ARROW_ASSIGN_OR_RAISE).

I provide a version that uses the typical Arrow macros (`ARROW_ASSIGN_OR_RAISE`) as well as a version that manually gets the result/status and checks `ok()`.
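
Roughly, the two versions look like this (a minimal sketch, not the exact contents of the gist; the function names are just placeholders, and it assumes the usual Arrow headers plus a valid `batch`):

    #include <memory>

    #include <arrow/api.h>
    #include <arrow/io/api.h>
    #include <arrow/ipc/api.h>

    // Macro version: ARROW_ASSIGN_OR_RAISE / ARROW_RETURN_NOT_OK need an
    // enclosing function that returns arrow::Status or arrow::Result<...>.
    arrow::Result<std::shared_ptr<arrow::Buffer>> SerializeWithMacros(
        const std::shared_ptr<arrow::RecordBatch>& batch) {
      ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::BufferOutputStream::Create(1024));
      ARROW_ASSIGN_OR_RAISE(auto writer,
                            arrow::ipc::MakeStreamWriter(sink, batch->schema()));
      ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
      ARROW_RETURN_NOT_OK(writer->Close());
      return sink->Finish();
    }

    // Manual version: grab each Result/Status and check ok() yourself.
    std::shared_ptr<arrow::Buffer> SerializeManually(
        const std::shared_ptr<arrow::RecordBatch>& batch) {
      auto sink_result = arrow::io::BufferOutputStream::Create(1024);
      if (!sink_result.ok()) { return nullptr; }
      auto sink = sink_result.ValueOrDie();

      auto writer_result = arrow::ipc::MakeStreamWriter(sink, batch->schema());
      if (!writer_result.ok()) { return nullptr; }
      auto writer = writer_result.ValueOrDie();

      if (!writer->WriteRecordBatch(*batch).ok()) { return nullptr; }
      if (!writer->Close().ok()) { return nullptr; }

      auto buffer_result = sink->Finish();
      return buffer_result.ok() ? buffer_result.ValueOrDie() : nullptr;
    }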
       
I also used the `Buffer::data()` and `Buffer::size()` functions to replicate the fact that you use the `Buffer::Buffer(const uint8_t* data, int64_t size)` constructor.
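
The re-wrap is roughly this (a small sketch; `ipc_buffer` and `WrapWithoutCopy` are placeholder names for the owning buffer produced by Finish() on the writing side):

    // Non-owning view over bytes that ipc_buffer actually owns.
    std::shared_ptr<arrow::io::BufferReader> WrapWithoutCopy(
        const std::shared_ptr<arrow::Buffer>& ipc_buffer) {
      // Valid only while ipc_buffer (the real owner) stays alive.
      auto view = std::make_shared<arrow::Buffer>(ipc_buffer->data(),
                                                  ipc_buffer->size());
      return std::make_shared<arrow::io::BufferReader>(view);
    }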
       
I didn't see any issues. So I kind of suspect that, because of the Buffer constructor you're using, you somehow don't keep the Buffer's data alive? Maybe most of the data is actually still there, but some garbage has been written over part of it. Not totally sure, but it seems a reasonable culprit.
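
To make that suspicion concrete, here is a rough sketch of the defensive pattern (names like ReadOneBatch and payload are placeholders for however the bytes actually arrive on your side):

    #include <cstring>
    #include <vector>

    // Two ways to avoid dangling data:
    //   1. Copy the incoming bytes into an Arrow-owned buffer (shown below).
    //   2. Keep the non-owning Buffer(bufferPtr, bufferSize) wrap, but make sure
    //      whatever owns bufferPtr stays alive and unmodified until every read
    //      has finished.
    arrow::Result<std::shared_ptr<arrow::RecordBatch>> ReadOneBatch(
        const std::vector<uint8_t>& payload) {
      ARROW_ASSIGN_OR_RAISE(
          auto owned, arrow::AllocateBuffer(static_cast<int64_t>(payload.size())));
      std::memcpy(owned->mutable_data(), payload.data(), payload.size());
      std::shared_ptr<arrow::Buffer> owned_shared = std::move(owned);

      auto source = std::make_shared<arrow::io::BufferReader>(owned_shared);
      ARROW_ASSIGN_OR_RAISE(auto reader,
                            arrow::ipc::RecordBatchStreamReader::Open(source));
      return reader->Next();  // first batch in the stream
    }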
       
[1]: https://gist.github.com/drin/0386e326a0e1e5d8b079c20e1f81bb0f

# ------------------------------
# Aldrin

https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene
On Tuesday, March 4th, 2025 at 13:34, Jack Wimberley via user <[email protected]> wrote:

Hello all,
I am attempting to serialize and then deserialize individual RecordBatch objects using the C++ library. However, I’m getting an “Invalid” result on the deserialization end.

On the serialization end, with the help of a helper THROW_NOT_OK that throws on a non-OK Status or Result (and, in the latter case, returns the inner value), I’m serializing using
    // batch is a valid std::shared_ptr<RecordBatch>
    auto bufferStream = THROW_NOT_OK(io::BufferOutputStream::Create(1024));
    auto batchWriter = THROW_NOT_OK(ipc::MakeStreamWriter(bufferStream, batch->schema()));
    auto writeStatus = THROW_NOT_OK(batchWriter->WriteRecordBatch(*batch));
    THROW_NOT_OK(batchWriter->Close());
    auto batchBuffer = THROW_NOT_OK(bufferStream->Finish());

    // pass this data along
The size of the buffer thus created is 1800. On the other end of the channel, I try to deserialize an in-memory copy of this IPC data using

    // bufferPtr is a uint8_t* const location in memory and bufferSize a number of bytes
    auto arrowBuffer = std::make_shared<Buffer>(bufferPtr, bufferSize); // no-copy wrap
    auto bufferReader = std::make_shared<io::BufferReader>(arrowBuffer);
    auto batchReader = THROW_NOT_OK(ipc::RecordBatchStreamReader::Open(bufferReader));
But the last step fails, with a non-OK result carrying the message

    Invalid: Expected to read 165847040 metadata bytes, but only read 1796
The metadata byte count is way off, given that the serialized RecordBatch was only 1800 bytes to begin with. The number of bytes read looks about right, modulo that difference of 4. I saw some similar questions in the archives and online, but the issue in those tended to be that the Close() step was missing. Another suggestion was a mismatch between the reader and writer formats; I am using what look to me like an appropriately paired set of IPC stream I/O objects. Does some sort of header need to be written to the stream before the RecordBatch? Or: I did not use the second WriteRecordBatch overload that takes a metadata object as its second argument, and the message mentions metadata bytes; is that relevant?
Best,
Jack Wimberley
