Jingyuan Wang created ARROW-2119:
------------------------------------

             Summary: Handle Arrow stream with zero record batch
                 Key: ARROW-2119
                 URL: https://issues.apache.org/jira/browse/ARROW-2119
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Jingyuan Wang


Many places in the code currently seem to assume that a stream in the 
streaming format contains at least one record batch. Is a stream with zero 
record batches unsupported by design?

e.g. 
[https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45]
{code:none}
  public static void convert(InputStream in, OutputStream out) throws IOException {
    BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
    try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
      VectorSchemaRoot root = reader.getVectorSchemaRoot();
      // load the first batch before instantiating the writer so that we have any dictionaries
      if (!reader.loadNextBatch()) {
        throw new IOException("Unable to read first record batch");
      }
      ...
{code}
Pyarrow 0.8.0 cannot load a stream with zero record batches either. It throws 
an exception that originates from 
[https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309]:
{code:none}
Status Table::FromRecordBatches(const std::vector<std::shared_ptr<RecordBatch>>& batches,
                                std::shared_ptr<Table>* table) {
  if (batches.size() == 0) {
    return Status::Invalid("Must pass at least one record batch");
  }
  ...
{code}
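
For illustration only, here is a minimal sketch of what tolerating an empty stream in a convert loop could look like: still try to load the first batch up front, but treat "no batch" as an empty stream and write a schema-only output instead of throwing. This assumes the schema is available before any batch is read. BatchReader, BatchWriter, ListReader and RecordingWriter are hypothetical stand-ins invented for this sketch; they are not the Arrow Java API, and this is not a proposed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class ZeroBatchSketch {

  /** Hypothetical stand-in for a stream reader; NOT the Arrow API. */
  interface BatchReader {
    String schema();          // assumed to be known before any batch is read
    boolean loadNextBatch();  // returns false once the stream is exhausted
    String currentBatch();
  }

  /** Hypothetical stand-in for a file writer; NOT the Arrow API. */
  interface BatchWriter {
    void start(String schema);
    void writeBatch(String batch);
    void end();
  }

  /** Copies every batch; a stream with zero batches yields a schema-only output. */
  static int convert(BatchReader reader, BatchWriter writer) {
    // Try the first batch before starting the writer (the original does this
    // to have the dictionaries), but do not treat "no batch" as an error.
    boolean hasBatch = reader.loadNextBatch();
    writer.start(reader.schema());
    int written = 0;
    while (hasBatch) {
      writer.writeBatch(reader.currentBatch());
      written++;
      hasBatch = reader.loadNextBatch();
    }
    writer.end();
    return written;
  }

  /** In-memory reader over a list of batches, for demonstration only. */
  static final class ListReader implements BatchReader {
    private final String schema;
    private final Iterator<String> batches;
    private String current;

    ListReader(String schema, List<String> batches) {
      this.schema = schema;
      this.batches = batches.iterator();
    }

    public String schema() { return schema; }

    public boolean loadNextBatch() {
      if (!batches.hasNext()) {
        return false;
      }
      current = batches.next();
      return true;
    }

    public String currentBatch() { return current; }
  }

  /** Writer that records the calls it receives, for demonstration only. */
  static final class RecordingWriter implements BatchWriter {
    final List<String> log = new ArrayList<>();

    public void start(String schema) { log.add("start:" + schema); }
    public void writeBatch(String batch) { log.add("batch:" + batch); }
    public void end() { log.add("end"); }
  }

  public static void main(String[] args) {
    RecordingWriter writer = new RecordingWriter();
    int written = convert(new ListReader("schema", Collections.<String>emptyList()), writer);
    System.out.println(written + " batches written, calls: " + writer.log);
  }
}
```

Whether the real writer can safely be started when there are no dictionaries to hand it is exactly the open question above; the sketch only shows the control flow.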



