That was it, thanks much David!
Sitaraman

From: David Li <[email protected]>
Date: Monday, February 6, 2023 at 12:23 PM
To: dl <[email protected]>
Subject: Re: Missing magic number.

From a quick glance, it seems one code path uses the **stream** writer not the file writer. See this documentation section and the next for the difference [1].
[1]: https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format

-David

On Mon, Feb 6, 2023, at 14:58, Vilayannur Sitaraman wrote:

Hi,
I am trying out Arrow Flight and I have these two programs to write and read from file.

    public class WriteToBuffer {

        public static void main(String[] args) {
            WriteToBuffer wb = new WriteToBuffer();
            wb.execute1();
        }

        public void execute1(){
            try (BufferAllocator rootAllocator = new RootAllocator()) {
                Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
                Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
                Schema schemaPerson = new Schema(asList(name, age));
                try(
                    VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator)
                ){
                    VarCharVector nameVector = (VarCharVector) vectorSchemaRoot.getVector("name");
                    nameVector.allocateNew(3);
                    nameVector.set(0, "David".getBytes());
                    nameVector.set(1, "Gladis".getBytes());
                    nameVector.set(2, "Juan".getBytes());
                    IntVector ageVector = (IntVector) vectorSchemaRoot.getVector("age");
                    ageVector.allocateNew(3);
                    ageVector.set(0, 10);
                    ageVector.set(1, 20);
                    ageVector.set(2, 30);
                    vectorSchemaRoot.setRowCount(3);
                    File file = new File("streaming_to_file.arrow");
                    try (
                        FileOutputStream fileOutputStream = new FileOutputStream(file);
                        ArrowStreamWriter writer = new ArrowStreamWriter(vectorSchemaRoot, null, fileOutputStream.getChannel())
                    ){
                        writer.start();
                        System.out.println("Writing Batch");
                        writer.writeBatch();
                        System.out.println("Number of rows written: " + vectorSchemaRoot.getRowCount());
                        writer.end();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        public ArrowFileWriter execute() {
            try (
                BufferAllocator allocator = new RootAllocator()) {
                Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
                Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
                Schema schemaPerson = new Schema(asList(name, age));
                try (
                    VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, allocator)
                ) {
                    VarCharVector nameVector = (VarCharVector) vectorSchemaRoot.getVector("name");
                    nameVector.allocateNew(3);
                    nameVector.set(0, "David".getBytes());
                    nameVector.set(1, "Gladis".getBytes());
                    nameVector.set(2, "Juan".getBytes());
                    IntVector ageVector = (IntVector) vectorSchemaRoot.getVector("age");
                    ageVector.allocateNew(3);
                    ageVector.set(0, 10);
                    ageVector.set(1, 20);
                    ageVector.set(2, 30);
                    vectorSchemaRoot.setRowCount(3);
                    try (
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out))
                    ) {
                        writer.start();
                        writer.writeBatch();
                        System.out.println("Record batches written: " + writer.getRecordBlocks().size() +
                            ". Number of rows written: " + vectorSchemaRoot.getRowCount());
                        return writer;
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
            return null;
        }
    }

    public class ReadFromBuffer {

        public static void main(String[] args) {
            ReadFromBuffer rb = new ReadFromBuffer();
            rb.execute1();
        }

        public void execute1(){
            File file = new File("streaming_to_file.arrow");
            try(
                BufferAllocator rootAllocator = new RootAllocator();
                FileInputStream fileInputStream = new FileInputStream(file);
                ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), rootAllocator)
            ){
                System.out.println("Record batches in file: " + reader.getRecordBlocks().size());
                for (ArrowBlock arrowBlock : reader.getRecordBlocks()) {
                    reader.loadRecordBatch(arrowBlock);
                    VectorSchemaRoot vectorSchemaRootRecover = reader.getVectorSchemaRoot();
                    System.out.print(vectorSchemaRootRecover.contentToTSVString());
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public void execute() {
            //Path path = Paths.get("./thirdpartydeps/arrowfiles/random_access.arrow");
            Path path = Paths.get("streaming_to_file.arrow");
            try (
                BufferAllocator rootAllocator = new RootAllocator();
                ArrowFileReader reader = new ArrowFileReader(new SeekableReadChannel(new ByteArrayReadableSeekableByteChannel(
                    Files.readAllBytes(path))), rootAllocator)
            ) {
                //System.out.println("Record batches in file: " + reader.getRecordBlocks().size());
                while (reader.loadNextBatch()) {
                    for (ArrowBlock arrowBlock : reader.getRecordBlocks()) {
                        reader.loadRecordBatch(arrowBlock);
                        VectorSchemaRoot vectorSchemaRootRecover = reader.getVectorSchemaRoot();
                        System.out.print(vectorSchemaRootRecover.contentToTSVString());
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

The Write program successfully writes the arrow file but when I execute the read program I get this error…

    (PyGDev1) C02G35CWMD6R:scala vsitaraman$ java -cp ./jars/chapter2-assembly-1.0.jar main.java.chapter2.ReadFromBuffer
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
    Exception in thread "main" org.apache.arrow.vector.ipc.InvalidArrowFileException: missing Magic number [0, 0, -1, -1, -1, -1, 0, 0, 0, 0]
        at org.apache.arrow.vector.ipc.ArrowFileReader.readSchema(ArrowFileReader.java:98)
        at org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:185)
        at org.apache.arrow.vector.ipc.ArrowFileReader.initialize(ArrowFileReader.java:120)
        at org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:176)
        at org.apache.arrow.vector.ipc.ArrowFileReader.getRecordBlocks(ArrowFileReader.java:183)
        at main.java.chapter2.ReadFromBuffer.execute1(ReadFromBuffer.java:32)
        at main.java.chapter2.ReadFromBuffer.main(ReadFromBuffer.java:22)

The arrow file that was written is as follows:

    00000000: ffff ffff c800 0000 1000 0000 0000 0a00  ................
    00000010: 0e00 0600 0d00 0800 0a00 0000 0000 0400  ................
    00000020: 1000 0000 0001 0a00 0c00 0000 0800 0400  ................
    00000030: 0a00 0000 0800 0000 0800 0000 0000 0000  ................
    00000040: 0200 0000 5800 0000 0400 0000 c2ff ffff  ....X...........
    00000050: 1400 0000 1400 0000 1c00 0000 0000 0201  ................
    00000060: 2000 0000 0000 0000 0000 0000 0800 0c00   ...............
    00000070: 0800 0700 0800 0000 0000 0001 2000 0000  ............ ...
    00000080: 0300 0000 6167 6500 0000 1200 1800 1400  ....age.........
    00000090: 1300 1200 0c00 0000 0800 0400 1200 0000  ................
    000000a0: 1400 0000 1400 0000 1800 0000 0000 0501  ................
    000000b0: 1400 0000 0000 0000 0000 0000 0400 0400  ................
    000000c0: 0400 0000 0400 0000 6e61 6d65 0000 0000  ........name....
    000000d0: ffff ffff c800 0000 1400 0000 0000 0000  ................
    000000e0: 0c00 1600 0e00 1500 1000 0400 0c00 0000  ................
    8 lines filtered

What could I be missing.

Thanks
Sitaraman
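
For anyone who lands on this thread with the same error: the hex dump starts with ffff ffff rather than the ASCII bytes ARROW1, which is what the IPC file format begins and ends with, so the file on disk is in the streaming format. The sketch below uses only the JDK to tell the two apart by looking at the first and last bytes of a file. ArrowFormatSniffer and detect are hypothetical names made up for this illustration and are not part of the Arrow library; the check is a heuristic under the assumption that the stream was written by a recent Arrow version that emits the 0xFFFFFFFF continuation marker.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not an Arrow API.
class ArrowFormatSniffer {
    // The IPC file format starts and ends with these six ASCII bytes.
    static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.US_ASCII);
    // Assumption: messages in the streaming format start with this
    // continuation marker (older writers may omit it).
    static final byte[] CONTINUATION =
        {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static boolean endsWith(byte[] data, byte[] suffix) {
        return data.length >= suffix.length
            && Arrays.equals(
                Arrays.copyOfRange(data, data.length - suffix.length, data.length),
                suffix);
    }

    // Classifies a file as "file", "stream", or "unknown" from its
    // leading and trailing bytes.
    static String detect(byte[] head, byte[] tail) {
        if (startsWith(head, MAGIC) && endsWith(tail, MAGIC)) {
            return "file";
        }
        if (startsWith(head, CONTINUATION)) {
            return "stream";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ArrowFormatSniffer <path>");
            return;
        }
        try (RandomAccessFile f = new RandomAccessFile(args[0], "r")) {
            // Read up to 8 bytes from each end of the file.
            int n = (int) Math.min(8, f.length());
            byte[] head = new byte[n];
            byte[] tail = new byte[n];
            f.readFully(head);
            f.seek(f.length() - n);
            f.readFully(tail);
            System.out.println(args[0] + ": " + detect(head, tail));
        }
    }
}
```

If the result is "stream", the file should be read with ArrowStreamReader instead of ArrowFileReader, or the writer side should be switched to ArrowFileWriter so that reader and writer agree on the format.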
