[ https://issues.apache.org/jira/browse/ARROW-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102191#comment-17102191 ]
Andy Grove commented on ARROW-8737:
-----------------------------------

I was able to work around the issue by increasing the batch size from 1024 to 4096, but it seems like there is a missing bounds check in this code.

> [Rust] [Parquet] Parquet array reader panics
> --------------------------------------------
>
>                 Key: ARROW-8737
>                 URL: https://issues.apache.org/jira/browse/ARROW-8737
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>    Affects Versions: 0.17.0
>            Reporter: Andy Grove
>            Priority: Major
>
> I'm trying to read some parquet files produced by Apache Spark 3.0.0-preview2 and the parquet crate is panicking. It should at least fail with an Err rather than panic.
> {code:java}
> thread '<unnamed>' panicked at 'index out of bounds: the len is 1024 but the index is 1087', /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-0.17.0/src/arrow/record_reader.rs:415:21
> stack backtrace:
>    0: 0x564dbc25a9d4 - backtrace::backtrace::libunwind::trace::hfcd33194db0151d4
>         at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
>    1: 0x564dbc25a9d4 - backtrace::backtrace::trace_unsynchronized::hfd1904bbbd5335b5
>         at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
>    2: 0x564dbc25a9d4 - std::sys_common::backtrace::_print_fmt::h8476c57b177b254e
>         at src/libstd/sys_common/backtrace.rs:78
>    3: 0x564dbc25a9d4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h73acbc5f6d4b1044
>         at src/libstd/sys_common/backtrace.rs:59
>    4: 0x564dbc28727c - core::fmt::write::hdf236390fbd68d3d
>         at src/libcore/fmt/mod.rs:1069
>    5: 0x564dbc2536c3 - std::io::Write::write_fmt::h5722fa40bb2afafd
>         at src/libstd/io/mod.rs:1532
>    6: 0x564dbc25d2d5 - std::sys_common::backtrace::_print::ha468e873aada7c78
>         at src/libstd/sys_common/backtrace.rs:62
>    7: 0x564dbc25d2d5 - std::sys_common::backtrace::print::h149365a2f029de62
>         at src/libstd/sys_common/backtrace.rs:49
>    8: 0x564dbc25d2d5 - std::panicking::default_hook::{{closure}}::hb4a33f9e05934a52
>         at src/libstd/panicking.rs:198
>    9: 0x564dbc25d012 - std::panicking::default_hook::hc4535d7b0c743abd
>         at src/libstd/panicking.rs:218
>   10: 0x564dbc25d918 - std::panicking::rust_panic_with_hook::haa34a96a6dbd5a2e
>         at src/libstd/panicking.rs:477
>   11: 0x564dbc25d51b - rust_begin_unwind
>         at src/libstd/panicking.rs:385
>   12: 0x564dbc285071 - core::panicking::panic_fmt::hd101a87121fa411f
>         at src/libcore/panicking.rs:89
>   13: 0x564dbc285032 - core::panicking::panic_bounds_check::ha0668dcff6357ef4
>         at src/libcore/panicking.rs:65
>   14: 0x564dbbcdbf46 - parquet::arrow::record_reader::RecordReader<T>::read_records::hc8f50faae4afaae7
>   15: 0x564dbbc4da98 - <parquet::arrow::array_reader::PrimitiveArrayReader<T> as parquet::arrow::array_reader::ArrayReader>::next_batch::hb4e5b687cd08ee46
>   16: 0x564dbbcca3c9 - <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold::h4206004da76eb745
>   17: 0x564dbbc51c51 - <parquet::arrow::array_reader::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch::hf1c89300e65c72e8
>   18: 0x564dbbcacaba - <parquet::arrow::arrow_reader::ParquetRecordBatchReader as arrow::record_batch::RecordBatchReader>::next_batch::ha906d7eb32c7238a
>   19: 0x564dbbbe33b8 - std::sys_common::backtrace::__rust_begin_short_backtrace::hc2fd908045ecbee0
>   20: 0x564dbbb4a7ff - core::ops::function::FnOnce::call_once{{vtable.shim}}::h58c848a35fea035b
>   21: 0x564dbc264f7a - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::ha26a994a135d55de
>         at /rustc/1836e3b42a5b2f37fd79104eedbe8f48a5afdee6/src/liballoc/boxed.rs:1034
>   22: 0x564dbc264f7a - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h677072ad3ba2806b
>         at /rustc/1836e3b42a5b2f37fd79104eedbe8f48a5afdee6/src/liballoc/boxed.rs:1034
>   23: 0x564dbc264f7a - std::sys::unix::thread::Thread::new::thread_start::h7c46ce580f54dd0e
>         at src/libstd/sys/unix/thread.rs:87
>   24: 0x7f332cf79669 - start_thread
>         at /build/glibc-t7JzpG/glibc-2.30/nptl/pthread_create.c:479
>   25: 0x7f332ce85323 - clone
>   26: 0x0 - <unknown>
> Error: DataFusionError(General("Error receiving batch: RecvError"))
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
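
For illustration only, here is a minimal sketch of what "fail with an Err rather than panic" could look like for an out-of-range index such as the one in the panic message (len 1024, index 1087). This is not the parquet crate's code: `read_at` and `ReadError` are hypothetical names invented for this example, and the actual fix in `record_reader.rs` may look quite different.

```rust
use std::fmt;

/// Hypothetical error type for this sketch (not the parquet crate's error type).
#[derive(Debug, PartialEq)]
struct ReadError {
    index: usize,
    len: usize,
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "index {} out of bounds for buffer of len {}",
            self.index, self.len
        )
    }
}

impl std::error::Error for ReadError {}

/// Hypothetical helper: reads one value from `buf`, returning an error
/// instead of panicking when `index` is past the end of the buffer.
fn read_at(buf: &[i16], index: usize) -> Result<i16, ReadError> {
    // `slice::get` returns `None` when out of range, whereas `buf[index]` panics.
    buf.get(index).copied().ok_or(ReadError {
        index,
        len: buf.len(),
    })
}

fn main() {
    // Same sizes as in the panic message: a 1024-element buffer, index 1087.
    let buf = vec![0i16; 1024];
    match read_at(&buf, 1087) {
        Ok(v) => println!("value: {}", v),
        Err(e) => println!("error: {}", e),
    }
}
```

With a check of this shape, the caller receives an `Err` it can propagate with `?` instead of the reader thread aborting and the consumer only seeing a `RecvError`.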