[ 
https://issues.apache.org/jira/browse/ARROW-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102191#comment-17102191
 ] 

Andy Grove commented on ARROW-8737:
-----------------------------------

I was able to work around the issue by increasing the batch size from 1024 to 
4096, but it seems like there is a missing bounds check in this code.
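
To illustrate the kind of bounds check meant here, the following is a minimal 
sketch, not the actual code in record_reader.rs. The names `OutOfBounds` and 
`checked_write` are hypothetical and exist only for this example. It assumes 
the panic comes from indexing a fixed-size buffer directly, and shows how a 
checked access could return an Err instead.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not the parquet crate's error type.
#[derive(Debug, PartialEq)]
struct OutOfBounds {
    index: usize,
    len: usize,
}

impl fmt::Display for OutOfBounds {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "index out of bounds: the len is {} but the index is {}",
            self.len, self.index
        )
    }
}

impl std::error::Error for OutOfBounds {}

/// Hypothetical helper: store `value` at `index`, returning an Err when the
/// index is past the end of the buffer instead of panicking like `buf[index]`.
fn checked_write(buf: &mut [i32], index: usize, value: i32) -> Result<(), OutOfBounds> {
    // Read the length first so it is available for the error value.
    let len = buf.len();
    match buf.get_mut(index) {
        Some(slot) => {
            *slot = value;
            Ok(())
        }
        None => Err(OutOfBounds { index, len }),
    }
}
```

With a buffer of length 1024, `checked_write(&mut buf, 1087, 7)` would return 
an `Err(OutOfBounds { index: 1087, len: 1024 })` that the caller can propagate 
with `?`, rather than unwinding the reader thread.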

> [Rust] [Parquet] Parquet array reader panics
> --------------------------------------------
>
>                 Key: ARROW-8737
>                 URL: https://issues.apache.org/jira/browse/ARROW-8737
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>    Affects Versions: 0.17.0
>            Reporter: Andy Grove
>            Priority: Major
>
> I'm trying to read some parquet files produced by Apache Spark 3.0.0-preview2 
> and the parquet crate is panicking. It should at least fail with an Err 
> rather than panic.
> {code:java}
> thread '<unnamed>' panicked at 'index out of bounds: the len is 1024 but the 
> index is 1087', 
> /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-0.17.0/src/arrow/record_reader.rs:415:21
> stack backtrace:
>    0:     0x564dbc25a9d4 - 
> backtrace::backtrace::libunwind::trace::hfcd33194db0151d4
>                                at 
> /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
>    1:     0x564dbc25a9d4 - 
> backtrace::backtrace::trace_unsynchronized::hfd1904bbbd5335b5
>                                at 
> /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
>    2:     0x564dbc25a9d4 - 
> std::sys_common::backtrace::_print_fmt::h8476c57b177b254e
>                                at src/libstd/sys_common/backtrace.rs:78
>    3:     0x564dbc25a9d4 - 
> <std::sys_common::backtrace::_print::DisplayBacktrace as 
> core::fmt::Display>::fmt::h73acbc5f6d4b1044
>                                at src/libstd/sys_common/backtrace.rs:59
>    4:     0x564dbc28727c - core::fmt::write::hdf236390fbd68d3d
>                                at src/libcore/fmt/mod.rs:1069
>    5:     0x564dbc2536c3 - std::io::Write::write_fmt::h5722fa40bb2afafd
>                                at src/libstd/io/mod.rs:1532
>    6:     0x564dbc25d2d5 - 
> std::sys_common::backtrace::_print::ha468e873aada7c78
>                                at src/libstd/sys_common/backtrace.rs:62
>    7:     0x564dbc25d2d5 - 
> std::sys_common::backtrace::print::h149365a2f029de62
>                                at src/libstd/sys_common/backtrace.rs:49
>    8:     0x564dbc25d2d5 - 
> std::panicking::default_hook::{{closure}}::hb4a33f9e05934a52
>                                at src/libstd/panicking.rs:198
>    9:     0x564dbc25d012 - std::panicking::default_hook::hc4535d7b0c743abd
>                                at src/libstd/panicking.rs:218
>   10:     0x564dbc25d918 - 
> std::panicking::rust_panic_with_hook::haa34a96a6dbd5a2e
>                                at src/libstd/panicking.rs:477
>   11:     0x564dbc25d51b - rust_begin_unwind
>                                at src/libstd/panicking.rs:385
>   12:     0x564dbc285071 - core::panicking::panic_fmt::hd101a87121fa411f
>                                at src/libcore/panicking.rs:89
>   13:     0x564dbc285032 - 
> core::panicking::panic_bounds_check::ha0668dcff6357ef4
>                                at src/libcore/panicking.rs:65
>   14:     0x564dbbcdbf46 - 
> parquet::arrow::record_reader::RecordReader<T>::read_records::hc8f50faae4afaae7
>   15:     0x564dbbc4da98 - 
> <parquet::arrow::array_reader::PrimitiveArrayReader<T> as 
> parquet::arrow::array_reader::ArrayReader>::next_batch::hb4e5b687cd08ee46
>   16:     0x564dbbcca3c9 - <core::iter::adapters::Map<I,F> as 
> core::iter::traits::iterator::Iterator>::try_fold::h4206004da76eb745
>   17:     0x564dbbc51c51 - <parquet::arrow::array_reader::StructArrayReader 
> as parquet::arrow::array_reader::ArrayReader>::next_batch::hf1c89300e65c72e8
>   18:     0x564dbbcacaba - 
> <parquet::arrow::arrow_reader::ParquetRecordBatchReader as 
> arrow::record_batch::RecordBatchReader>::next_batch::ha906d7eb32c7238a
>   19:     0x564dbbbe33b8 - 
> std::sys_common::backtrace::__rust_begin_short_backtrace::hc2fd908045ecbee0
>   20:     0x564dbbb4a7ff - 
> core::ops::function::FnOnce::call_once{{vtable.shim}}::h58c848a35fea035b
>   21:     0x564dbc264f7a - <alloc::boxed::Box<F> as 
> core::ops::function::FnOnce<A>>::call_once::ha26a994a135d55de
>                                at 
> /rustc/1836e3b42a5b2f37fd79104eedbe8f48a5afdee6/src/liballoc/boxed.rs:1034
>   22:     0x564dbc264f7a - <alloc::boxed::Box<F> as 
> core::ops::function::FnOnce<A>>::call_once::h677072ad3ba2806b
>                                at 
> /rustc/1836e3b42a5b2f37fd79104eedbe8f48a5afdee6/src/liballoc/boxed.rs:1034
>   23:     0x564dbc264f7a - 
> std::sys::unix::thread::Thread::new::thread_start::h7c46ce580f54dd0e
>                                at src/libstd/sys/unix/thread.rs:87
>   24:     0x7f332cf79669 - start_thread
>                                at 
> /build/glibc-t7JzpG/glibc-2.30/nptl/pthread_create.c:479
>   25:     0x7f332ce85323 - clone
>   26:                0x0 - <unknown>
> Error: DataFusionError(General("Error receiving batch: RecvError"))
>  {code}



