[Discuss] Provide pluggable APIs to support user customized compression codec

2020-10-18 Thread Xie, Qi
Hi all, Again, as we discussed in the previous email, we are proposing pluggable APIs to support user-customized compression codecs in Arrow. See the proposal: https://docs.google.com/document/d/1W_TxVRN7WV1wBVOTdbxngzBek1nTolMlJWy6aqC6WG8/edit We want to redefine the scope of the pluggable API and

Re: [C++] Arrow to ORC type conversion

2020-10-18 Thread Uwe L. Korn
This sounds reasonable from an Arrow perspective. You might want to CC the ORC list as well or ask someone there to co-review your work in the adapter. Uwe > On 18.10.2020 at 17:24, Ying Zhou wrote: > > Hi, > > I’m developing the adapter that converts Arrow Arrays, ChunkedArrays, >

Re: [Rust] Blog post for 2.0.0

2020-10-18 Thread Fernando Herrera
Thanks Jorge for helping me to get across the need for a user guide. The examples you used are exactly what I had in mind. It would be great if the project had a user guide similar to tokio's. We could use this guide to explain how to get started and some examples using the available crates

Re: [C++] AppendValues for numeric types with invalid slots omitted from source

2020-10-18 Thread Wes McKinney
hi Ying, the code in adapter_util.cc doesn't look right to me unless the data in liborc::ColumnVectorBatch is spaced (has placeholder bytes where there is a null). We have quite a bit of code in Parquet that deals specifically with this issue -- I'm not sure if we have a ready-made function that

[C++] AppendValues for numeric types with invalid slots omitted from source

2020-10-18 Thread Ying Zhou
Hi, Unlike in Arrow, in ORC a null entry is recorded only in the PRESENT stream (equivalent to the validity bitmap in Arrow) and not in any DATA stream, for any type, including numeric types. Hence the notNull (aka PRESENT) and data buffers from ORC generally don’t have the same size.
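To illustrate the size mismatch described above, here is a minimal sketch of expanding a dense value buffer (non-null values only) into a "spaced" buffer with one slot per entry. It assumes the source values really are dense, which is the open question in this thread. ExpandToSpaced and its parameters are hypothetical names for illustration, not part of the Arrow or ORC APIs, and the flags are modeled as one byte per entry rather than as a packed bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not an Arrow or ORC API.
// dense:    one value per non-null entry, in order.
// not_null: one flag per logical entry (1 = valid, 0 = null).
// Returns a buffer with one slot per logical entry; null slots hold `filler`.
std::vector<int64_t> ExpandToSpaced(const std::vector<int64_t>& dense,
                                    const std::vector<uint8_t>& not_null,
                                    int64_t filler = 0) {
  std::vector<int64_t> spaced(not_null.size(), filler);
  std::size_t next = 0;  // index of the next unread dense value
  for (std::size_t i = 0; i < not_null.size(); ++i) {
    if (not_null[i]) {
      assert(next < dense.size());  // flags must not outnumber dense values
      spaced[i] = dense[next++];
    }
  }
  assert(next == dense.size());  // every dense value must be consumed
  return spaced;
}
```

With three dense values and five flags, the result has five slots, so it lines up with a validity bitmap of the same length.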

[C++] Arrow to ORC type conversion

2020-10-18 Thread Ying Zhou
Hi, I’m developing the adapter that converts Arrow Arrays, ChunkedArrays, RecordBatches and Tables into ORC files. Given the ORC Specification and the Arrow Columnar Format, here is my current type mapping: Type::type::NA -> nullptr Type::type::BOOL -> liborc::TypeKind::BOOLEAN

[NIGHTLY] Arrow Build Report for Job nightly-2020-10-18-0

2020-10-18 Thread Crossbow
Arrow Build Report for Job nightly-2020-10-18-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-18-0 Failed Tasks: - centos-6-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-10-18-0-github-centos-6-amd64 -