Re: Writing very large rowgroups to Apache Parquet

2020-07-14 Thread Roman Karlstetter
the footer. Odds are you would do better than OS swapping and take memory requirements down to page size times field count. In S3 I believe you could do this via a multipart upload and entirely skip the second step. I don't know of any implementa

Re: Writing very large rowgroups to Apache Parquet

2020-07-10 Thread Roman Karlstetter
t substitute a customized memory pool on WriterProperties? Thanks, Micah [1] https://github.com/apache/arrow/blob/5602c459eb8773b6be8059b1b118175e9f16b7a3/cpp/src/parquet/properties.h#L447 On Thu, Jul 9, 2020 at 8:35 AM Roman Karlstetter <roman.karlstet...@gmail.com>

Writing very large rowgroups to Apache Parquet

2020-07-09 Thread Roman Karlstetter
Hi everyone, for some time now, parquet::ParquetFileWriter has had the option to create buffered rowgroups with AppendBufferedRowGroup(), which basically lets you write to columns in any order you like (in contrast to what used to be the only way, writing one column after the

Re: Support for TIMESTAMP_NANOS in parquet-cpp

2018-11-13 Thread Roman Karlstetter
are the implications for backwards compatibility and haven't had time to look in detail at what needs to be done since the new metadata structure was added to the Thrift definition - Wes On Mon, Nov 12, 2018 at 4:31 AM Roman Karlstetter wrote: > > I've had the chance t

Re: Support for TIMESTAMP_NANOS in parquet-cpp

2018-11-12 Thread Roman Karlstetter
) reading from and b) writing to parquet. There seem to be some writer settings, all related to timestamp precision properties. Is there any advice any of you can give me in that regard? Thanks, Roman From: Roman Karlstetter Sent: Friday, 9 November 2018 08:38 To: dev@arrow.apache.org

Re: Support for TIMESTAMP_NANOS in parquet-cpp

2018-11-08 Thread Roman Karlstetter
her questions, it really depends on whether there is a member of the Parquet community who will do the work. Patches that implement any released functionality in the Parquet format specification are welcome. Thanks Wes On Thu, Oct 18, 2018 at 10:59 AM Roman Ka

Support for TIMESTAMP_NANOS in parquet-cpp

2018-10-18 Thread Roman Karlstetter
Hi everyone, in parquet-format, there is now support for TIMESTAMP_NANOS: https://github.com/apache/parquet-format/pull/102 For parquet-cpp, this is not yet supported. I have a few questions now: • is there an overview of which release of parquet-format is currently fully supported in parquet-cpp