Re: Any standard way for min/max values per record-batch?

2021-07-19 Thread Kohei KaiGai
Hello, Let me share our attempt at supporting min/max statistics per record batch. https://github.com/heterodb/pg-strom/wiki/806:-Apache-Arrow-Min-Max-Statistics-Hint The latest pg2arrow supports a --stat option that specifies the columns for which min/max statistics are included in each record batch. The st

Re: Any standard way for min/max values per record-batch?

2021-02-18 Thread Ben Kietzman
Unfortunately FieldNode is a `struct` instead of a `table`, so fields may not be added or deprecated. On Thu, Feb 18, 2021, 04:38 Antoine Pitrou wrote: > > On 18/02/2021 at 04:37, Micah Kornfield wrote: > > There is key-value metadata available on Message which might be able to > > work in the

Re: Any standard way for min/max values per record-batch?

2021-02-18 Thread Antoine Pitrou
On 18/02/2021 at 04:37, Micah Kornfield wrote: > There is key-value metadata available on Message which might be able to > work in the short term (some sort of encoded message). I think > standardizing how we store statistics per batch does make sense. > > We unfortunately can't add anything

Re: Any standard way for min/max values per record-batch?

2021-02-17 Thread Micah Kornfield
> > What is the parallel-list means?

Something like:

table RecordBatch {
  nodes: [FieldNode];
  // Statistics related to the data represented by each FieldNode
  // This field is either length=0 or has the same length as nodes.
  statistics: [Statistic];
}

On Wed, Feb 17, 2021 at 8:34 PM Kohei KaiGai
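The parallel-list rule in the schema fragment above (statistics is either empty or has exactly one entry per FieldNode) can be sketched in plain Python. This is only an illustration and not Arrow code: the `Statistic` members `min` and `max` and the helper `statistic_for` are assumptions, since the thread does not define the `Statistic` type.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class FieldNode:
    # Mirrors the two members of FieldNode in Message.fbs.
    length: int
    null_count: int


@dataclass
class Statistic:
    # Hypothetical layout: the thread does not say what Statistic contains.
    min: Optional[Any] = None
    max: Optional[Any] = None


@dataclass
class RecordBatch:
    nodes: List[FieldNode]
    # Parallel list: either empty or the same length as `nodes`.
    statistics: List[Statistic] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the "length=0 or same length as nodes" rule.
        if self.statistics and len(self.statistics) != len(self.nodes):
            raise ValueError(
                "statistics must be empty or have one entry per FieldNode"
            )

    def statistic_for(self, index: int) -> Optional[Statistic]:
        # Hypothetical helper: statistics[i] describes nodes[i], if present.
        if not self.statistics:
            return None
        return self.statistics[index]
```

A reader that does not know about statistics sees an empty list and behaves as before, which is the point of keeping the list separate from FieldNode.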

Re: Any standard way for min/max values per record-batch?

2021-02-17 Thread Kohei KaiGai
Thanks for the clarification. > There is key-value metadata available on Message which might be able to > work in the short term (some sort of encoded message). I think > standardizing how we store statistics per batch does make sense. > For example, JSON array of min/max values as a key-value me
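As a rough sketch of the short-term idea discussed here, min/max values could be serialized as a JSON array and stored under a key-value metadata entry. The key name `min_max_stats`, the per-column JSON layout, and both function names are assumptions made for illustration; nothing in the thread fixes them, and the code uses only the Python standard library rather than any Arrow API.

```python
import json
from typing import Any, Dict, List

# Hypothetical metadata key; not defined by Arrow or by the thread.
STATS_KEY = "min_max_stats"


def encode_min_max(columns: Dict[str, List[Any]]) -> Dict[str, str]:
    """Build a key-value metadata dict holding per-column min/max as JSON."""
    stats = []
    for name, values in columns.items():
        # Nulls (None) do not take part in min/max.
        present = [v for v in values if v is not None]
        stats.append({
            "name": name,
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        })
    return {STATS_KEY: json.dumps(stats)}


def decode_min_max(metadata: Dict[str, str]) -> List[Dict[str, Any]]:
    """Return the decoded statistics, or an empty list if the key is absent."""
    return json.loads(metadata.get(STATS_KEY, "[]"))
```

For example, `encode_min_max({"id": [3, 1, 2]})` yields a single metadata entry whose value is a JSON array, and a reader that ignores unknown keys is unaffected.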

Re: Any standard way for min/max values per record-batch?

2021-02-17 Thread Micah Kornfield
There is key-value metadata available on Message which might be able to work in the short term (some sort of encoded message). I think standardizing how we store statistics per batch does make sense. We unfortunately can't add anything to field-node without breaking compatibility. But another o

Any standard way for min/max values per record-batch?

2021-02-17 Thread Kohei KaiGai
Hello, Does Apache Arrow have any standard way to embed min/max values of the fields on a per-record-batch basis? It looks like FieldNode supports neither a dedicated min/max attribute nor custom metadata. https://github.com/apache/arrow/blob/master/format/Message.fbs#L28 If we embed an array of min/max valu