Re: New Copy Formats - avro/orc/parquet

2018-02-12 Thread Tom Lane
Magnus Hagander writes:
> +1. And bonus points if an API can also be defined so such an extension
> parsing also becomes useful to file_fdw automatically (or at least
> optionally).
Hm, well, file_fdw already goes through COPY FROM, so it seems like it'd almost just work.

Re: New Copy Formats - avro/orc/parquet

2018-02-12 Thread Magnus Hagander
On Sun, Feb 11, 2018 at 11:48 PM, Tom Lane wrote:
> Andres Freund writes:
> > So, I think making COPY extensible would be quite beneficial. I'm
> > however quite doubtful that we want to add core code to handle all of
> > the above. I think we should make

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Tom Lane
Andres Freund writes:
> On February 11, 2018 2:48:13 PM PST, Tom Lane wrote:
>> (Any such patch should manage
>> to turn COPY-CSV into an extension, at least so far as copy.c is
>> concerned, even if we don't package it as one.)
> Yea, I was thinking we

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund
On February 11, 2018 2:48:13 PM PST, Tom Lane wrote:
> (Any such patch should manage
> to turn COPY-CSV into an extension, at least so far as copy.c is
> concerned, even if we don't package it as one.)
Yea, I was thinking we should move all three (default, csv, binary)

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Tom Lane
Andres Freund writes:
> So, I think making COPY extensible would be quite beneficial. I'm
> however quite doubtful that we want to add core code to handle all of
> the above. I think we should make the COPY input/output formatting
> extensible by extensions.
+1. I can't see

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Nicolas Paris
On Feb 11, 2018 at 22:19, Adrian Klaver wrote:
> On 02/11/2018 12:57 PM, Nicolas Paris wrote:
> > On Feb 11, 2018 at 21:53, Andres Freund wrote:
> > > On 2018-02-11 21:41:26 +0100, Nicolas Paris wrote:
> > > > I have also the storage and network transfers overhead in mind:
> > > > All

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Adrian Klaver
On 02/11/2018 12:57 PM, Nicolas Paris wrote:
> On Feb 11, 2018 at 21:53, Andres Freund wrote:
> > On 2018-02-11 21:41:26 +0100, Nicolas Paris wrote:
> > > I have also the storage and network transfers overhead in mind:
> > > All those new formats are compressed; this is not true for current
> > > postgres BINARY

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Nicolas Paris
On Feb 11, 2018 at 21:53, Andres Freund wrote:
> On 2018-02-11 21:41:26 +0100, Nicolas Paris wrote:
> > I have also the storage and network transfers overhead in mind:
> > All those new formats are compressed; this is not true for current
> > postgres BINARY format and obviously text based

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund
On 2018-02-11 21:41:26 +0100, Nicolas Paris wrote:
> I have also the storage and network transfers overhead in mind:
> All those new formats are compressed; this is not true for current
> postgres BINARY format and obviously text based format. By experience,
> the binary format is 10 to 30% larger

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Nicolas Paris
On Feb 11, 2018 at 21:03, Andres Freund wrote:
> On February 11, 2018 12:00:12 PM PST, Nicolas Paris wrote:
> >> > That is true, but the question is how significant the overhead is. If
> >> > it's 50% then reducing it would make perfect sense. If it's 1% then

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund
On February 11, 2018 12:00:12 PM PST, Nicolas Paris wrote:
>> > That is true, but the question is how significant the overhead is. If
>> > it's 50% then reducing it would make perfect sense. If it's 1% then no
>> > one is going to be bothered by it.
>>
>> I think it's

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Nicolas Paris
> > That is true, but the question is how significant the overhead is. If
> > it's 50% then reducing it would make perfect sense. If it's 1% then no
> > one is going to be bothered by it.
>
> I think it's pretty clear that it's going to be way way much more than
> 1%.
Good news but not sure to

Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund
Hi,
On 2018-02-10 18:21:37 +0100, Tomas Vondra wrote:
> That is true, but the question is how significant the overhead is. If
> it's 50% then reducing it would make perfect sense. If it's 1% then no
> one is going to be bothered by it.
I think it's pretty clear that it's going to be way way much

Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Tomas Vondra
On 02/10/2018 04:30 PM, Nicolas Paris wrote:
>>> I'd find it useful to be able to import/export from postgres to those modern
>>> data formats:
>>> - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
>>> - parquet (c++ writer=https://github.com/apache/parquet-cpp)
>>> - orc

Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Tomas Vondra
On 02/10/2018 04:38 PM, David G. Johnston wrote:
> On Saturday, February 10, 2018, Nicolas Paris wrote:
> > Hello
> >
> > I'd find it useful to be able to import/export from postgres to those
> > modern data formats:
> > - avro (c

Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread David G. Johnston
On Saturday, February 10, 2018, Nicolas Paris wrote:
> Hello
>
> I'd find it useful to be able to import/export from postgres to those modern
> data formats:
> - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
> - parquet (c++

Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Nicolas Paris
> > I'd find it useful to be able to import/export from postgres to those modern
> > data formats:
> > - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
> > - parquet (c++ writer=https://github.com/apache/parquet-cpp)
> > - orc (all writers=https://github.com/apache/orc)
> >

New Copy Formats - avro/orc/parquet

2018-02-10 Thread Nicolas Paris
Hello
I'd find it useful to be able to import/export from postgres to those modern data formats:
- avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
- parquet (c++ writer=https://github.com/apache/parquet-cpp)
- orc (all writers=https://github.com/apache/orc)
Something like :