Make COPY extendable in order to support Parquet and other formats

2022-06-20 Aleksander Alekseev
Hi hackers, In several conversations I had recently with colleagues it was pointed out that it would be great if PostgreSQL supported COPY to/from Parquet and other formats. I've found a corresponding discussion [1] on pgsql-general@. The consensus reached back in 2018 seems to be that this should

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-21 Ashutosh Bapat
On Mon, Jun 20, 2022 at 8:35 PM Aleksander Alekseev wrote: > > I would like to invest some time into providing a corresponding patch > for the core and implementing "pg_copy_parquet" extension as a > practical example, and yet another, a bit simpler, extension as an API > usage example for the cor

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-21 Aleksander Alekseev
Hi Ashutosh, > An extension just for COPY to/from parquet looks limited in > functionality. Shouldn't this be viewed as an FDW or Table AM support > for parquet or other formats? Of course the latter is much larger in > scope compared to the first one. But there may already be efforts > underway >

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-22 Ashutosh Bapat
On Tue, Jun 21, 2022 at 3:26 PM Aleksander Alekseev wrote: > > In other words, personally I'm unaware of use cases when somebody > needs a complete read/write FDW or TableAM implementation for formats > like Parquet, ORC, etc. Also to my knowledge they are not particularly > optimized for this. >

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-22 Aleksander Alekseev
Hi Ashutosh, > IIUC, you want extensibility in FORMAT argument to COPY command > https://www.postgresql.org/docs/current/sql-copy.html. Where the > format is pluggable. That seems useful. > Another option is to dump the data in csv format but use external > utility to convert csv to parquet or wha
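A minimal sketch of what a pluggable FORMAT argument could look like, assuming each format is a named table of callbacks that an extension registers and that COPY looks up when it parses `FORMAT (...)`. Every name here (`CopyFormatRoutine`, `register_copy_format`, `lookup_copy_format`) is hypothetical and is not PostgreSQL's API; it is plain C with no server headers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback table for one COPY format (not PostgreSQL's API). */
typedef struct CopyFormatRoutine
{
    const char *name;                  /* value accepted in FORMAT (...) */
    void (*start)(FILE *out);          /* called once before the first row */
    void (*one_row)(FILE *out, int ncols, const char *const *values);
    void (*end)(FILE *out);            /* called once after the last row */
} CopyFormatRoutine;

#define MAX_FORMATS 16

static const CopyFormatRoutine *registry[MAX_FORMATS];
static int nformats = 0;

/* An extension would call this once, e.g. when its library is loaded. */
static int register_copy_format(const CopyFormatRoutine *routine)
{
    if (nformats >= MAX_FORMATS)
        return -1;
    registry[nformats++] = routine;
    return 0;
}

/* COPY ... WITH (FORMAT name) would resolve its handler here;
 * NULL is where an "unknown format" error would be raised. */
static const CopyFormatRoutine *lookup_copy_format(const char *name)
{
    for (int i = 0; i < nformats; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    return NULL;
}
```

In this sketch an extension such as the proposed "pg_copy_parquet" would fill in one table and register it; the COPY code itself only ever calls through the function pointers, so it needs no knowledge of the format.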

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-22 Andres Freund
Hi, On 2022-06-22 16:59:16 +0530, Ashutosh Bapat wrote: > On Tue, Jun 21, 2022 at 3:26 PM Aleksander Alekseev > wrote: > > > > > In other words, personally I'm unaware of use cases when somebody > > needs a complete read/write FDW or TableAM implementation for formats > > like Parquet, ORC, etc.

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-22 Tom Lane
Andres Freund writes: > On 2022-06-22 16:59:16 +0530, Ashutosh Bapat wrote: >> IIUC, you want extensibility in FORMAT argument to COPY command >> https://www.postgresql.org/docs/current/sql-copy.html. Where the >> format is pluggable. That seems useful. > Agreed. Ditto. > I suspect that we'd fi

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-23 Aleksander Alekseev
Andres, Tom, > > I suspect that we'd first need a patch to refactor the existing copy code a > > good bit to clean things up. After that it hopefully will be possible to > > plug > > in a new format without being too intrusive. > > I think that step 1 ought to be to convert the existing formats i
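The step quoted in this reply, converting the existing formats before adding new ones, can be illustrated by writing simplified "text"-like and "csv"-like row output as two implementations of one hypothetical function-pointer type, `RowEmitter`. This is a sketch under stated simplifications: the real COPY text format also escapes backslashes and control characters and writes NULL as `\N`, and real CSV handling supports DELIMITER, QUOTE, ESCAPE, NULL and FORCE_QUOTE options. None of that is modeled here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-format row emitter: returns bytes written, 0 if buf is too small. */
typedef size_t (*RowEmitter)(char *buf, size_t cap, int ncols, const char *const *vals);

/* Append n bytes at *pos, keeping buf NUL-terminated; fails instead of overflowing. */
static int append_bytes(char *buf, size_t cap, size_t *pos, const char *src, size_t n)
{
    if (*pos + n + 1 > cap)
        return 0;
    memcpy(buf + *pos, src, n);
    *pos += n;
    buf[*pos] = '\0';
    return 1;
}

/* "text"-like: tab-separated, newline-terminated (escaping and NULLs omitted). */
static size_t emit_text(char *buf, size_t cap, int ncols, const char *const *vals)
{
    size_t pos = 0;
    for (int i = 0; i < ncols; i++)
    {
        if (i > 0 && !append_bytes(buf, cap, &pos, "\t", 1))
            return 0;
        if (!append_bytes(buf, cap, &pos, vals[i], strlen(vals[i])))
            return 0;
    }
    return append_bytes(buf, cap, &pos, "\n", 1) ? pos : 0;
}

/* "csv"-like: quote fields containing , " CR or LF, doubling embedded quotes. */
static size_t emit_csv(char *buf, size_t cap, int ncols, const char *const *vals)
{
    size_t pos = 0;
    for (int i = 0; i < ncols; i++)
    {
        int quote = strpbrk(vals[i], ",\"\r\n") != NULL;
        if (i > 0 && !append_bytes(buf, cap, &pos, ",", 1))
            return 0;
        if (quote && !append_bytes(buf, cap, &pos, "\"", 1))
            return 0;
        for (const char *p = vals[i]; *p; p++)
        {
            /* a literal quote inside a quoted field is written twice */
            if (*p == '"' && !append_bytes(buf, cap, &pos, "\"", 1))
                return 0;
            if (!append_bytes(buf, cap, &pos, p, 1))
                return 0;
        }
        if (quote && !append_bytes(buf, cap, &pos, "\"", 1))
            return 0;
    }
    return append_bytes(buf, cap, &pos, "\n", 1) ? pos : 0;
}
```

Once the built-in formats are reached only through a pointer such as `RowEmitter`, adding a third-party format becomes a matter of supplying another function rather than adding branches to the COPY code.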

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-23 Andres Freund
Hi, On 2022-06-23 11:38:29 +0300, Aleksander Alekseev wrote: > > I know little about parquet - can it support FROM STDIN efficiently? > > Parquet is a compressed binary format with data grouped by columns > [1]. I wouldn't assume that this is a primary use case for this > particular format. IMO
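Parquet is described here as grouping data by columns, while COPY produces and consumes one row at a time. A simplified illustration of that mismatch, assuming fixed-width integer columns and a hypothetical row-group size: rows are buffered column by column, and nothing reaches the output until a whole group has been collected. All names are hypothetical, and this models only the row-to-column transposition; real Parquet column chunks are also encoded and compressed, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define NCOLS 2          /* hypothetical fixed schema: two int64 columns */
#define ROWGROUP_ROWS 4  /* hypothetical row-group size */

typedef struct ColumnarWriter
{
    int64_t cols[NCOLS][ROWGROUP_ROWS]; /* pending row group, stored column by column */
    size_t nrows;                       /* rows currently buffered */
    int64_t *out;                       /* stand-in for the output stream */
    size_t out_len;
    size_t out_cap;
    size_t groups_written;
} ColumnarWriter;

/* Write each buffered column contiguously (one "column chunk" each), then reset. */
static int flush_row_group(ColumnarWriter *w)
{
    if (w->out_len + NCOLS * w->nrows > w->out_cap)
        return -1;
    for (int c = 0; c < NCOLS; c++)
        for (size_t r = 0; r < w->nrows; r++)
            w->out[w->out_len++] = w->cols[c][r];
    if (w->nrows > 0)
        w->groups_written++;
    w->nrows = 0;
    return 0;
}

/* Called once per row, the way COPY hands rows over. */
static int add_row(ColumnarWriter *w, const int64_t row[NCOLS])
{
    /* a previous flush failed and the group is still full: retry first */
    if (w->nrows == ROWGROUP_ROWS && flush_row_group(w) != 0)
        return -1;
    for (int c = 0; c < NCOLS; c++)
        w->cols[c][w->nrows] = row[c];
    w->nrows++;
    /* nothing is written until a whole row group has been collected */
    if (w->nrows == ROWGROUP_ROWS)
        return flush_row_group(w);
    return 0;
}
```

The final partial group has to be flushed explicitly when COPY ends, which in a pluggable design would be the job of an end-of-copy callback.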

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-24 Andrew Dunstan
On 2022-06-23 Th 21:45, Andres Freund wrote: > Hi, > > On 2022-06-23 11:38:29 +0300, Aleksander Alekseev wrote: >>> I know little about parquet - can it support FROM STDIN efficiently? >> Parquet is a compressed binary format with data grouped by columns >> [1]. I wouldn't assume that this is a p

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-24 Aleksander Alekseev
Hi Andrew, > > IMO decent COPY FROM / TO STDIN support is crucial, because otherwise you > > can't do COPY from/to a client. Which would make the feature unusable for > > anybody not superuser, including just about all users of hosted PG. > > > > +1 > > Note that Parquet puts the metadata at the e
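Because Parquet keeps its file metadata at the end, a reader cannot interpret the column chunks until it has seen the last bytes of the input. Per the Parquet format specification, a file starts with the magic bytes `PAR1` and ends with the serialized FileMetaData, a 4-byte little-endian length of that metadata, and `PAR1` again. The sketch below (function name hypothetical) locates the footer in a file held entirely in memory; it does not decode the Thrift-encoded metadata itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Locate FileMetaData in a complete Parquet file held in memory.
 * Layout: "PAR1" | column chunks ... | FileMetaData | 4-byte LE length | "PAR1".
 * Returns 0 and sets *meta_off / *meta_len on success, -1 if malformed or truncated. */
static int parquet_find_footer(const unsigned char *buf, size_t len,
                               size_t *meta_off, uint32_t *meta_len)
{
    /* smallest possible file: two magics plus the length field */
    if (len < 12 || memcmp(buf, "PAR1", 4) != 0 ||
        memcmp(buf + len - 4, "PAR1", 4) != 0)
        return -1;

    /* the metadata length sits just before the trailing magic */
    const unsigned char *p = buf + len - 8;
    uint32_t n = (uint32_t) p[0] | (uint32_t) p[1] << 8 |
                 (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;

    /* metadata must fit between the leading magic and the length field */
    if (n > len - 12)
        return -1;

    *meta_off = len - 8 - n;
    *meta_len = n;
    return 0;
}
```

For COPY FROM STDIN this implies buffering the whole stream (in memory or in a temporary file) before decoding can start; a stream cut short before its trailing `PAR1` is rejected by this check.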