Re: Loading 500m json files to database

2020-03-24 Thread Rob Sargent
On 3/24/20 11:53 AM, Kevin Brannen wrote: > From: Rob Sargent >> Sorry if I missed it, but have we seen the size range of these json files? > Not that I've seen, but that's an implementation detail for whoever is doing the work. As someone else pointed out, pick the value as needed, whether

RE: Loading 500m json files to database

2020-03-24 Thread Kevin Brannen
From: Rob Sargent > Sorry if I missed it, but have we seen the size range of these json files? Not that I've seen, but that's an implementation detail for whoever is doing the work. As someone else pointed out, pick the value as needed, whether that's 10, 100, 1000, or whatever. But issuing

Re: Loading 500m json files to database

2020-03-24 Thread Rob Sargent
On 3/24/20 11:29 AM, Kevin Brannen wrote: > From: pinker >> it's a cloud and no plpythonu extension is available, unfortunately > You're misunderstanding him. See David's post for an example, but the point was that you can control all of this from an *external* Perl, Python, Bash, whatever program

RE: Loading 500m json files to database

2020-03-24 Thread Kevin Brannen
From: pinker > it's a cloud and no plpythonu extension is available, unfortunately You're misunderstanding him. See David's post for an example, but the point was that you can control all of this from an *external* Perl, Python, Bash, whatever program on the command line at the shell. In

Re: Loading 500m json files to database

2020-03-24 Thread Peter J. Holzer
On 2020-03-23 17:18:45 -0700, pinker wrote: > Christopher Browne wrote > > Well, you're paying for a lot of overhead in that, as you're > > establishing a psql command, connecting to a database, spawning a > > backend process, starting a transaction, committing a transaction, > > closing the
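One way to amortize that per-connection and per-transaction overhead is to keep a single connection open and commit a whole chunk of files at a time. A rough Python/psycopg2 sketch of that general idea (not code from the thread; the connection string, chunk size, and the assumption that json_data is jsonb are all placeholders):

    # Sketch only: one connection, one transaction per chunk of files,
    # multi-row INSERTs instead of one psql process per file.
    import glob
    import json

    import psycopg2
    from psycopg2.extras import Json, execute_values

    CHUNK = 1000                                # files per transaction; tune as needed
    conn = psycopg2.connect("dbname=jsondb")    # placeholder connection string
    paths = sorted(glob.glob("datafiles/*"))

    try:
        with conn.cursor() as cur:
            for i in range(0, len(paths), CHUNK):
                rows = []
                for path in paths[i:i + CHUNK]:
                    with open(path, encoding="utf-8") as f:
                        # Json() adapts a Python object for a json/jsonb column.
                        rows.append((Json(json.load(f)),))
                execute_values(cur,
                               "INSERT INTO json_parts (json_data) VALUES %s",
                               rows)
                conn.commit()                   # one commit per chunk, not per file
    finally:
        conn.close()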

Re: Loading 500m json files to database

2020-03-23 Thread Reid Thompson
On Mon, 2020-03-23 at 03:24 -0700, pinker wrote: > Hi, do you maybe have an idea how to make the loading process faster? > I have 500 million JSON files (1 JSON document per file) that I need to load to the db. > My test set is "only" 1 million files. > What I came up with now

Re: Loading 500m json files to database

2020-03-23 Thread pinker
Hmm, now I'm thinking: maybe setting up pgbouncer in front of Postgres with statement mode would help?
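For reference, statement-level pooling is selected with pool_mode = statement; a minimal, purely illustrative pgbouncer.ini fragment might look like the following (host, port, database name, and pool sizes are placeholders, not values from the thread). Since \copy runs as a single COPY ... FROM STDIN statement, it should be compatible with statement pooling.

    ; illustrative pgbouncer.ini fragment for statement-level pooling
    [databases]
    jsondb = host=127.0.0.1 port=5432 dbname=jsondb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = statement
    max_client_conn = 1000
    default_pool_size = 20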

Re: Loading 500m json files to database

2020-03-23 Thread pinker
It's in blob storage in Azure. I'm testing with the 1m files that I have locally.

Re: Loading 500m json files to database

2020-03-23 Thread Rob Sargent
> On Mar 23, 2020, at 7:11 PM, David G. Johnston wrote: > > On Mon, Mar 23, 2020 at 3:24 AM pinker wrote: > time for i in datafiles/*; do > psql -c "\copy json_parts(json_data) FROM $i"& > done > > Don't know whether this is faster but it does avoid spinning up

Re: Loading 500m json files to database

2020-03-23 Thread David G. Johnston
On Mon, Mar 23, 2020 at 3:24 AM pinker wrote: > time for i in datafiles/*; do > psql -c "\copy json_parts(json_data) FROM $i"& > done > Don't know whether this is faster but it does avoid spinning up a connection multiple times. #bash, linux function

Re: Loading 500m json files to database

2020-03-23 Thread Adrian Klaver
On 3/23/20 5:23 PM, pinker wrote: > it's a cloud and no plpythonu extension is available, unfortunately I presume Python itself is available, so would it not be possible to create a program that concatenates the files into batches and COPY(s) that data into Postgres using the Psycopg2 COPY
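A rough sketch of that suggestion (illustrative only; it assumes the json_parts(json_data) table from the thread, one JSON document per file, and a placeholder connection string): read the files in batches and stream each batch through a single COPY ... FROM STDIN over one connection.

    # Sketch only: many files per COPY, one connection for the whole run.
    import glob
    import io

    import psycopg2

    BATCH_SIZE = 1000            # files per COPY; tune as needed
    DSN = "dbname=jsondb"        # placeholder connection string

    def copy_batch(cur, paths):
        buf = io.StringIO()
        for path in paths:
            with open(path, encoding="utf-8") as f:
                doc = f.read().strip()
            # Escape for COPY text format: backslash, tab, newline, carriage return.
            doc = (doc.replace("\\", "\\\\").replace("\t", "\\t")
                      .replace("\n", "\\n").replace("\r", "\\r"))
            buf.write(doc + "\n")
        buf.seek(0)
        cur.copy_expert("COPY json_parts (json_data) FROM STDIN", buf)

    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur:
            paths = sorted(glob.glob("datafiles/*"))
            for i in range(0, len(paths), BATCH_SIZE):
                copy_batch(cur, paths[i:i + BATCH_SIZE])
                conn.commit()    # one transaction per batch
    finally:
        conn.close()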

Re: Loading 500m json files to database

2020-03-23 Thread Adrian Klaver
On 3/23/20 5:26 PM, pinker wrote: > Hi, json_parts is just a single table with 2 columns: Well, I misread that. Table "public.json_parts" Column | Type | Collation | Nullable | Default | Storage | Stats target |

Re: Loading 500m json files to database

2020-03-23 Thread pinker
Hi, json_parts is just a single table with 2 columns: Table "public.json_parts" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description

Re: Loading 500m json files to database

2020-03-23 Thread pinker
There are no indexes, foreign keys, or any other constraints.

Re: Loading 500m json files to database

2020-03-23 Thread pinker
It's a cloud and no plpythonu extension is available, unfortunately.

Re: Loading 500m json files to database

2020-03-23 Thread pinker
Ertan Küçükoğlu wrote: > However, if possible, you may think of using a local physical computer to do all the uploading and afterwards do a backup/restore on the cloud system. > A compressed backup will be far less internet traffic compared to direct data inserts. I was thinking about that but the data source

Re: Loading 500m json files to database

2020-03-23 Thread pinker
Christopher Browne wrote: > Well, you're paying for a lot of overhead in that, as you're establishing a psql command, connecting to a database, spawning a backend process, starting a transaction, committing a transaction, closing the backend process, disconnecting from the database, and

Re: Loading 500m json files to database

2020-03-23 Thread Adrian Klaver
On 3/23/20 3:24 AM, pinker wrote: > Hi, do you maybe have an idea how to make the loading process faster? I have 500 million JSON files (1 JSON document per file) that I need to load to the db. My test set is "only" 1 million files. What I came up with now is: time for i in datafiles/*; do psql -c "\copy

Re: Loading 500m json files to database

2020-03-23 Thread Rob Sargent
On 3/23/20 4:24 AM, pinker wrote: > Hi, do you maybe have an idea how to make the loading process faster? I have 500 million JSON files (1 JSON document per file) that I need to load to the db. My test set is "only" 1 million files. What I came up with now is: time for i in datafiles/*; do psql -c "\copy

Re: Loading 500m json files to database

2020-03-23 Thread Christopher Browne
On Mon, 23 Mar 2020 at 06:24, pinker wrote: > Hi, do you maybe have an idea how to make the loading process faster? > I have 500 million JSON files (1 JSON document per file) that I need to load to the db. > My test set is "only" 1 million files. > What I came up with now is: > time for i in

Re: Loading 500m json files to database

2020-03-23 Thread Rob Sargent
> On Mar 23, 2020, at 5:59 AM, Andrei Zhidenkov wrote: > > Try to write a stored procedure (probably pl/python) that will accept an > array of JSON objects so it will be possible to load data in chunks (by > 100-1000 files) which should be faster. > >>> On 23. Mar 2020, at 12:49, Ertan

Re: Loading 500m json files to database

2020-03-23 Thread Andrei Zhidenkov
Try to write a stored procedure (probably pl/python) that will accept an array of JSON objects, so it will be possible to load data in chunks (100-1000 files at a time), which should be faster. > On 23. Mar 2020, at 12:49, Ertan Küçükoğlu wrote: > >> On 23 Mar 2020, at 13:20, pinker wrote: >>
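A sketch of what such a function could look like (illustrative only; it assumes json_data is a jsonb column, and later in the thread the poster notes that plpythonu is not available on their cloud instance):

    -- Sketch only: takes an array of JSON texts and inserts them in one call.
    CREATE OR REPLACE FUNCTION load_json_batch(docs text[])
    RETURNS integer
    LANGUAGE plpythonu  -- plpython3u on installs without the Python 2 variant
    AS $$
    # Prepare once, then reuse the plan for every document in the batch.
    plan = plpy.prepare("INSERT INTO json_parts (json_data) VALUES ($1)",
                        ["jsonb"])  # use "text" here if json_data is a text column
    for doc in docs:
        plpy.execute(plan, [doc])
    return len(docs)
    $$;

The client would then call it with 100-1000 documents per statement, e.g. SELECT load_json_batch(ARRAY['{"a": 1}', '{"b": 2}']);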

Re: Loading 500m json files to database

2020-03-23 Thread Ertan Küçükoğlu
> On 23 Mar 2020, at 13:20, pinker wrote: > > Hi, do you maybe have an idea how to make the loading process faster? > > I have 500 million JSON files (1 JSON document per file) that I need to load to the db. > > My test set is "only" 1 million files. > > What I came up with now is: > > time for i in

Loading 500m json files to database

2020-03-23 Thread pinker
Hi, do you maybe have an idea how to make the loading process faster? I have 500 million JSON files (1 JSON document per file) that I need to load to the db. My test set is "only" 1 million files. What I came up with now is: time for i in datafiles/*; do psql -c "\copy json_parts(json_data) FROM $i" & done