Re: Bulk DML performance

2025-03-17 Thread Renan Alves Fonseca
Hi, Here are some observations. On Mon, Mar 17, 2025 at 09:19, wrote: > > PostgreSQL has a lot of overhead per row. > Okay, thanks. I'm not actually too worried about this since in my scenario, each row is about 1.5 kB, so the % overhead is negligible. > > It is probably not the
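
A quick way to sanity-check that per-row overhead against a 1.5 kB payload is to compare the logical row size with the heap size; a minimal sketch (the table name test1 is assumed for illustration):

    -- approximate data size of individual rows (excludes the ~24-byte tuple header)
    SELECT pg_column_size(t.*) AS row_bytes FROM test1 t LIMIT 5;
    -- heap size divided by row count gives the true per-row footprint
    SELECT pg_relation_size('test1') / count(*) AS bytes_per_row FROM test1;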

Re: Bulk DML performance

2025-03-17 Thread Greg Sabino Mullane
On Mon, Mar 17, 2025 at 4:19 AM wrote: > Can you help me understand why performing 3 million lookups on a b-tree index with all pages cached in memory takes so long? It's not the lookup, it's writing the 3 million rows (and in this particular upsert case, deleting 3 million, then inserting 3 m
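
To see the write cost directly, one can wrap the statement in EXPLAIN; a sketch, with table and column names assumed for illustration:

    -- note: ANALYZE actually executes the insert
    EXPLAIN (ANALYZE, BUFFERS)
    INSERT INTO test1 (id, text1)
    SELECT g, 'x' FROM generate_series(1, 3000000) AS g
    ON CONFLICT (id) DO UPDATE SET text1 = EXCLUDED.text1;

The buffer counts should make it visible that the time is dominated by dirtying heap and index pages rather than by the index probes themselves.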

Re: Bulk DML performance

2025-03-17 Thread Álvaro Herrera
On 2025-Mar-13, bill.po...@ymail.com wrote: > I need to perform a merge (INSERT ... ON CONFLICT ... DO UPDATE) on the data, so sadly I cannot use COPY. > I have discovered that for some reason, performing the original insert without the ON CONFLICT statement is twice as fast as performing t
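
That difference is easy to measure from psql; a sketch assuming a test1 table with a primary key on id:

    \timing on
    -- plain insert into an empty table
    INSERT INTO test1 (id, text1)
    SELECT g, 'x' FROM generate_series(1, 3000000) AS g;
    TRUNCATE test1;
    -- same data through the conflict-checking path
    INSERT INTO test1 (id, text1)
    SELECT g, 'x' FROM generate_series(1, 3000000) AS g
    ON CONFLICT (id) DO UPDATE SET text1 = EXCLUDED.text1;

On an empty table both statements only insert, so any timing gap isolates the ON CONFLICT overhead.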

RE: Bulk DML performance

2025-03-17 Thread bill.poole
parallelized in the future. Best, Bill From: Renan Alves Fonseca Sent: Friday, 14 March 2025 5:25 AM To: bill.po...@ymail.com Cc: pgsql-performa...@postgresql.org Subject: Re: Bulk DML performance Hello, Regarding the additional time for UPDATE, you can try the following: CREATE

RE: Bulk DML performance

2025-03-17 Thread bill.poole
est4.id = generate_series This also now means that updating 3 million rows takes 4x longer than inserting those rows. Do we expect updates to be 4x slower than inserts? Regards, Bill -Original Message- From: Laurenz Albe Sent: Thursday, 13 March 2025 7:28 PM To: bill.po...@ymail
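
The fragment above appears to come from a test along these lines (a sketch; the test4 table and its columns are assumed):

    UPDATE test4
    SET text1 = 'y'
    FROM generate_series(1, 3000000) AS g(id)
    WHERE test4.id = g.id;

Every such update writes a complete new row version plus, for non-HOT updates, new index entries, which is one plausible source of the 4x gap.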

RE: Bulk DML performance

2025-03-17 Thread bill.poole
> in the database. Sadly, I cannot do that because I need all rows to be inserted in a single database transaction, which I cannot do over multiple database connections. Regards, Bill -Original Message- From: Laurenz Albe Sent: Thursday, 13 March 2025 5:21 PM To: bill.po...
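
For completeness, batching within one connection still preserves the single-transaction requirement, since nothing commits until the end; a sketch (names illustrative):

    BEGIN;
    INSERT INTO test1 (id, text1)
    SELECT g, 'x' FROM generate_series(1, 1500000) AS g
    ON CONFLICT (id) DO UPDATE SET text1 = EXCLUDED.text1;
    INSERT INTO test1 (id, text1)
    SELECT g, 'x' FROM generate_series(1500001, 3000000) AS g
    ON CONFLICT (id) DO UPDATE SET text1 = EXCLUDED.text1;
    COMMIT;

The statements still run serially, so this keeps atomicity but adds no parallelism.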

Re: Bulk DML performance

2025-03-13 Thread Renan Alves Fonseca
Hello, Regarding the additional time for UPDATE, you can try the following: CREATE TABLE test3 ( id bigint PRIMARY KEY, text1 text ) WITH (fillfactor=30); See: https://www.postgresql.org/docs/17/storage-hot.html My local test gives me almost the same time for INSERT (first insert) and UPDATES
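
After rerunning the updates against such a table, one can check whether they actually took the HOT path; a sketch using the standard statistics view:

    SELECT n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 'test3';

If n_tup_hot_upd tracks n_tup_upd closely, the low fillfactor is doing its job of leaving room on each page for HOT updates.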

Re: Bulk DML performance

2025-03-13 Thread Laurenz Albe
On Thu, 2025-03-13 at 12:28 +0100, I wrote: > Then the best you can do is to use COPY rather than INSERT. > It will perform better (but now vastly better). Sorry, I meant "*not* vastly better". Yours, Laurenz Albe
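
For reference, the COPY path being recommended looks roughly like this (a sketch; table, columns, and the file path are illustrative):

    -- server-side file:
    COPY test1 (id, text1) FROM '/tmp/data.csv' WITH (FORMAT csv);
    -- or client-side from psql:
    \copy test1 (id, text1) FROM 'data.csv' WITH (FORMAT csv)

COPY avoids per-row statement overhead, but it still writes the same tuples and index entries, which fits the correction that the gain is real but not vast.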

Re: Bulk DML performance

2025-03-13 Thread Laurenz Albe
On Thu, 2025-03-13 at 18:13 +0800, bill.po...@ymail.com wrote: > it is noteworthy that inserting 27 MB of data into a newly created table creates 191 MB of data including the index and 127 MB of data excluding the index. PostgreSQL has a lot of overhead per row. > Can you help me under
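
Those size figures can be reproduced with the built-in size functions; a sketch (the table name is assumed):

    SELECT pg_size_pretty(pg_relation_size('test1'))       AS heap_only,
           pg_size_pretty(pg_indexes_size('test1'))        AS indexes_only,
           pg_size_pretty(pg_total_relation_size('test1')) AS total;

The gap between 27 MB of raw data and 127 MB of heap is consistent with a small payload per row: each tuple carries a header of roughly 24 bytes plus a 4-byte line pointer, and pages keep some free space.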

Re: Bulk DML performance

2025-03-13 Thread Laurenz Albe
On Thu, 2025-03-13 at 12:05 +0800, bill.po...@ymail.com wrote: > The following INSERT ... ON CONFLICT (…) DO UPDATE statement inserts/updates 3 million rows with only 9 bytes per row and takes about 8 seconds on first run (to insert the rows) and about 14 seconds on subsequent runs (to update
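
The statement shape under discussion is presumably along these lines (a sketch; table and column names are assumed):

    INSERT INTO test1 (id, text1)
    SELECT g, 'aaa' FROM generate_series(1, 3000000) AS g
    ON CONFLICT (id) DO UPDATE SET text1 = EXCLUDED.text1;

The first run only inserts; subsequent runs find every key already present, so each row goes through the slower conflict-then-update path.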