Re: Query is slower with a large proportion of NULLs in several columns

2021-12-21 Thread David G. Johnston
On Tue, Dec 21, 2021 at 4:07 PM Tom Lane wrote: > Lars Bergeson writes: > > I'm running PostgreSQL under AWS Aurora, and I didn't set it up or > install > > it, so I'm not sure about the OS version. > > Oh! Aurora is not Postgres. My admittedly-not-well-informed > understanding is that they

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-21 Thread Tom Lane
Lars Bergeson writes: > I'm running PostgreSQL under AWS Aurora, and I didn't set it up or install > it, so I'm not sure about the OS version. Oh! Aurora is not Postgres. My admittedly-not-well-informed understanding is that they stuck a Postgres front end on their existing storage engine, so

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-21 Thread Lars Bergeson
Justin, Thanks for your continued interest. I'm running PostgreSQL under AWS Aurora, and I didn't set it up or install it, so I'm not sure about the OS version. I can't run the grep command since I don't know how to get down to the command line on the actual box running Aurora. I just connect

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-21 Thread Justin Pryzby
On Tue, Dec 21, 2021 at 12:33:06AM -0500, Tom Lane wrote: > So now we have a real mystery about what is happening on Lars' > system. Those numbers can't be right. I realized Lars said it was x86_64/Linux, but I'm hoping to hear back with more details: What OS version? Is it a VM of some type ?

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Tom Lane
"David G. Johnston" writes: > On Monday, December 20, 2021, Tom Lane wrote: >> It would help if somebody had labeled the units of I/O Time >> ... but I'm guessing those are microsec vs. the millisec >> of the other times, because otherwise it's completely wrong. > Related to my preceding

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Tom Lane wrote: > Justin Pryzby writes: > > On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote: > >> Still taking 10X more I/O to read the smaller table. Very odd. > > > If I'm not wrong, it's even worse than that ? > > It takes 20 or 30sec to run the

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Justin Pryzby wrote: > On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote: > > ok, here are results after I did: > > set max_parallel_workers_per_gather = 0; > > > > HashAggregate (cost=1676432.13..1676432.16 rows=3 width=15) (actual >

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Tom Lane
Justin Pryzby writes: > On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote: >> Still taking 10X more I/O to read the smaller table. Very odd. > If I'm not wrong, it's even worse than that ? > It takes 20 or 30sec to run the query - but it says the associated I/O times > are ~500sec or

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Justin Pryzby
On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote: > ok, here are results after I did: > set max_parallel_workers_per_gather = 0; > > HashAggregate (cost=1676432.13..1676432.16 rows=3 width=15) (actual > time=19908.343..19908.345 rows=5 loops=1) > I/O Timings: read=532369.898 >

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Lars Bergeson
ok, here are results after I did: set max_parallel_workers_per_gather = 0; no nulls table is 11.462 GB: QUERY PLAN HashAggregate (cost=1676432.13..1676432.16 rows=3 width=15) (actual time=19908.343..19908.345 rows=5 loops=1) Group Key: roys_creation_user Batches: 1 Memory Usage: 24kB I/O

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Tom Lane
Lars Bergeson writes: > What is it about null values in the table that slows down the full table > scan? If a row has any nulls, then it contains a "nulls bitmap" [1] that says which columns are null, and that bitmap has to be consulted while walking through the row contents. So the most

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Lars Bergeson wrote: > > What is it about null values in the table that slows down the full table > scan? > > If I populate blank/zero for all of the unused values in columns that are > NULLable, the query is fast again. So just defining the columns as NULLable >

Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Lars Bergeson
First of all, here is the version of PostgreSQL I'm using: PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit I'm new to PostgreSQL, and I'm deciding if I should make columns in my database nullable or not. I have no need to distinguish between