Re: Why could different data in a table be processed with different performance?

2018-10-10 Thread Vladimir Ryabtsev
FYI, posting an intermediate update on the issue. I disabled index scans to keep existing order, and copied part of the "slow" range into another table (3M rows in 2.2 GB table + 17 GB toast). I was able to reproduce slow readings from this copy. Then I performed CLUSTER of the copy using PK and
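
The experiment described above can be sketched in SQL. The copy bounds here are illustrative, not the real "slow" range, and the index name follows PostgreSQL's default PK naming:

```sql
-- Copy part of the "slow" range into a separate table (bounds illustrative):
CREATE TABLE articles_slow_copy AS
SELECT * FROM articles
WHERE article_id BETWEEN 1000000000 AND 1003000000;

ALTER TABLE articles_slow_copy ADD PRIMARY KEY (article_id);

-- Rewrite the copy so heap order matches PK order:
CLUSTER articles_slow_copy USING articles_slow_copy_pkey;
ANALYZE articles_slow_copy;
```

CLUSTER takes an exclusive lock and rewrites the table, which is why it is done on a copy here rather than on the production table.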

Re: Why could different data in a table be processed with different performance?

2018-10-01 Thread Fabio Pardi
On 28/09/18 21:51, Vladimir Ryabtsev wrote:
>> That means, if your block size was bigger, then you would have bigger space allocated for one single record.
> But if I INSERT a second, third ... hundredth record in the table, the size remains 8K.
> So my point is that if one decides to
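
The point about the table staying at 8K can be checked directly. A minimal sketch, assuming the default 8 KB block size (table name is illustrative):

```sql
-- A table grows in whole blocks: the first row allocates one 8 KB block,
-- and subsequent small rows are packed into that same block until it fills.
CREATE TABLE t (id int, payload text);
INSERT INTO t SELECT g, 'x' FROM generate_series(1, 100) g;

SELECT pg_relation_size('t');  -- still 8192: one block holds all 100 small rows
```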

Re: Why could different data in a table be processed with different performance?

2018-09-28 Thread Fabio Pardi
On 28/09/18 11:56, Vladimir Ryabtsev wrote:
>> It could affect space storage, for the smaller blocks.
> But to what extent? As I understand it, it is not about "alignment" of rows to block size? Is it only a low-level IO thing with datafiles?
> Maybe 'for the smaller blocks' was

Re: Why could different data in a table be processed with different performance?

2018-09-28 Thread Vladimir Ryabtsev
> Does your LVM have readahead ramped up? Try lvchange -r 65536 data/postgres (or similar).
Changed this from 256 to 65536. If it is supposed to take effect immediately (no server reboot or other changes needed), then I've got no change in performance. None at all.
Vlad
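
One way to confirm the setting actually took effect, as a sketch (VG/LV names are illustrative and require root):

```
# Raise readahead on the LV (value is in 512-byte sectors):
lvchange -r 65536 data/postgres

# Verify at the block-device level; should report 65536:
blockdev --getra /dev/data/postgres
```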

Re: Why could different data in a table be processed with different performance?

2018-09-27 Thread Fabio Pardi
On 09/26/2018 07:15 PM, Vladimir Ryabtsev wrote:
>> Since you have a very big toast table, given you are using spinning disks, I think that increasing the block size will bring benefits.
> But will it worsen caching? I will have fewer slots in cache. Also will it affect required storage

Re: Why could different data in a table be processed with different performance?

2018-09-26 Thread Vladimir Ryabtsev
> Since you have a very big toast table, given you are using spinning disks, I think that increasing the block size will bring benefits.
But will it worsen caching? I will have fewer slots in the cache. Also, will it affect required storage space?
>> consecutive runs with SAME parameters do NOT hit

Re: Why could different data in a table be processed with different performance?

2018-09-25 Thread Vladimir Ryabtsev
> 1) Which file system are you using?
From Linux's view it's ext4. The real vmdx file on Hyper-V is stored on NTFS, as far as I know.
> 2) What is the segment layout of the LVM PVs and LVs?
I am a bit lost with it. Is this what you are asking about?
master:
# pvs --segments
PV VG

Re: Why could different data in a table be processed with different performance?

2018-09-25 Thread Fabio Pardi
On 25/09/18 00:28, Vladimir Ryabtsev wrote:
>> it is not unusual to have 1GB cache or more... and do not forget to drop the cache between tests + do a sync
> I conducted several long runs of dd, so I am sure that these numbers are fairly correct. However, what worries me is that I
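
The "drop the cache + sync" procedure mentioned above looks roughly like this sketch (requires root; the device path is illustrative):

```
# Flush dirty pages, then drop the page cache so the next dd run
# measures disk throughput rather than RAM:
sync
echo 3 > /proc/sys/vm/drop_caches

dd if=/dev/data/postgres of=/dev/null bs=8k count=100000
```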

Re: Why could different data in a table be processed with different performance?

2018-09-25 Thread Gasper Zejn
Hi, Vladimir,
Reading the whole thread, it seems you should look deeper into the IO subsystem.
1) Which file system are you using?
2) What is the segment layout of the LVM PVs and LVs? See https://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/report_object_selection.html for how to check.
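
A sketch of commands that answer both questions (the data-directory path is illustrative; the LVM commands require an LVM setup):

```
# 1) File system backing the data directory:
df -T /var/lib/postgresql

# 2) Segment layout of physical and logical volumes:
pvs --segments -o +pe_ranges
lvs --segments -o +seg_pe_ranges
```

A fragmented LV (many small, non-contiguous segments) can turn logically sequential reads into physical seeks, which matters on spinning disks.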

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Justin Pryzby
On Mon, Sep 24, 2018 at 05:59:12PM -0700, Vladimir Ryabtsev wrote:
>> This seems significant.. it means the heap was probably written in backwards order relative to the IDs, and the OS readahead is ineffective when index scanning across a range of IDs.
> But again, why is it different for

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> If it does an index scan, I think that will badly fail to keep the same order of heap TIDs - it'll be inserting rows in ID order rather than in (I guess) reverse ID order.
According to the plan, it's going to be a seq. scan with a filter.
Vlad

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> did you try either 1) forcing a bitmap scan (of only one index), to force the heap reads to be ordered, if not sequential? SET enable_indexscan=off (and maybe SET enable_seqscan=off and others as needed).
Disabling index scan made it a bitmap scan. It is surprising, but this increased read speed in
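
The suggestion above, as a sketch (range bounds are illustrative):

```sql
-- Push the planner toward a bitmap scan: heap pages are then visited
-- in physical (block-number) order, which readahead can exploit.
SET enable_indexscan = off;
SET enable_seqscan = off;  -- only if the planner falls back to a seq scan

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*), sum(length(content::text))
FROM articles
WHERE article_id BETWEEN 1000000 AND 1020000;
```

These `enable_*` settings are per-session planner hints, not permanent configuration, so they are safe to toggle for an experiment.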

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> This seems significant.. it means the heap was probably written in backwards order relative to the IDs, and the OS readahead is ineffective when index scanning across a range of IDs.
But again, why is it different for one range and another? It was reversed for both ranges.
> I would definitely

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Justin Pryzby
On Mon, Sep 24, 2018 at 03:28:15PM -0700, Vladimir Ryabtsev wrote:
>> it is not unusual to have 1GB cache or more... and do not forget to drop the cache between tests + do a sync
> I also reviewed the import scripts and found the import was done in DESCENDING order of IDs.
This seems
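
Whether the heap really was written in descending ID order can be checked from statistics, as a sketch (table and column names as given earlier in the thread):

```sql
-- correlation near -1 means heap order is the reverse of article_id order;
-- near +1 means they match (ANALYZE must have run recently).
SELECT correlation
FROM pg_stats
WHERE tablename = 'articles' AND attname = 'article_id';

-- Spot-check directly: ctid is the physical (block, tuple) address.
SELECT ctid, article_id FROM articles ORDER BY article_id LIMIT 10;
```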

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> You can create 2 partial indexes and the planner will pick them up for you (and the planning time will go up a bit).
Created two partial indexes and ensured the planner uses them. But the result is still the same, no noticeable difference.
> it is not unusual to have 1GB cache or more... and do not
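
The two partial indexes would look roughly like this sketch; the split point is hypothetical, standing in for whatever boundary separates the "fast" and "slow" ID ranges:

```sql
CREATE INDEX articles_id_fast ON articles (article_id)
  WHERE article_id < 1000000000;
CREATE INDEX articles_id_slow ON articles (article_id)
  WHERE article_id >= 1000000000;

-- EXPLAIN on a query constrained to one range should then show the
-- matching partial index being used.
```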

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Fabio Pardi
Hi,
Answers (and questions) inline here below.
On 22/09/18 11:19, Vladimir Ryabtsev wrote:
>> is the length of the text equally distributed over the 2 partitions?
> Not 100% equally, but to me it does not seem to be a big deal... Considering the ranges independently:
> First range: ~70% < 10

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> Another idea is that the operating system rearranges I/O in a way that is not ideal for your storage.
> Try a different I/O scheduler by running echo deadline > /sys/block/sda/queue/scheduler
My scheduler was already "deadline". In some places I read that in a virtual environment sometimes "noop"
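
Checking and switching the scheduler looks like this sketch (requires root; the device name is illustrative):

```
# The bracketed entry is the active scheduler, e.g. "noop [deadline] cfq":
cat /sys/block/sda/queue/scheduler

# Switch to noop; on virtualized storage the hypervisor typically
# reorders I/O anyway, so guest-side sorting adds little:
echo noop > /sys/block/sda/queue/scheduler
```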

Re: Why could different data in a table be processed with different performance?

2018-09-22 Thread didier
Hi,
Assuming the DB is quiescent, what happens if you run
select count(*) from articles where article_id between %s and %s
i.e. without reading the json, is your buffers hit count increasing? 20 000 8K blocks *2 is 500MB, should be in RAM after the first run.
Fast:
read=710
I/O Timings: read=852.547 ==> 1.3
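
The test being proposed, as a sketch (range bounds are illustrative): run it twice and compare `shared hit` against `read` in the buffers line of the second run.

```sql
-- Counting rows without touching the jsonb column avoids detoasting,
-- so only heap/index pages (not TOAST pages) are read:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM articles
WHERE article_id BETWEEN 1000000 AND 1020000;
```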

Re: Why could different data in a table be processed with different performance?

2018-09-22 Thread Vladimir Ryabtsev
> is the length of the text equally distributed over the 2 partitions?
Not 100% equally, but to me it does not seem to be a big deal... Considering the ranges independently:
First range: ~70% < 10 KB, ~25% for 10-20 KB, ~3% for 20-30 KB, everything else is less than 1% (with 10 KB steps).
Second
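
One way such a distribution can be computed, as a sketch (bucket width, upper bound, and range bounds are illustrative):

```sql
-- Histogram of content length in 10 KB buckets for one ID range:
SELECT width_bucket(length(content::text), 0, 100000, 10) AS bucket_10kb,
       count(*) AS rows,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
FROM articles
WHERE article_id BETWEEN 1000000 AND 2000000
GROUP BY bucket_10kb
ORDER BY bucket_10kb;
```

Note this query detoasts every row in the range, so on a large table it is itself a slow, I/O-heavy operation.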

Re: Why could different data in a table be processed with different performance?

2018-09-21 Thread Fabio Pardi
On 09/21/2018 08:28 AM, Vladimir Ryabtsev wrote:
>> but you say you observe a difference even after dropping the cache.
> No, I say I see NO significant difference (accurate to measurement error) between "with caches" and after dropping caches. And this is explainable, I think. Since I

Re: Why could different data in a table be processed with different performance?

2018-09-21 Thread Fabio Pardi
Hi Vladimir,
On 09/21/2018 02:07 AM, Vladimir Ryabtsev wrote:
> I have such a table:
>
> CREATE TABLE articles
> (
>     article_id bigint NOT NULL,
>     content jsonb NOT NULL,
>     published_at timestamp without time zone NOT NULL,
>     appended_at timestamp without time zone NOT

Re: Why could different data in a table be processed with different performance?

2018-09-20 Thread Laurenz Albe
Vladimir Ryabtsev wrote:
> explain (analyze, buffers)
> select count(*), sum(length(content::text)) from articles where article_id between %s and %s
>
> Sample output:
>
> Aggregate (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
>   Buffers: shared

Re: Why could different data in a table be processed with different performance?

2018-09-20 Thread Justin Pryzby
Sorry, dropped -performance.
Has the table been reindexed (or pg_repack'ed) since loading (or vacuumed, for that matter)?
>>> Not sure what you mean... We created indexes on some fields (on
>> I mean REINDEX INDEX articles_pkey;
>> Or (from "contrib"): /usr/pgsql-10/bin/pg_repack -i

Re: Why could different data in a table be processed with different performance?

2018-09-20 Thread Vladimir Ryabtsev
> Was the data populated differently, too?
Here is how new records were coming in over the last two months, by day: https://i.stack.imgur.com/zp9WP.png
During a day, records come in evenly (in both ranges), slightly faster during European and American work hours. Since Jul 1, 2018, when we started population by

Re: Why could different data in a table be processed with different performance?

2018-09-20 Thread Justin Pryzby
On Thu, Sep 20, 2018 at 05:07:21PM -0700, Vladimir Ryabtsev wrote:
> I am experiencing a strange performance problem when accessing JSONB content by primary key.
> I noticed that with some IDs it works pretty fast while with others it is 4-5 times slower.
It is worth noting that there are two

Why could different data in a table be processed with different performance?

2018-09-20 Thread Vladimir Ryabtsev
I am experiencing a strange performance problem when accessing JSONB content by primary key.
My DB version() is PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.
postgres.conf: https://justpaste.it/6pzz1
uname -a: