Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-24 Thread Sam R.
Hi! "Index in memory" topic: After a read operation starts, it seems that a big part of an index gets loaded into memory quite quickly. A lot of IDs fit into one 8 KB page in PostgreSQL, so when reading starts, pages are loaded into memory quickly. So, this "feature" / PostgreSQ
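The claim that "a lot of IDs fit into one 8 KB page" can be illustrated with back-of-envelope arithmetic. This is a rough sketch, not exact PostgreSQL accounting: the overhead figures (24-byte page header, 16-byte B-tree special space, 4-byte line pointer, 8-byte index tuple header) are approximations and ignore alignment padding and fill factor, so real leaf pages hold somewhat fewer entries.

```python
# Back-of-envelope: how many bigint index entries fit in one 8 KB B-tree leaf page.
# All overhead figures are approximate and ignore MAXALIGN padding and fillfactor.
PAGE_SIZE = 8192          # default PostgreSQL block size
PAGE_HEADER = 24          # PageHeaderData
BTREE_SPECIAL = 16        # btree special space at the end of the page
LINE_POINTER = 4          # ItemIdData per tuple
INDEX_TUPLE_HEADER = 8    # IndexTupleData: 6-byte heap TID + 2-byte info
KEY_SIZE = 8              # a bigint key

usable = PAGE_SIZE - PAGE_HEADER - BTREE_SPECIAL
per_entry = LINE_POINTER + INDEX_TUPLE_HEADER + KEY_SIZE
entries_per_page = usable // per_entry
print(entries_per_page)   # on the order of a few hundred IDs per page
```

Even with this crude model, one page holds hundreds of IDs, which is why a range index fills the cache quickly once scanning begins.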

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Gasper Zejn
Hi, Vladimir, Reading the whole thread it seems you should look deeper into IO subsystem. 1) Which file system are you using? 2) What is the segment layout of the LVM PVs and LVs? See https://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/report_object_selection.html how to check. If

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> but I'd be curious to know > SELECT schemaname, tablename, attname, correlation FROM pg_stats WHERE tablename='articles' AND column='article_id' LIMIT 1; I think you meant 'attname'. It gives storage articles article_id -0.77380306 Vlad
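Written out with the column-name correction from this reply (`pg_stats` exposes the attribute name as `attname`, not `column`), the query from the quoted message becomes:

```sql
-- Corrected: pg_stats has no "column" column; the attribute name is attname.
SELECT schemaname, tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'articles' AND attname = 'article_id';
```

A correlation near -1, as reported here (~-0.77), means the physical heap order is roughly the reverse of the index key order.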

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Justin Pryzby
On Mon, Sep 24, 2018 at 05:59:12PM -0700, Vladimir Ryabtsev wrote: > > This seems significant... it means the heap was probably written in backwards order relative to the IDs, and the OS readahead is ineffective when index scanning across a range of IDs. > But again, why is it different for one

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> If it does an index scan, I think that will badly fail to keep the same order of heap TIDs - it'll be inserting rows in ID order rather than in (I guess) reverse ID order. According to the plan, it's gonna be seq. scan with filter. Vlad

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Justin Pryzby
On Mon, Sep 24, 2018 at 05:59:12PM -0700, Vladimir Ryabtsev wrote: > > I *suspect* VACUUM FULL won't help, since (AIUI) it copies all "visible" ... > I am going to copy the slow range into a table nearby and see if it reproduces > (I hope "INSERT INTO t2 SELECT * FROM t1 WHERE ..." will keep existing

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> did you try either 1) forcing a bitmap scan (of only one index), to force the heap reads to be ordered, if not sequential? SET enable_indexscan=off (and maybe SET enable_seqscan=off and others as needed). Disabling index scan made it a bitmap scan. It is surprising, but this increased read speed in bot
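The planner-nudging recipe quoted above can be sketched as a session-level experiment. The table and range bounds below are illustrative placeholders, not values from the thread:

```sql
-- Discourage plain index scans so the planner falls back to a bitmap scan,
-- which sorts heap page fetches into physical order before reading them.
SET enable_indexscan = off;
-- SET enable_seqscan = off;   -- uncomment if the planner switches to seq scan instead

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM articles
WHERE article_id BETWEEN 1000000 AND 1100000;   -- hypothetical bounds

RESET enable_indexscan;
RESET enable_seqscan;
```

Because the bitmap orders heap pages physically, it cooperates with OS readahead even when the heap was written in reverse ID order, which would explain the speedup reported here.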

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> This seems significant... it means the heap was probably written in backwards order relative to the IDs, and the OS readahead is ineffective when index scanning across a range of IDs. But again, why is it different for one range and another? It was reversed for both ranges. > I would definitely wa

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> And if you run? > select count(*) from articles where article_id between %s and %s > ie without reading json, is your buffers hit count increasing? Tried this. This is somewhat interesting, too... Even index-only scan is faster for the "fast" range. The results are consistently fast in it, with s
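To see the buffer hit counts the quoted message asks about, the count query can be run under EXPLAIN with buffer accounting. The `%s` placeholders are the range bounds, exactly as written in the thread:

```sql
-- An index-only scan avoids the heap (and the json) entirely; BUFFERS shows
-- how many pages came from shared buffers ("hit") vs. disk ("read").
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM articles
WHERE article_id BETWEEN %s AND %s;   -- substitute the range bounds
```

Comparing "Buffers: shared hit=... read=..." between the two ranges separates index-page caching effects from heap-read effects.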

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Justin Pryzby
On Mon, Sep 24, 2018 at 03:28:15PM -0700, Vladimir Ryabtsev wrote: > > it is not unusual to have 1GB cache or more... and do not forget to drop > the cache between tests + do a sync > I also reviewed import scripts and found the import was done in DESCENDING > order of IDs. This seems significant
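The "drop the cache between tests + do a sync" step mentioned in the quoted advice is commonly done on Linux as follows. This is a sketch: it requires root, and it only clears the OS page cache, so restarting PostgreSQL (to empty `shared_buffers`) is also needed for a fully cold test.

```shell
# Flush dirty pages to disk, then drop the OS page cache (requires root).
sync
echo 3 > /proc/sys/vm/drop_caches
# Restart PostgreSQL as well if shared_buffers should start cold, e.g.:
# systemctl restart postgresql
```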

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> You can create 2 partial indexes and the planner will pick it up for you. (and the planning time will go a bit up). Created two partial indexes and ensured the planner uses them. But the result is still the same: no noticeable difference. > it is not unusual to have 1GB cache or more... and do not fo
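The two partial indexes described here would look roughly like this. The index names and range boundaries are hypothetical; substitute the actual split point between the "slow" and "fast" ranges:

```sql
-- One partial index per ID range; the planner picks whichever WHERE clause
-- implies the query's range predicate.
CREATE INDEX articles_range1_idx ON articles (article_id)
  WHERE article_id BETWEEN 1 AND 1000000;          -- hypothetical bounds
CREATE INDEX articles_range2_idx ON articles (article_id)
  WHERE article_id > 1000000;

-- Confirm the partial index is chosen:
EXPLAIN SELECT count(*) FROM articles
WHERE article_id BETWEEN 1 AND 500000;
```

That no difference was observed is consistent with the bottleneck being heap I/O order rather than index size.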

Re: Explain is slow with tables having many columns

2018-09-24 Thread Andres Freund
Hi, On 2018-09-24 12:43:44 -0700, legrand legrand wrote: > Justin Pryzby wrote > > On Mon, Sep 24, 2018 at 12:22:28PM -0700, legrand legrand wrote: > >> Hello, > >> I have found that explain on tables with many (hundreds of) columns > >> is slow compared to nominal execution. > > > > See also this

Re: Explain is slow with tables having many columns

2018-09-24 Thread Andres Freund
Hi, (CCing -hackers) On 2018-09-24 12:22:28 -0700, legrand legrand wrote: > I have found that explain on tables with many (hundreds of) columns > is slow compared to nominal execution. Yea, colname_is_unique() (called via make_colname_unique()) is essentially O(#total_columns) and roughly called on

Re: Explain is slow with tables having many columns

2018-09-24 Thread legrand legrand
Justin Pryzby wrote > On Mon, Sep 24, 2018 at 12:22:28PM -0700, legrand legrand wrote: >> Hello, >> I have found that explain on tables with many (hundreds of) columns >> is slow compared to nominal execution. > > See also this thread from last month: > > https://www.postgresql.org/message-id/flat/

Re: Explain is slow with tables having many columns

2018-09-24 Thread Justin Pryzby
On Mon, Sep 24, 2018 at 12:22:28PM -0700, legrand legrand wrote: > Hello, > I have found that explain on tables with many (hundreds of) columns > is slow compared to nominal execution. See also this thread from last month: https://www.postgresql.org/message-id/flat/CAEe%3DmRnNNL3RDKJDmY%3D_mpcpAb5u

Explain is slow with tables having many columns

2018-09-24 Thread legrand legrand
Hello, I have found that EXPLAIN on tables with many (hundreds of) columns is slow compared to nominal execution. This can break application performance when using auto_explain or pg_store_plans. Here is my test case (with 500 columns; can be pushed to 1000 or 1600): create table a(); DECLARE i i
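The truncated test case above can be sketched in full. This is a reconstruction of the described setup (an empty table grown to 500 integer columns), not the poster's exact script; `\timing` is a psql meta-command:

```sql
-- Build a table with 500 columns, then compare EXPLAIN time to execution time.
CREATE TABLE a ();

DO $$
DECLARE i int;
BEGIN
  FOR i IN 1..500 LOOP
    EXECUTE format('ALTER TABLE a ADD COLUMN c%s int', i);
  END LOOP;
END $$;

\timing on
EXPLAIN SELECT * FROM a;  -- plan deparsing cost grows with column count
SELECT * FROM a;          -- the table is empty, so execution itself is fast
```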

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Fabio Pardi
Hi, answers (and questions) inline below. On 22/09/18 11:19, Vladimir Ryabtsev wrote: > > Is the length of the text equally distributed over the 2 partitions? > Not 100% equally, but to me it does not seem to be a big deal... Considering the ranges independently: > First range: ~70% < 10
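A length distribution like the one being discussed can be computed per range with `width_bucket`. The column name `content` and the bucket bounds are illustrative placeholders; `%s` marks the range boundaries as elsewhere in the thread:

```sql
-- Histogram of document lengths, split by ID range (bounds are placeholders).
SELECT
  CASE WHEN article_id BETWEEN %s AND %s
       THEN 'first range' ELSE 'second range' END AS id_range,
  width_bucket(length(content), 0, 400000, 4)     AS size_bucket,
  count(*)
FROM articles
GROUP BY 1, 2
ORDER BY 1, 2;
```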

Re: Why could different data in a table be processed with different performance?

2018-09-24 Thread Vladimir Ryabtsev
> Another idea is that the operating system rearranges I/O in a way that is not ideal for your storage. > Try a different I/O scheduler by running echo deadline > /sys/block/sda/queue/scheduler My scheduler was already "deadline". In some places I read that in virtual environments sometimes "noop"
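Checking and switching the scheduler, per the advice quoted above, looks like this on Linux (root required; the bracketed entry is the active scheduler, and `sda` is whichever block device holds the data):

```shell
# Show the active I/O scheduler (bracketed) for the device.
cat /sys/block/sda/queue/scheduler
# Switch to noop, sometimes suggested for virtualized storage where the
# hypervisor reorders I/O anyway (requires root).
echo noop > /sys/block/sda/queue/scheduler
```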