Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Trond Myklebust
On Jan 14, 2014, at 10:39, Tom Lane wrote: > James Bottomley writes: >> The current mechanism for coherency between a userspace cache and the >> in-kernel page cache is mmap ... that's the only way you get the same >> page in both currently. > > Right. > >> glibc used to have an implementatio

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Tue, 2014-01-14 at 15:39 +0100, Hannu Krosing wrote: > On 01/14/2014 09:39 AM, Claudio Freire wrote: > > On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing > > wrote: > >> Again, as said above the linux file system is doing fine. What we > >> want is a few ways to interact with it to let it do eve

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Mon, 2014-01-13 at 19:48 -0500, Trond Myklebust wrote: > On Jan 13, 2014, at 19:03, Hannu Krosing wrote: > > > On 01/13/2014 09:53 PM, Trond Myklebust wrote: > >> On Jan 13, 2014, at 15:40, Andres Freund wrote: > >> > >>> On 2014-01-13 15:15:16 -0500, Robert Haas wrote: > On Mon, Jan 13

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Dave Chinner
On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote: > On 2014-01-13 17:13:51 -0800, James Bottomley wrote: > > a file into a user provided buffer, thus obtaining a page cache entry > > and a copy in their userspace buffer, then insert the page of the user > > buffer back into the page ca

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Dave Chinner
On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote: > On 01/13/2014 02:26 PM, Mel Gorman wrote: > > Really? > > > > zone_reclaim_mode is often a complete disaster unless the workload is > > partitioned to fit within NUMA nodes. On older kernels enabling it would > > sometimes cause massiv

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Jan Kara
On Tue 14-01-14 11:11:28, Heikki Linnakangas wrote: > On 01/14/2014 12:26 AM, Mel Gorman wrote: > >On Mon, Jan 13, 2014 at 03:15:16PM -0500, Robert Haas wrote: > >>The other thing that comes to mind is the kernel's caching behavior. > >>We've talked a lot over the years about the difficulties of ge

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Jan Kara
On Tue 14-01-14 09:08:40, Hannu Krosing wrote: > >>> Effectively you end up with buffered read/write that's also mapped into > >>> the page cache. It's a pretty awful way to hack around mmap. > >> Well, the problem is that you can't really use mmap() for the things we > >> do. Postgres' durability

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Tom Lane
Trond Myklebust writes: > On Jan 14, 2014, at 10:39, Tom Lane wrote: >> "Don't be aggressive" isn't good enough. The prohibition on early write >> has to be absolute, because writing a dirty page before we've done >> whatever else we need to do results in a corrupt database. It has to >> be tre

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Claudio Freire
On Tue, Jan 14, 2014 at 12:42 PM, Trond Myklebust wrote: >> James Bottomley writes: >>> The current mechanism for coherency between a userspace cache and the >>> in-kernel page cache is mmap ... that's the only way you get the same >>> page in both currently. >> >> Right. >> >>> glibc used to hav

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Tom Lane
James Bottomley writes: > The current mechanism for coherency between a userspace cache and the > in-kernel page cache is mmap ... that's the only way you get the same > page in both currently. Right. > glibc used to have an implementation of read/write in terms of mmap, so > it should be possib

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 5:00 AM, Jan Kara wrote: > I thought that instead of injecting pages into pagecache for aging as you > describe in 3), you would mark pages as volatile (i.e. for reclaim by > kernel) through vrange() syscall. Next time you need the page, you check > whether the kernel recla

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 3:39 AM, Claudio Freire wrote: > On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing wrote: >> Again, as said above the linux file system is doing fine. What we >> want is a few ways to interact with it to let it do even better when >> working with postgresql by telling it some

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Claudio Freire
On Tue, Jan 14, 2014 at 11:39 AM, Hannu Krosing wrote: > On 01/14/2014 09:39 AM, Claudio Freire wrote: >> On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing wrote: >>> Again, as said above the linux file system is doing fine. What we >>> want is a few ways to interact with it to let it do even better

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Kevin Grittner
First off, I want to give a +1 on everything in the recent posts from Heikki and Hannu. Jan Kara wrote: > Now the aging of pages marked as volatile as it is currently > implemented needn't be perfect for your needs but you still have > time to influence what gets implemented... Actually develope

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Hannu Krosing
On 01/14/2014 09:39 AM, Claudio Freire wrote: > On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing wrote: >> Again, as said above the linux file system is doing fine. What we >> want is a few ways to interact with it to let it do even better when >> working with postgresql by telling it some stuff it

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Claudio Freire
On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing wrote: > Again, as said above the linux file system is doing fine. What we > want is a few ways to interact with it to let it do even better when > working with postgresql by telling it some stuff it otherwise would > have to second guess and by somet

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Hannu Krosing
On 01/14/2014 03:44 AM, Dave Chinner wrote: > On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote: >> On 2014-01-13 17:13:51 -0800, James Bottomley wrote: >>> a file into a user provided buffer, thus obtaining a page cache entry >>> and a copy in their userspace buffer, then insert the pa

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Josh Berkus
On 01/13/2014 05:30 PM, Dave Chinner wrote: > On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote: > No matter what default NUMA allocation policy we set, there will be > an application for which that behaviour is wrong. As such, we've had > tools for setting application specific NUMA polic

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Andres Freund
On 2014-01-13 17:13:51 -0800, James Bottomley wrote: > a file into a user provided buffer, thus obtaining a page cache entry > and a copy in their userspace buffer, then insert the page of the user > buffer back into the page cache as the page cache page ... that's right, > isn't it postgress peopl

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Trond Myklebust
On Jan 13, 2014, at 19:03, Hannu Krosing wrote: > On 01/13/2014 09:53 PM, Trond Myklebust wrote: >> On Jan 13, 2014, at 15:40, Andres Freund wrote: >> >>> On 2014-01-13 15:15:16 -0500, Robert Haas wrote: On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner wrote: > I notice, Josh, that yo

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Jan Kara
On Mon 13-01-14 22:36:06, Mel Gorman wrote: > On Mon, Jan 13, 2014 at 06:27:03PM -0200, Claudio Freire wrote: > > On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby wrote: > > > On 1/13/14, 2:19 PM, Claudio Freire wrote: > > >> > > >> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas > > >> wrote: > > >>> > >

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Theodore Ts'o
The issue with O_DIRECT is actually a much more general issue --- namely, database programmers that for various reasons decide they don't want to go down the O_DIRECT route, but then care about performance. PostgreSQL is not the only database which has had this issue. There are two papers at this

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Trond Myklebust
On Jan 13, 2014, at 16:03, Robert Haas wrote: > On Mon, Jan 13, 2014 at 3:53 PM, Trond Myklebust wrote: >> O_DIRECT was specifically designed to solve the problem of double buffering >> between applications and the kernel. Why are you not able to use that in >> these situations? > > O_DIRECT

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Jan Kara
On Mon 13-01-14 22:26:45, Mel Gorman wrote: > The flipside is also meant to hold true. If you know data will be needed > in the near future then posix_fadvise(POSIX_FADV_WILLNEED). Glancing at > the implementation it does a forced read-ahead on the range of pages of > interest. It doesn't look like

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread James Bottomley
On Mon, 2014-01-13 at 22:12 +0100, Andres Freund wrote: > On 2014-01-13 12:34:35 -0800, James Bottomley wrote: > > On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote: > > > Well, if we were to collaborate with the kernel community on this then > > > presumably we can do better than that for evictio

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Jim Nasby
On 1/13/14, 4:47 PM, Jan Kara wrote: Note to postgres guys: I think you should have a look at the proposed 'vrange' system call. The latest posting is here: http://www.spinics.net/lists/linux-mm/msg67328.html. It contains a rather detailed description of the feature. And if the feature looks good

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Jim Nasby
On 1/13/14, 4:44 PM, Andres Freund wrote: > > One major usecase is transplanting a page coming from postgres' > >buffers into the kernel's buffercache because the latter has a much > >better chance of properly allocating system resources across independent > >applications running. > >If you wa

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Hannu Krosing
On 01/13/2014 09:53 PM, Trond Myklebust wrote: > On Jan 13, 2014, at 15:40, Andres Freund wrote: > >> On 2014-01-13 15:15:16 -0500, Robert Haas wrote: >>> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner wrote: I notice, Josh, that you didn't mention the problems many people have run int

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Mel Gorman
On Mon, Jan 13, 2014 at 11:38:44PM +0100, Jan Kara wrote: > On Mon 13-01-14 22:26:45, Mel Gorman wrote: > > The flipside is also meant to hold true. If you know data will be needed > > in the near future then posix_fadvise(POSIX_FADV_WILLNEED). Glancing at > > the implementation it does a forced re

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Andres Freund
On 2014-01-13 14:19:56 -0800, James Bottomley wrote: > > Frequently mmap()/madvise()/munmap()ing 8kb chunks has > > horrible consequences for performance/scalability - very quickly you > > contend on locks in the kernel. > > Is this because of problems in the mmap_sem? It's been a while since I

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Andres Freund
On 2014-01-13 12:34:35 -0800, James Bottomley wrote: > On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote: > > Well, if we were to collaborate with the kernel community on this then > > presumably we can do better than that for eviction... even to the > > extent of "here's some data from this range

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Trond Myklebust
On Jan 13, 2014, at 15:40, Andres Freund wrote: > On 2014-01-13 15:15:16 -0500, Robert Haas wrote: >> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner wrote: >>> I notice, Josh, that you didn't mention the problems many people >>> have run into with Transparent Huge Page defrag and with NUMA >>>

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread James Bottomley
On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote: > On 1/13/14, 2:27 PM, Claudio Freire wrote: > > On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby wrote: > >> On 1/13/14, 2:19 PM, Claudio Freire wrote: > >>> > >>> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas > >>> wrote: > > On a related n

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Robert Haas
On Mon, Jan 13, 2014 at 3:53 PM, Trond Myklebust wrote: > O_DIRECT was specifically designed to solve the problem of double buffering > between applications and the kernel. Why are you not able to use that in > these situations? O_DIRECT was apparently designed by a deranged monkey on some seri

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread Andres Freund
On 2014-01-13 15:53:36 -0500, Trond Myklebust wrote: > > I've wondered before if there wouldn't be a chance for postgres to say > > "my dear OS, that the file range 0-8192 of file x contains y, no need to > > reread" and do that when we evict a page from s_b but I never dared to > > actually propos
