Re: [HACKERS] O_DIRECT support for Windows

2007-03-28 Thread Magnus Hagander
On Wed, Mar 28, 2007 at 02:47:12PM +0900, ITAGAKI Takahiro wrote: Magnus Hagander [EMAIL PROTECTED] wrote: IIRC, we're still waiting for performance numbers showing there exists a win from this patch. Here is a performance number of Direct I/O support on Windows. There was 10%+ of

Re: [HACKERS] O_DIRECT support for Windows

2007-03-27 Thread ITAGAKI Takahiro
Magnus Hagander [EMAIL PROTECTED] wrote: IIRC, we're still waiting for performance numbers showing there exists a win from this patch. Here is a performance number of Direct I/O support on Windows. There was a 10%+ performance win on pgbench (263.33 vs. 290.79) with O_DIRECT. However, I only
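The "10%+" figure in ITAGAKI's message can be checked from the two pgbench results it quotes. This is a minimal sketch that assumes 263.33 is the buffered run and 290.79 the O_DIRECT run (the snippet pairs them that way but does not state units). `relative_gain` is a hypothetical helper name.

```python
def relative_gain(baseline: float, candidate: float) -> float:
    """Percentage improvement of `candidate` over `baseline`."""
    return (candidate - baseline) / baseline * 100.0


# Figures from the message: buffered run vs. O_DIRECT run (assumed pairing).
gain = relative_gain(263.33, 290.79)
print(f"O_DIRECT gain: {gain:.1f}%")  # prints "O_DIRECT gain: 10.4%"
```

A gain of about 10.4% is consistent with the "10%+" wording.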

Re: [HACKERS] O_DIRECT, or madvise and/or posix_fadvise

2007-01-12 Thread Martijn van Oosterhout
On Thu, Jan 11, 2007 at 02:35:13PM -0800, [EMAIL PROTECTED] wrote: I caught this thread about O_DIRECT on kerneltrap.org: http://kerneltrap.org/node/7563 It sounds like there is much to be gained here in terms of reducing the number of user/kernel space copies in the operating system. I

Re: [HACKERS] O_DIRECT, or madvise and/or posix_fadvise

2007-01-12 Thread markwkm
On 1/12/07, Martijn van Oosterhout kleptog@svana.org wrote: On Thu, Jan 11, 2007 at 02:35:13PM -0800, [EMAIL PROTECTED] wrote: I caught this thread about O_DIRECT on kerneltrap.org: http://kerneltrap.org/node/7563 It sounds like there is much to be gained here in terms of reducing the

[HACKERS] O_DIRECT, or madvise and/or posix_fadvise

2007-01-11 Thread markwkm
I caught this thread about O_DIRECT on kerneltrap.org: http://kerneltrap.org/node/7563 It sounds like there is much to be gained here in terms of reducing the number of user/kernel space copies in the operating system. I got the impression that posix_fadvise in the Linux kernel isn't as good
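For comparison with O_DIRECT, the advisory route this thread raises can be sketched as follows. The file is read through the normal page cache, and then the kernel is told that those pages will not be needed again. This assumes a Unix platform where Python's `os.posix_fadvise` exists (it is missing on macOS, hence the guard). `read_then_drop_cache` is a hypothetical name. The advice is only a hint, so how much cache is actually freed depends on the kernel.

```python
import os


def read_then_drop_cache(path: str, chunk: int = 1 << 16) -> int:
    """Read a whole file sequentially, then advise the kernel to drop its cached pages.

    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Hint that access will be sequential (Linux enlarges read-ahead).
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            total += len(data)
        if hasattr(os, "posix_fadvise"):
            # offset=0, len=0 means "to end of file": clean pages may be evicted.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Unlike O_DIRECT, this approach keeps the ordinary buffered read path and has no alignment rules. The trade-off is that the copy into the page cache still happens.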

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Manfred Spraul
Greg Stark wrote: Manfred Spraul [EMAIL PROTECTED] writes: One problem for WAL is that O_DIRECT would disable the write cache - each operation would block until the data arrived on disk, and that might block other backends that try to access WALWriteLock. Perhaps a dedicated backend that does

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Sailesh Krishnamurthy
DB2 supports cooked and raw file systems - SMS (System Managed Space) and DMS (Database Managed Space) tablespaces. The DB2 experience is that DMS tends to outperform SMS but requires considerable tuning and administrative overhead to see these wins. -- Pip-pip Sailesh

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Jordan Henderson
My experience with DB2 showed that properly setup DMS tablespaces provided a significant performance benefit. I have also seen that the average DBA does not generally understand the data or access patterns in the database. Given that, they don't correctly setup table spaces in general,

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Dann Corbit
-----Original Message----- From: Jordan Henderson [mailto:[EMAIL PROTECTED]] Sent: Thursday, October 30, 2003 4:31 PM To: [EMAIL PROTECTED]; Doug McNaught Cc: Christopher Kings-Lynne; PostgreSQL-development Subject: Re: [HACKERS] O_DIRECT in freebsd My experience with DB2 showed

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Sailesh Krishnamurthy
Jordan == Jordan Henderson [EMAIL PROTECTED] writes: significantly better results. I would not say it requires considerable tuning, but an understanding of data, storage and access patterns. Additionally, these features did not cause our group

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Jordan Henderson
Personally, I think it is useful to have features. I quite understand the difficulties in maintaining some features, however. Also, having worked on internals for commercial DB engines, I have seen specifically how code/data paths can be shortened. I would not make the choice for someone to be

[HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Christopher Kings-Lynne
FreeBSD 4.9 was released today. In the release notes was: 2.2.6 File Systems A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly into a userland buffer. This feature requires that the O_DIRECT flag is set on the file descriptor

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Doug McNaught
Christopher Kings-Lynne [EMAIL PROTECTED] writes: FreeBSD 4.9 was released today. In the release notes was: 2.2.6 File Systems A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly into a userland buffer. This feature

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread scott.marlowe
On 29 Oct 2003, Doug McNaught wrote: Christopher Kings-Lynne [EMAIL PROTECTED] writes: FreeBSD 4.9 was released today. In the release notes was: 2.2.6 File Systems A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Doug McNaught
scott.marlowe [EMAIL PROTECTED] writes: I would think the biggest savings could come from using directIO for vacuuming, so it doesn't cause the kernel to flush buffers. Would that be just as hard to implement? Two words: cache coherency. -Doug

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Tom Lane
Doug McNaught [EMAIL PROTECTED] writes: Christopher Kings-Lynne [EMAIL PROTECTED] writes: A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly into a userland buffer. This feature requires that the O_DIRECT flag is set on the file

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Manfred Spraul
Tom Lane wrote: Not for WAL --- we never read the WAL at all in normal operation. (If it works for writes, then we would want to use it for writing WAL, but that's not apparent from what Christopher quoted.) At least under Linux, it works for writes. Oracle uses O_DIRECT to access (both read

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Greg Stark
Manfred Spraul [EMAIL PROTECTED] writes: One problem for WAL is that O_DIRECT would disable the write cache - each operation would block until the data arrived on disk, and that might block other backends that try to access WALWriteLock. Perhaps a dedicated backend that does the writeback
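The "dedicated backend that does the writeback" idea in this exchange can be sketched with a single writer thread. Foreground workers only append records to a queue. One thread performs the (possibly slow, possibly O_DIRECT) writes and flushes, so only that thread ever waits for the disk. This is a hypothetical Python illustration, not PostgreSQL's WAL code. `WalWriter` and its methods are invented names, and `os.fsync` stands in for whatever durability step a real design would use.

```python
import os
import queue
import threading


class WalWriter:
    """Hypothetical single log writer: callers enqueue records and never touch the file."""

    _STOP = object()  # sentinel telling the writer thread to finish

    def __init__(self, path: str):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, record: bytes) -> None:
        # Cheap for the caller: no lock on the log file, no wait for the disk.
        self._queue.put(record)

    def _run(self) -> None:
        stopping = False
        while not stopping:
            batch = [self._queue.get()]  # block until at least one item arrives
            # Drain whatever else is queued so one write + flush covers many records.
            while True:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            if self._STOP in batch:
                stopping = True
            data = b"".join(r for r in batch if r is not self._STOP)
            if data:
                while data:  # os.write may write fewer bytes than requested
                    written = os.write(self._fd, data)
                    data = data[written:]
                os.fsync(self._fd)  # only this thread blocks until data is on disk

    def close(self) -> None:
        self._queue.put(self._STOP)
        self._thread.join()
        os.close(self._fd)
```

Batching whatever is queued means one flush covers many records, which partly offsets the cost of a synchronous write path.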

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
The reason I mention it is that Postgres already supports O_DIRECT I think on some other platforms (for whatever reason). [ sounds of grepping... ] No. The only occurrence of O_DIRECT in the source tree is in TODO: * Consider use of open/fcntl(O_DIRECT) to minimize

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
What you really want is Solaris's free-behind, where it detects if a scan is exceeding a certain percentage of the OS cache and moves the pages to the _front_ of the to-be-reused list. I am not sure what other OS's support this, but we need this on our own buffer manager code as well. Our TODO
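The free-behind behaviour described here can be illustrated with a toy page cache. Normally, newly read pages go to the most-recently-used end. Once a sequential scan has read more than a set share of the cache, its pages go to the front of the to-be-reused list, so they are evicted first. This is a hypothetical sketch, not Solaris or PostgreSQL code. `FreeBehindCache`, the `scan_fraction` threshold, and counting pages per scan are all assumptions made for the example.

```python
from collections import OrderedDict


class FreeBehindCache:
    """Toy page cache: front of `pages` is the next victim, back is most recently used."""

    def __init__(self, capacity: int, scan_fraction: float):
        self.capacity = capacity
        # A scan that has read more than this many pages triggers free-behind.
        self.scan_limit = int(capacity * scan_fraction)
        self.pages: "OrderedDict[object, None]" = OrderedDict()

    def read(self, page, scan_pages_seen: int = 0) -> bool:
        """Access `page`. `scan_pages_seen` > 0 means the access belongs to a sequential
        scan that has read that many pages so far. Returns True on a cache hit."""
        if page in self.pages:
            self.pages.move_to_end(page)  # hit: becomes most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the page at the front
        self.pages[page] = None  # normal insertion at the MRU end
        if scan_pages_seen > self.scan_limit:
            # Free-behind: the scan has outgrown its share of the cache, so its
            # page goes to the front of the to-be-reused list instead.
            self.pages.move_to_end(page, last=False)
        return False
```

With capacity 8 and `scan_fraction=0.25`, four hot pages cached before a 20-page scan are still resident afterwards. With `scan_fraction=1.0`, the same scan pushes all four of them out.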

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
What you really want is Solaris's free-behind, where it detects if a scan is exceeding a certain percentage of the OS cache and moves the pages to the _front_ of the to-be-reused list. I am not sure what other OS's support this, but we need this on our own buffer manager code as well. Our

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Sean Chittenden wrote: What you really want is Solaris's free-behind, where it detects if a scan is exceeding a certain percentage of the OS cache and moves the pages to the _front_ of the to-be-reused list. I am not sure what other OS's support this, but we need this on our own buffer

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
What you really want is Solaris's free-behind, where it detects if a scan is exceeding a certain percentage of the OS cache and moves the pages to the _front_ of the to-be-reused list. I am not sure what other OS's support this, but we need this on our own buffer manager code as

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Basically, we don't know when we read a buffer whether this is a read-only or read/write. In fact, we could read it in, and another backend could write it for us. The big issue is that when we do a write, we don't wait for it to get to disk. It seems to use O_DIRECT, we would have to read the

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
Basically, we don't know when we read a buffer whether this is a read-only or read/write. In fact, we could read it in, and another backend could write it for us. Um, wait. The cache is shared between backends? I don't think so, but it shouldn't matter because there has to be a semaphore

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Sean Chittenden wrote: Basically, we don't know when we read a buffer whether this is a read-only or read/write. In fact, we could read it in, and another backend could write it for us. Um, wait. The cache is shared between backends? I don't think so, but it shouldn't matter because

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Basically, I think we need free-behind rather than O_DIRECT. There are two separate issues here --- one is what's happening in our own cache, and one is what's happening in the kernel disk cache. Implementing our own free-behind code would help in our own

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: _That_ is an excellent point. However, do we know at the time we open the file descriptor if we will be doing this? We'd have to say on a per-read basis whether we want O_DIRECT or not, and fd.c would need to provide a suitable file descriptor. What
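One way to choose O_DIRECT per read, with the file layer supplying a suitable descriptor, is sketched below. The handle keeps a buffered descriptor plus a lazily opened O_DIRECT descriptor for the same file, and each call picks one. This is a hypothetical Python sketch: `DualDescriptor` is an invented name, not fd.c's API. It assumes Linux (where `os.O_DIRECT` and `os.preadv` exist) and a 4096-byte alignment. It falls back to the buffered descriptor when a request is unaligned or the filesystem rejects O_DIRECT.

```python
import errno
import mmap
import os

ALIGN = 4096  # assumed alignment for O_DIRECT offsets, lengths and buffers


class DualDescriptor:
    """Hypothetical per-file handle offering a buffered or a direct read per call."""

    def __init__(self, path: str):
        self.path = path
        self.buffered_fd = os.open(path, os.O_RDONLY)
        self.direct_fd = None  # opened lazily on the first direct read
        self.direct_ok = hasattr(os, "O_DIRECT")

    def _get_direct_fd(self):
        if self.direct_fd is None and self.direct_ok:
            try:
                self.direct_fd = os.open(self.path, os.O_RDONLY | os.O_DIRECT)
            except OSError as e:
                if e.errno != errno.EINVAL:  # EINVAL: filesystem refuses O_DIRECT
                    raise
                self.direct_ok = False
        return self.direct_fd

    def read(self, offset: int, length: int, direct: bool = False) -> bytes:
        fd = self._get_direct_fd() if direct else None
        aligned = length > 0 and offset % ALIGN == 0 and length % ALIGN == 0
        if fd is not None and aligned:
            buf = mmap.mmap(-1, length)  # anonymous mapping gives a page-aligned buffer
            try:
                n = os.preadv(fd, [buf], offset)
                return bytes(buf[:n])
            except OSError as e:
                if e.errno != errno.EINVAL:
                    raise  # on EINVAL, fall through to a buffered read
            finally:
                buf.close()
        return os.pread(self.buffered_fd, length, offset)

    def close(self) -> None:
        os.close(self.buffered_fd)
        if self.direct_fd is not None:
            os.close(self.direct_fd)
```

On Linux, a direct read of a range that still has dirty pages in the kernel cache first flushes those pages. This is why mixing buffered and direct descriptors on one file returns consistent data.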

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Basically, I think we need free-behind rather than O_DIRECT. There are two separate issues here --- one is what's happening in our own cache, and one is what's happening in the kernel disk cache. Implementing our own free-behind code

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: _That_ is an excellent point. However, do we know at the time we open the file descriptor if we will be doing this? We'd have to say on a per-read basis whether we want O_DIRECT or not, and fd.c would need to provide a suitable file

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
Basically, I think we need free-behind rather than O_DIRECT. There are two separate issues here --- one is what's happening in our own cache, and one is what's happening in the kernel disk cache. Implementing our own free-behind code would help in our own cache but does nothing for the

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
What about cache coherency problems with other backends not opening with O_DIRECT? If O_DIRECT introduces cache coherency problems against other processes not using O_DIRECT then the whole idea is a nonstarter, but I can't imagine any kernel hackers would have been stupid enough

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: True, it is a cost/benefit issue. My assumption was that once we have free-behind in the PostgreSQL shared buffer cache, the kernel cache issues would be minimal, but I am willing to be found wrong. If you are running on the

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes: it doesn't seem totally out of the question. I'd kinda like to see some experimental evidence that it's worth doing though. Anyone care to make a quick-hack prototype and do some measurements? What would you like to measure? Overall system

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
True, it is a cost/benefit issue. My assumption was that once we have free-behind in the PostgreSQL shared buffer cache, the kernel cache issues would be minimal, but I am willing to be found wrong. If you are running on the small-shared-buffers-and-large-kernel-cache theory, then

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Sean Chittenden wrote: Nor could it ever be a win unless the cache was populated via O_DIRECT, actually. Big PG cache == 2 extra copies of data, once in the kernel and once in PG. Doing caching at the kernel level, however means only one copy of data (for the most part). Only problem with

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
it doesn't seem totally out of the question. I'd kinda like to see some experimental evidence that it's worth doing though. Anyone care to make a quick-hack prototype and do some measurements? What would you like to measure? Overall system performance when a query is using O_DIRECT

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
Nor could it ever be a win unless the cache was populated via O_DIRECT, actually. Big PG cache == 2 extra copies of data, once in the kernel and once in PG. Doing caching at the kernel level, however means only one copy of data (for the most part). Only problem with this being that

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Sean Chittenden wrote: Nor could it ever be a win unless the cache was populated via O_DIRECT, actually. Big PG cache == 2 extra copies of data, once in the kernel and once in PG. Doing caching at the kernel level, however means only one copy of data (for the most part). Only

Re: [HACKERS] O_DIRECT in freebsd

2003-06-20 Thread Jim C. Nasby
On Wed, Jun 18, 2003 at 10:01:37AM +1000, Gavin Sherry wrote: On Tue, 17 Jun 2003, Tom Lane wrote: Christopher Kings-Lynne [EMAIL PROTECTED] writes: The reason I mention it is that Postgres already supports O_DIRECT I think on some other platforms (for whatever reason). [ sounds of

Re: [HACKERS] O_DIRECT in freebsd

2003-06-18 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes: DB2 and Oracle, from memory, allow users to pass hints to the planner to use/not use file system caching. Might it make sense to do this for on-disk sorts, since sort_mem is essentially being used as a disk cache (at least for reads)? If sort_mem were

Re: [HACKERS] O_DIRECT in freebsd

2003-06-18 Thread Bruce Momjian
Also, keep in mind writes to O_DIRECT devices have to wait for the data to get on the platters rather than into the kernel cache. --- Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: DB2 and Oracle, from memory,

[HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Christopher Kings-Lynne
I noticed this in the FreeBSD 5.1 release notes: A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly into a userland buffer. This feature requires that the O_DIRECT flag is set on the file descriptor and that both the offset and

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Sean Chittenden
I noticed this in the FreeBSD 5.1 release notes: A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly into a userland buffer. This feature requires that the O_DIRECT flag is set on the file descriptor and that both the offset

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Christopher Kings-Lynne
Will PostgreSQL pick this up automatically, or do we need to add extra checks? Extra checks, though I'm not sure why you'd want this. This is the equiv of a nice way of handling raw IO for read only operations... which would be bad. Call me crazy, but unless you're on The reason I

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Curt Sampson
On Tue, 17 Jun 2003, Christopher Kings-Lynne wrote: A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly into a userland buffer Will PostgreSQL pick this up automatically, or do we need to add extra checks? You don't want it

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes: The reason I mention it is that Postgres already supports O_DIRECT I think on some other platforms (for whatever reason). [ sounds of grepping... ] No. The only occurrence of O_DIRECT in the source tree is in TODO: * Consider use of

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Gavin Sherry
On Tue, 17 Jun 2003, Tom Lane wrote: Christopher Kings-Lynne [EMAIL PROTECTED] writes: The reason I mention it is that Postgres already supports O_DIRECT I think on some other platforms (for whatever reason). [ sounds of grepping... ] No. The only occurrence of O_DIRECT in the source

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Jim C. Nasby
On Wed, Jun 18, 2003 at 10:01:37AM +1000, Gavin Sherry wrote: On Tue, 17 Jun 2003, Tom Lane wrote: * Consider use of open/fcntl(O_DIRECT) to minimize OS caching I personally disagree with this TODO item for the same reason Sean cited: Postgres is designed and tuned to rely on OS-level

Re: [HACKERS] O_DIRECT

2001-10-01 Thread Zeugswetter Andreas SB SD
The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this flag for open files will attempt to minimize the cache effects of reading and writing. I wonder if using this for WAL would be good. Not until the code is optimized to write more than the current 8k to the

Re: [HACKERS] O_DIRECT

2001-09-29 Thread Bruce Momjian
The O_DIRECT flag has been added in FreeBSD 4.4 (386 Alpha) also. From the release notes: Kernel Changes The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this flag for open files will attempt to minimize the cache effects of reading and writing. I wonder if using

Re: [HACKERS] O_DIRECT and performance

2001-09-29 Thread Bruce Momjian
Well, O_DIRECT has finally made it into the Linux kernel. It lets you open a file in such a way that reads and writes don't go to the buffer cache but straight to the disk. Accesses must be aligned on filesystem block boundaries. Is there any case where PG would benefit from this? I

[HACKERS] O_DIRECT

2001-09-25 Thread Christopher Kings-Lynne
The O_DIRECT flag has been added in FreeBSD 4.4 (386 Alpha) also. From the release notes: Kernel Changes The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this flag for open files will attempt to minimize the cache effects of reading and writing. See:
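Since the release note says O_DIRECT can be set through fcntl(2) as well as open(2), an already-open descriptor can be switched over in place. This is a minimal sketch assuming a Unix system with Python's `fcntl` module. `enable_direct_io` is a hypothetical helper, and it treats EINVAL as "this filesystem does not support direct I/O".

```python
import errno
import fcntl
import os


def enable_direct_io(fd: int) -> bool:
    """Try to turn on O_DIRECT for an already-open descriptor via fcntl(F_SETFL).

    Returns True if the flag is now set, False if the platform or filesystem refuses it.
    """
    if not hasattr(os, "O_DIRECT"):
        return False  # e.g. macOS defines no O_DIRECT (it has F_NOCACHE instead)
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    try:
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_DIRECT)
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False  # filesystem does not support direct I/O
        raise
    # Re-read the flags to confirm the kernel kept O_DIRECT.
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_DIRECT)
```

After this call succeeds, the alignment rules for direct I/O apply to every later read and write on that descriptor.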

[HACKERS] O_DIRECT and performance

2001-09-25 Thread Doug McNaught
Well, O_DIRECT has finally made it into the Linux kernel. It lets you open a file in such a way that reads and writes don't go to the buffer cache but straight to the disk. Accesses must be aligned on filesystem block boundaries. Is there any case where PG would benefit from this? I can see
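Because O_DIRECT accesses must be aligned on block boundaries, a caller wanting an arbitrary byte range has to widen it to whole blocks and then slice out the part it asked for. This is a minimal sketch of that arithmetic, assuming a power-of-two block size. `aligned_span` is a hypothetical helper, and the correct block size depends on the filesystem and device.

```python
def aligned_span(offset: int, length: int, block: int = 4096):
    """Widen [offset, offset + length) to block boundaries for an O_DIRECT read.

    Returns (aligned_offset, aligned_length, skip): read aligned_length bytes at
    aligned_offset; the requested data is then result[skip:skip + length].
    """
    if block <= 0 or block & (block - 1):
        raise ValueError("block size must be a power of two")
    start = offset & ~(block - 1)                       # round the start down
    end = (offset + length + block - 1) & ~(block - 1)  # round the end up
    return start, end - start, offset - start
```

For example, `aligned_span(5000, 100)` returns `(4096, 4096, 904)`. A 100-byte request therefore costs a full 4096-byte block read, and this overhead is part of the cost/benefit question for direct I/O.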