Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Jordan Henderson
Personally, I think it is useful to have features. I quite understand the difficulties in maintaining some features, however. Also, having worked on internals for commercial DB engines, I have seen specifically how code/data paths can be shortened. I would not make the choice for someone to be force

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Sailesh Krishnamurthy
> "Jordan" == Jordan Henderson <[EMAIL PROTECTED]> writes: Jordan> significantly better results. I would not say it requires Jordan> considerable tuning, but an understanding of data, storage Jordan> and access patterns. Additionally, these features did not Jordan> cause our

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Dann Corbit
> -Original Message- > From: Jordan Henderson [mailto:[EMAIL PROTECTED] > Sent: Thursday, October 30, 2003 4:31 PM > To: [EMAIL PROTECTED]; Doug McNaught > Cc: Christopher Kings-Lynne; PostgreSQL-development > Subject: Re: [HACKERS] O_DIRECT in freebsd > > My ex

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Jordan Henderson
My experience with DB2 showed that properly set up DMS tablespaces provided a significant performance benefit. I have also seen that the average DBA does not generally understand the data or access patterns in the database. Given that, they don't correctly set up tablespaces in general, filesys

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Sailesh Krishnamurthy
DB2 supports cooked and raw file systems - SMS (System Managed Space) and DMS (Database Managed Space) tablespaces. The DB2 experience is that DMS tends to outperform SMS but requires considerable tuning and administrative overhead to see these wins. -- Pip-pip Sailesh http://www.cs.berkeley.e

Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Manfred Spraul
Greg Stark wrote: Manfred Spraul <[EMAIL PROTECTED]> writes: One problem for WAL is that O_DIRECT would disable the write cache - each operation would block until the data arrived on disk, and that might block other backends that try to access WALWriteLock. Perhaps a dedicated backend that doe

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Greg Stark
Manfred Spraul <[EMAIL PROTECTED]> writes: > One problem for WAL is that O_DIRECT would disable the write cache - > each operation would block until the data arrived on disk, and that might block > other backends that try to access WALWriteLock. > Perhaps a dedicated backend that does the writeba

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Manfred Spraul
Tom Lane wrote: Not for WAL --- we never read the WAL at all in normal operation. (If it works for writes, then we would want to use it for writing WAL, but that's not apparent from what Christopher quoted.) At least under Linux, it works for writes. Oracle uses O_DIRECT to access (both read and

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Tom Lane
Doug McNaught <[EMAIL PROTECTED]> writes: > Christopher Kings-Lynne <[EMAIL PROTECTED]> writes: >> A new DIRECTIO kernel option enables support for read operations that >> bypass the buffer cache and put data directly into a userland >> buffer. This feature requires that the O_DIRECT flag is set on

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Doug McNaught
"scott.marlowe" <[EMAIL PROTECTED]> writes: > I would think the biggest savings could come from using directIO for > vacuuming, so it doesn't cause the kernel to flush buffers. > > Would that be just as hard to implement? Two words: "cache coherency". -Doug ---(end o

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread scott.marlowe
On 29 Oct 2003, Doug McNaught wrote: > Christopher Kings-Lynne <[EMAIL PROTECTED]> writes: > > > FreeBSD 4.9 was released today. In the release notes was: > > > > 2.2.6 File Systems > > > > A new DIRECTIO kernel option enables support for read operations that > > bypass the buffer cache and pu

Re: [HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Doug McNaught
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes: > FreeBSD 4.9 was released today. In the release notes was: > > 2.2.6 File Systems > > A new DIRECTIO kernel option enables support for read operations that > bypass the buffer cache and put data directly into a userland > buffer. This feature

[HACKERS] O_DIRECT in freebsd

2003-10-29 Thread Christopher Kings-Lynne
FreeBSD 4.9 was released today. In the release notes was: 2.2.6 File Systems A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly into a userland buffer. This feature requires that the O_DIRECT flag is set on the file descriptor a

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Sean Chittenden wrote: > > > Nor could it ever be a win unless the cache was populated via > > > O_DIRECT, actually. Big PG cache == 2 extra copies of data, once > > > in the kernel and once in PG. Doing caching at the kernel level, > > > however means only one copy of data (for the most part).

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> > Nor could it ever be a win unless the cache was populated via > > O_DIRECT, actually. Big PG cache == 2 extra copies of data, once > > in the kernel and once in PG. Doing caching at the kernel level, > > however means only one copy of data (for the most part). Only > > problem with this bein

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> >> it doesn't seem totally out of the question. I'd kinda like to > >> see some experimental evidence that it's worth doing though. > >> Anyone care to make a quick-hack prototype and do some > >> measurements? > > > What would you like to measure? Overall system performance when a > > query i

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Sean Chittenden wrote: > Nor could it ever be a win unless the cache was populated via > O_DIRECT, actually. Big PG cache == 2 extra copies of data, once in > the kernel and once in PG. Doing caching at the kernel level, however > means only one copy of data (for the most part). Only problem wit

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> > True, it is a cost/benefit issue. My assumption was that once we have > > free-behind in the PostgreSQL shared buffer cache, the kernel cache > > issues would be minimal, but I am willing to be found wrong. > > If you are running on the > small-shared-buffers-and-large-kernel-cache theory, th

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Tom Lane
Sean Chittenden <[EMAIL PROTECTED]> writes: >> it doesn't seem totally out of the question. I'd kinda like to see >> some experimental evidence that it's worth doing though. Anyone >> care to make a quick-hack prototype and do some measurements? > What would you like to measure? Overall system

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > True, it is a cost/benefit issue. My assumption was that once we have > free-behind in the PostgreSQL shared buffer cache, the kernel cache > issues would be minimal, but I am willing to be found wrong. If you are running on the small-shared-buffers-and

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> > > What about cache coherency problems with other backends not > > > opening with O_DIRECT? > > > > If O_DIRECT introduces cache coherency problems against other > > processes not using O_DIRECT then the whole idea is a nonstarter, > > but I can't imagine any kernel hackers would have been stup

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> > Basically, I think we need free-behind rather than O_DIRECT. > > There are two separate issues here --- one is what's happening in > our own cache, and one is what's happening in the kernel disk cache. > Implementing our own free-behind code would help in our own cache > but does nothing for t

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > _That_ is an excellent point. However, do we know at the time we open > > the file descriptor if we will be doing this? > > We'd have to say on a per-read basis whether we want O_DIRECT or not, > and fd.c would need to provide a suit

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Basically, I think we need free-behind rather than O_DIRECT. > > There are two separate issues here --- one is what's happening in our > own cache, and one is what's happening in the kernel disk cache. > Implementing our own free-behi

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > _That_ is an excellent point. However, do we know at the time we open > the file descriptor if we will be doing this? We'd have to say on a per-read basis whether we want O_DIRECT or not, and fd.c would need to provide a suitable file descriptor. > Wha

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > Basically, I think we need free-behind rather than O_DIRECT. There are two separate issues here --- one is what's happening in our own cache, and one is what's happening in the kernel disk cache. Implementing our own free-behind code would help in our ow

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> > > Basically, we don't know when we read a buffer whether this is a > > > read-only or read/write. In fact, we could read it in, and > > > another backend could write it for us. > > > > Um, wait. The cache is shared between backends? I don't think > > so, but it shouldn't matter because ther

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Sean Chittenden wrote: > > Basically, we don't know when we read a buffer whether this is a > > read-only or read/write. In fact, we could read it in, and another > > backend could write it for us. > > Um, wait. The cache is shared between backends? I don't think so, > but it shouldn't matter b

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> Basically, we don't know when we read a buffer whether this is a > read-only or read/write. In fact, we could read it in, and another > backend could write it for us. Um, wait. The cache is shared between backends? I don't think so, but it shouldn't matter because there has to be a semaphore

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Basically, we don't know when we read a buffer whether this is a read-only or read/write. In fact, we could read it in, and another backend could write it for us. The big issue is that when we do a write, we don't wait for it to get to disk. It seems that to use O_DIRECT, we would have to read the b

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> > > What you really want is Solaris's free-behind, where it detects > > > if a scan is exceeding a certain percentage of the OS cache and > > > moves the pages to the _front_ of the to-be-reused list. I am > > > not sure what other OS's support this, but we need this on our > > > own buffer mana

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
Sean Chittenden wrote: > > What you really want is Solaris's free-behind, where it detects if a > > scan is exceeding a certain percentage of the OS cache and moves the > > pages to the _front_ of the to-be-reused list. I am not sure what > > other OS's support this, but we need this on our own bu

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> What you really want is Solaris's free-behind, where it detects if a > scan is exceeding a certain percentage of the OS cache and moves the > pages to the _front_ of the to-be-reused list. I am not sure what > other OS's support this, but we need this on our own buffer manager > code as well. >

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Bruce Momjian
What you really want is Solaris's free-behind, where it detects if a scan is exceeding a certain percentage of the OS cache and moves the pages to the _front_ of the to-be-reused list. I am not sure what other OS's support this, but we need this on our own buffer manager code as well. Our TODO a
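
A minimal sketch of the free-behind idea described in this message, assuming a simple LRU list stands in for the buffer manager. The class name `BufferCache` and the `sequential_scan` flag are hypothetical illustrations, not PostgreSQL or Solaris code. A page brought in by a large scan is placed at the front of the to-be-reused list, so the scan keeps recycling a single slot instead of flushing the hot pages.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU cache; the first entry is the next eviction victim."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page_id, sequential_scan=False):
        """Touch a page; return True on a cache hit, False on a miss."""
        if page_id in self.pages:
            if not sequential_scan:
                # Normal access: promote to most-recently-used.
                self.pages.move_to_end(page_id)
            # A scan hit leaves the page's position unchanged.
            return True
        if len(self.pages) >= self.capacity:
            # Evict the page at the front of the to-be-reused list.
            self.pages.popitem(last=False)
        self.pages[page_id] = True
        if sequential_scan:
            # Free-behind: the scanned page becomes the next victim.
            self.pages.move_to_end(page_id, last=False)
        return False


def run(scan_flag):
    cache = BufferCache(capacity=3)
    for hot in ("a", "b", "c"):
        cache.access(hot)
    for i in range(5):
        cache.access(f"s{i}", sequential_scan=scan_flag)
    return list(cache.pages)


with_free_behind = run(True)
plain_lru = run(False)
```

In this toy run the free-behind variant loses only one hot page to the scan, while the plain LRU variant ends up holding nothing but scan pages.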

Re: [HACKERS] O_DIRECT in freebsd

2003-06-22 Thread Sean Chittenden
> > > > The reason I mention it is that Postgres already supports > > > > O_DIRECT I think on some other platforms (for whatever > > > > reason). > > > > > > [ sounds of grepping... ] No. The only occurrence of O_DIRECT in the > > > source tree is in TODO: > > > > > > * Consider use of open/fcn

Re: [HACKERS] O_DIRECT in freebsd

2003-06-20 Thread Jim C. Nasby
On Wed, Jun 18, 2003 at 10:01:37AM +1000, Gavin Sherry wrote: > On Tue, 17 Jun 2003, Tom Lane wrote: > > > "Christopher Kings-Lynne" <[EMAIL PROTECTED]> writes: > > > The reason I mention it is that Postgres already supports O_DIRECT I think > > > on some other platforms (for whatever reason). > >

Re: [HACKERS] O_DIRECT in freebsd

2003-06-18 Thread Bruce Momjian
Also, keep in mind writes to O_DIRECT devices have to wait for the data to get on the platters rather than into the kernel cache. --- Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > >> DB2 and Oracle, from mem

Re: [HACKERS] O_DIRECT in freebsd

2003-06-18 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: >> DB2 and Oracle, from memory, allow users to pass hints to the planner to >> use/not use file system caching. > Might it make sense to do this for on-disk sorts, since sort_mem is > essentially being used as a disk cache (at least for reads)? If sort_

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Jim C. Nasby
On Wed, Jun 18, 2003 at 10:01:37AM +1000, Gavin Sherry wrote: > On Tue, 17 Jun 2003, Tom Lane wrote: > > * Consider use of open/fcntl(O_DIRECT) to minimize OS caching > > > > I personally disagree with this TODO item for the same reason Sean > > cited: Postgres is designed and tuned to rely on OS-

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Gavin Sherry
On Tue, 17 Jun 2003, Tom Lane wrote: > "Christopher Kings-Lynne" <[EMAIL PROTECTED]> writes: > > The reason I mention it is that Postgres already supports O_DIRECT I think > > on some other platforms (for whatever reason). > > [ sounds of grepping... ] No. The only occurrence of O_DIRECT in the

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Tom Lane
"Christopher Kings-Lynne" <[EMAIL PROTECTED]> writes: > The reason I mention it is that Postgres already supports O_DIRECT I think > on some other platforms (for whatever reason). [ sounds of grepping... ] No. The only occurrence of O_DIRECT in the source tree is in TODO: * Consider use of open

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Curt Sampson
On Tue, 17 Jun 2003, Christopher Kings-Lynne wrote: > "A new DIRECTIO kernel option enables support for read operations that > bypass the buffer cache and put data directly into a userland buffer > > Will PostgreSQL pick this up automatically, or do we need to add extra > checks? You don't wa

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Christopher Kings-Lynne
> > Will PostgreSQL pick this up automatically, or do we need to add > > extra checks? > > Extra checks, though I'm not sure why you'd want this. This is the > equiv of a nice way of handling raw IO for read only > operations... which would be bad. Call me crazy, but unless you're on The reason

Re: [HACKERS] O_DIRECT in freebsd

2003-06-17 Thread Sean Chittenden
> I noticed this in the FreeBSD 5.1 release notes: > > "A new DIRECTIO kernel option enables support for read operations > that bypass the buffer cache and put data directly into a userland > buffer. This feature requires that the O_DIRECT flag is set on the > file descriptor and that both the off

[HACKERS] O_DIRECT in freebsd

2003-06-16 Thread Christopher Kings-Lynne
I noticed this in the FreeBSD 5.1 release notes: "A new DIRECTIO kernel option enables support for read operations that bypass the buffer cache and put data directly into a userland buffer. This feature requires that the O_DIRECT flag is set on the file descriptor and that both the offset and leng
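
The quoted release note is cut off here, but direct I/O interfaces typically require the offset and length of each request to be multiples of the device sector size. Under that assumption, the sketch below shows how a caller might check a request and widen it to an aligned range. The helper names and the 512-byte `SECTOR_SIZE` are hypothetical; real devices may use a different sector size.

```python
SECTOR_SIZE = 512  # assumed sector size; query the device in real code


def is_aligned(offset, length, sector=SECTOR_SIZE):
    """True if both offset and length are multiples of the sector size."""
    return offset % sector == 0 and length % sector == 0


def align_request(offset, length, sector=SECTOR_SIZE):
    """Return the smallest sector-aligned (offset, length) covering the request."""
    start = (offset // sector) * sector
    end = -(-(offset + length) // sector) * sector  # round up to a sector boundary
    return start, end - start


# An 8 kB page read at an 8 kB boundary is already aligned.
page_ok = is_aligned(8192, 8192)
# A 1000-byte read at offset 100 has to be widened before a direct read.
widened = align_request(100, 1000)
```

A caller would read the widened range into a suitably aligned buffer and then slice out the bytes it actually asked for.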