Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-07 Thread Zeugswetter Andreas SB SD
Keep in mind that we support platforms without O_DSYNC. I am not sure whether there are any that don't have O_SYNC either, but I am fairly sure that we measured O_SYNC to be slower than fsync()s on some platforms. This measurement is quite understandable, since the current software

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-07 Thread Antti Haapala
On 6 Oct 2002, Greg Copeland wrote: On Sat, 2002-10-05 at 14:46, Curtis Faith wrote: 2) aio_write vs. normal write. Since as you and others have pointed out aio_write and write are both asynchronous, the issue becomes one of whether or not the copies to the file system buffers

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-07 Thread Greg Copeland
On Mon, 2002-10-07 at 10:38, Antti Haapala wrote: Browsed web and came across this piece of text regarding a Linux-KAIO patch by Silicon Graphics... Ya, I have read this before. The problem here is that I'm not aware of which AIO implementation on Linux is the forerunner nor do I have any

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-07 Thread Neil Conway
Greg Copeland writes: Ya, I have read this before. The problem here is that I'm not aware of which AIO implementation on Linux is the forerunner nor do I have any idea how its implementation or performance details differ from that of other implementations on other

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-07 Thread Ken Hirsch
I sent this yesterday, but it seems not to have made it to the list... I have a couple of comments orthogonal to the present discussion. 1) It would be fairly easy to write log records over a network to a dedicated process on another system. If the other system has an uninterruptible

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-06 Thread Hannu Krosing
On Sun, 2002-10-06 at 04:03, Tom Lane wrote: Hannu Krosing writes: Or its solution ;) as instead of predicting we just write all data in log that is ready to be written. If we postpone writing, there will be hiccups when we suddenly discover that we need to write a

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-06 Thread Greg Copeland
On Sat, 2002-10-05 at 14:46, Curtis Faith wrote: 2) aio_write vs. normal write. Since as you and others have pointed out aio_write and write are both asynchronous, the issue becomes one of whether or not the copies to the file system buffers happen synchronously or not. Actually, I

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-06 Thread Tom Lane
Greg Copeland writes: I personally would at least like to see an aio implementation and would be willing to even help benchmark it to benchmark/validate any returns in performance. Surely if testing reflected a performance boost it would be considered for baseline

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-06 Thread Greg Copeland
On Sun, 2002-10-06 at 11:46, Tom Lane wrote: I can't personally get excited about something that only helps if your server is starved for RAM --- who runs servers that aren't fat on RAM anymore? But give it a shot if you like. Perhaps your analysis is pessimistic. I do suspect my analysis

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-05 Thread Tom Lane
Curtis Faith writes: Assume Transaction A which writes a lot of buffers and XLog entries, so the Commit forces a relatively lengthy fsync. Transactions B - E block not on the kernel lock from fsync but on the WALWriteLock. You are confusing WALWriteLock with

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-05 Thread Curtis Faith
You are confusing WALWriteLock with WALInsertLock. A transaction-committing flush operation only holds the former. XLogInsert only needs the latter --- at least as long as it doesn't need to write. Well, that makes things better than I thought. We still end up with a disk write for each
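
The distinction between the two locks can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL's code: the class `TwoLockLog`, its methods, and the `sync` callback are hypothetical names invented here. `insert_lock` stands in for WALInsertLock, `write_lock` for WALWriteLock. A flush holds the write lock for its whole duration but takes the insert lock only long enough to detach the buffered records, so inserts are not blocked while the slow sync runs.

```python
import threading


class TwoLockLog:
    """Toy model: one lock guards the in-memory buffer, another the flush."""

    def __init__(self, sync):
        self.insert_lock = threading.Lock()  # stands in for WALInsertLock
        self.write_lock = threading.Lock()   # stands in for WALWriteLock
        self._buffer = []                    # records not yet flushed
        self.flushed = []                    # records already handed to sync()
        self._sync = sync                    # hypothetical slow write+fsync step

    def insert(self, record):
        # Only the insert lock is needed to add a record to the buffer.
        with self.insert_lock:
            self._buffer.append(record)

    def flush(self):
        # The write lock is held across the whole (slow) flush ...
        with self.write_lock:
            # ... but the insert lock only while the buffer is detached.
            with self.insert_lock:
                batch, self._buffer = self._buffer, []
            self._sync(batch)
            self.flushed.extend(batch)
```

In this model a backend calling `insert()` waits only for other inserters, never for a flush in progress, which is the behaviour the reply above describes for the case where the insert does not itself need to write.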

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Curtis Faith
In particular, it would seriously degrade performance if the WAL file isn't on its own spindle but has to share bandwidth with data file access. If the OS is stupid I could see this happening. But if there are buffers and some sort of elevator algorithm the I/O won't happen at bad times. I

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Bruce Momjian
Curtis Faith wrote: Back-end servers would not issue fsync calls. They would simply block waiting until the LogWriter had written their record to the disk, i.e. until the sync'd block # was greater than the block that contained the XLOG_XACT_COMMIT record. The LogWriter could wake up
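
The scheme quoted above can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: the class `LogWriter`, the method `commit`, and the counter `sync_calls` are hypothetical, and log positions are counted in records rather than blocks to keep the example short. Backends never call fsync themselves; they queue a record and block until the writer thread reports that everything up to their position has been synced.

```python
import os
import threading


class LogWriter:
    """Single writer thread that syncs the log on behalf of all backends."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._cond = threading.Condition()
        self._pending = []    # records queued but not yet written
        self._appended = 0    # position of the last queued record
        self._synced = 0      # position known to be on disk
        self._closing = False
        self.sync_calls = 0   # how many fsync calls were actually issued
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def commit(self, record):
        """Queue a record and block until the writer has synced it."""
        with self._cond:
            self._pending.append(record)
            self._appended += 1
            my_pos = self._appended
            self._cond.notify_all()
            while self._synced < my_pos:
                self._cond.wait()
        return my_pos

    def _run(self):
        while True:
            with self._cond:
                while not self._pending and not self._closing:
                    self._cond.wait()
                if not self._pending:
                    return  # closing and nothing left to write
                batch, self._pending = self._pending, []
                upto = self._appended
            data = b"".join(batch)
            while data:  # os.write may write fewer bytes than requested
                data = data[os.write(self._fd, data):]
            os.fsync(self._fd)  # one sync covers every record in the batch
            with self._cond:
                self._synced = upto
                self.sync_calls += 1
                self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closing = True
            self._cond.notify_all()
        self._thread.join()
        os.close(self._fd)
```

Records that arrive while a sync is in progress are picked up together in the next batch, so under concurrent load the number of fsync calls can be smaller than the number of commits. Whether that is a net win over having backends sync for themselves is the question debated in the replies.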

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Bruce Momjian
pgman wrote: Curtis Faith wrote: Back-end servers would not issue fsync calls. They would simply block waiting until the LogWriter had written their record to the disk, i.e. until the sync'd block # was greater than the block that contained the XLOG_XACT_COMMIT record. The LogWriter

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-05 Thread Hannu Krosing
Bruce Momjian wrote on Sat, 05.10.2002 at 13:49: Curtis Faith wrote: Back-end servers would not issue fsync calls. They would simply block waiting until the LogWriter had written their record to the disk, i.e. until the sync'd block # was greater than the block that contained the

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-05 Thread Curtis Faith
Bruce Momjian wrote: So every backend is going to wait around until its fsync gets done by the backend process? How is that a win? This is just another version of our GUC parameters: #commit_delay = 0 # range 0-10, in microseconds #commit_siblings =

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-05 Thread Doug McNaught
Tom Lane writes: Curtis Faith writes: The log file would be opened O_DSYNC, O_APPEND every time. Keep in mind that we support platforms without O_DSYNC. I am not sure whether there are any that don't have O_SYNC either, but I am fairly sure that we
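
The portability concern can be made concrete with a small sketch. It assumes a caller that picks a sync method explicitly rather than hard-coding O_DSYNC; the function names and the method labels `open_datasync`, `open_sync`, and `fsync` are hypothetical choices for this example. The method is left as a parameter because, as noted above, the faster option differs between platforms, so it has to be measured rather than assumed.

```python
import os

# Open flags that make each write() synchronous, where the platform has them.
_FLAG_FOR_METHOD = {"open_datasync": "O_DSYNC", "open_sync": "O_SYNC"}


def available_methods():
    """List the sync methods usable on this platform; fsync is the fallback."""
    methods = [m for m, flag in _FLAG_FOR_METHOD.items() if hasattr(os, flag)]
    methods.append("fsync")
    return methods


def open_wal(path, method):
    """Open a log file for appending; return (fd, needs_fsync)."""
    if method not in available_methods():
        raise ValueError("sync method not available here: %r" % (method,))
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if method != "fsync":
        flags |= getattr(os, _FLAG_FOR_METHOD[method])
    return os.open(path, flags, 0o600), method == "fsync"


def append_record(fd, needs_fsync, record):
    """Append one record; sync explicitly only if the open flags do not."""
    os.write(fd, record)
    if needs_fsync:
        os.fsync(fd)
```

A benchmark would call `open_wal` once per entry in `available_methods()` and time `append_record` for each, which is the kind of per-platform measurement the message refers to.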

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Tom Lane
Hannu Krosing writes: The writer process should just issue a continuous stream of aio_write()'s while there are any waiters and keep track which waiters are safe to continue - thus no guessing of who's gonna commit. This recipe sounds like eat I/O bandwidth whether we need

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Bruce Momjian
Curtis Faith wrote: The advantage to aio_write in this scenario is when writes cross track boundaries or when the head is in the wrong spot. If we write in reasonable blocks with aio_write the write might get to the disk before the head passes the location for the write. Consider a

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Curtis Faith
So, you are saying that we may get back aio confirmation quicker than if we issued our own write/fsync because the OS was able to slip our flush to disk in as part of someone else's or a general fsync? I don't buy that because it is possible our write() gets in as part of someone else's

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Bruce Momjian
Curtis Faith wrote: So, you are saying that we may get back aio confirmation quicker than if we issued our own write/fsync because the OS was able to slip our flush to disk in as part of someone else's or a general fsync? I don't buy that because it is possible our write() gets in as

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-05 Thread Hannu Krosing
On Sat, 2002-10-05 at 20:32, Tom Lane wrote: Hannu Krosing writes: The writer process should just issue a continuous stream of aio_write()'s while there are any waiters and keep track which waiters are safe to continue - thus no guessing of who's gonna commit. This

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Curtis Faith
No question about that! The sooner we can get stuff to the WAL buffers, the more likely we will get some other transaction to do our fsync work. Any ideas on how we can do that? More like the sooner we get stuff out of the WAL buffers and into the disk's buffers whether by write or

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Bruce Momjian
Curtis Faith wrote: No question about that! The sooner we can get stuff to the WAL buffers, the more likely we will get some other transaction to do our fsync work. Any ideas on how we can do that? More like the sooner we get stuff out of the WAL buffers and into the disk's buffers

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Tom Lane
Hannu Krosing writes: Or its solution ;) as instead of predicting we just write all data in log that is ready to be written. If we postpone writing, there will be hiccups when we suddenly discover that we need to write a whole lot of pages (fsync()) after idling the disk

[HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-04 Thread Curtis Faith
It appears the fsync problem is pervasive. Here's Linux 2.4.19's version from fs/buffer.c: lock-> down(&inode->i_sem); ret = filemap_fdatasync(inode->i_mapping); err = file->f_op->fsync(file, dentry, 1); if (err && !ret) ret = err; err