Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-05-13 Thread Jan Wieck
Greg Stark wrote: Jan Wieck writes: The whole sync() vs. fsync() discussion is in my opinion nonsense at this point. Without the ability to limit the amount of files to a reasonable number, by employing tablespaces in the form of larger container files, the risk of forcing

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-05-09 Thread Bruce Momjian
Jan Wieck wrote: Tom Lane wrote: Zeugswetter Andreas SB SD writes: So Imho the target should be to have not much IO open for the checkpoint, so the fsync is fast enough, even if serial. The best we can do is push out dirty pages with write() via the bgwriter and

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-16 Thread Bruce Momjian
Tom Lane wrote: The best idea I've heard so far is the one about sync() followed by a bunch of fsync()s. That seems to be correct, efficient, and dependent only on very-long-established Unix semantics. Agreed.
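
A minimal sketch of the ordering quoted above, written as a prototype rather than as PostgreSQL code: one sync() so the kernel is asked to write out everything that is dirty, then an fsync() on each file of interest so the caller blocks until that particular file has been flushed. The function name `checkpoint_sync` and the explicit list of paths are assumptions made for illustration; `os.sync()` and `os.fsync()` are the Python wrappers around the Unix calls.

```python
import os


def checkpoint_sync(paths):
    """Call sync() once, then fsync() each file in turn.

    Hypothetical helper: returns the list of paths that were fsync'ed,
    in the order they were given.
    """
    # Ask the kernel to write out all dirty buffers. POSIX allows sync()
    # to return before the writes have completed, which is why the
    # per-file fsync() calls below are still needed.
    os.sync()

    synced = []
    for path in paths:
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)  # blocks until this file's data is flushed
        finally:
            os.close(fd)
        synced.append(path)
    return synced
```

If the sync() has already pushed most of the data out, each fsync() should have little left to wait for; how true that is in practice depends on the kernel and the storage underneath.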

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-15 Thread Florian Weimer
Tom Lane wrote: You can only fsync one FD at a time (too bad ... if there were a multi-file-fsync API it'd solve the overspecified-write-ordering issue). What about aio_fsync()?

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-15 Thread Tom Lane
Florian Weimer writes: Tom Lane wrote: You can only fsync one FD at a time (too bad ... if there were a multi-file-fsync API it'd solve the overspecified-write-ordering issue). What about aio_fsync()? (1) it's unportable; (2) it's not clear that it's any improvement over

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-09 Thread Jan Wieck
Bruce Momjian wrote: Jan Wieck wrote: Tom Lane wrote: Zeugswetter Andreas SB SD writes: So Imho the target should be to have not much IO open for the checkpoint, so the fsync is fast enough, even if serial. The best we can do is push out dirty pages with write() via the

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-09 Thread Tom Lane
Jan Wieck writes: The whole sync() vs. fsync() discussion is in my opinion nonsense at this point. The sync vs fsync discussion is not about performance, it is about correctness. You can't simply dismiss the fact that we don't know whether a checkpoint is really complete

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-09 Thread Greg Stark
Jan Wieck writes: The whole sync() vs. fsync() discussion is in my opinion nonsense at this point. Without the ability to limit the amount of files to a reasonable number, by employing tablespaces in the form of larger container files, the risk of forcing excessive head

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-09 Thread Tom Lane
Jan Wieck writes: Doing this is not just what you call it. In a system with let's say 500 active backends on a database with let's say 1000 things that are represented as a file, you'll need half a million virtual file descriptors. [shrug] We've been dealing with virtual

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-07 Thread Kevin Brown
I wrote: But that someplace else could easily be a process forked by the backend in question whose sole purpose is to go through the list of files generated by its parent backend and fsync() them. The backend can then go about its business and upon receipt of the SIGCHLD notify anyone that
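
A rough sketch of the forked-helper idea described above, under stated assumptions: the parent forks a child whose only job is to fsync() a list of files, and the parent later learns from the child's exit status whether every fsync() succeeded. The names `fsync_in_child` and `wait_for_sync` are hypothetical. The message talks about the backend reacting to SIGCHLD; for simplicity this sketch blocks in `os.waitpid()` instead of installing a signal handler.

```python
import os


def fsync_in_child(paths):
    """Fork a helper process that fsyncs every path; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: perform the slow fsync() calls, then leave immediately
        # with os._exit() so none of the parent's cleanup code runs here.
        status = 0
        try:
            for path in paths:
                fd = os.open(path, os.O_RDWR)
                try:
                    os.fsync(fd)
                finally:
                    os.close(fd)
        except OSError:
            status = 1
        os._exit(status)
    # Parent: free to carry on with other work.
    return pid


def wait_for_sync(pid):
    """Block until the helper exits; True means every fsync() succeeded."""
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

This relies on fork(), so it does not answer whether the approach carries over to Windows.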

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-07 Thread Merlin Moncure
Kevin Brown wrote: I have no idea whether or not this approach would work in Windows. The win32 API has ReadFileScatter/WriteFileGather, which was developed to handle these types of problems. These two functions were added for the sole purpose of making SQL Server run faster. They are always

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-06 Thread Jan Wieck
Tom Lane wrote: Zeugswetter Andreas SB SD writes: So Imho the target should be to have not much IO open for the checkpoint, so the fsync is fast enough, even if serial. The best we can do is push out dirty pages with write() via the bgwriter and hope that the kernel will see

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-06 Thread Kevin Brown
Tom Lane wrote: Kevin Brown writes: Well, running out of space in the list isn't that much of a problem. If the backends run out of list space (and the max size of the list could be a configurable thing, either as a percentage of shared memory or as an absolute size),

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-05 Thread Zeugswetter Andreas SB SD
I don't think the bgwriter is going to be able to keep up with I/O bound backends, but I do think it can scan and set those booleans fast enough for the backends to then perform the writes. As long as the bgwriter does not do sync writes (which it does not, since that would need a whole lot

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-05 Thread Tom Lane
Zeugswetter Andreas SB SD writes: So Imho the target should be to have not much IO open for the checkpoint, so the fsync is fast enough, even if serial. The best we can do is push out dirty pages with write() via the bgwriter and hope that the kernel will see fit to write

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-05 Thread Shridhar Daithankar
On Thursday 05 February 2004 20:24, Tom Lane wrote: Zeugswetter Andreas SB SD writes: So Imho the target should be to have not much IO open for the checkpoint, so the fsync is fast enough, even if serial. The best we can do is push out dirty pages with write() via the

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-05 Thread Tom Lane
Shridhar Daithankar writes: There are other benefits of writing pages earlier even though they might not get synced immediately. Such as? It would tell kernel that this is latest copy of updated buffer. Kernel VFS should make that copy visible to every other backend as

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-05 Thread Zeugswetter Andreas SB SD
People keep saying that the bgwriter mustn't write pages synchronously because it'd be bad for performance, but I think that analysis is faulty. Performance of what --- the bgwriter? Nonsense, the *point* Imho that depends on the workload. For a normal OLTP workload this is certainly

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-04 Thread Bruce Momjian
I am concerned that the bgwriter will not be able to keep up with the I/O generated by even a single backend restoring a database, let alone a busy system. To me, the write() performed by the bgwriter, because it is I/O, will typically be the bottleneck on any system that is I/O bound

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-04 Thread Kevin Brown
Tom Lane wrote: Kevin Brown writes: Instead, have each backend maintain its own separate list in shared memory. The only readers of a given list would be the backend it belongs to and the bgwriter, and the only time bgwriter attempts to read the list is at checkpoint

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-04 Thread Tom Lane
Kevin Brown writes: Tom Lane wrote: The more finely you slice your workspace, the more likely it becomes that one particular part will run out of space. So the inefficient case where a backend isn't able to insert something into the appropriate list will become considerably

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-03 Thread Kevin Brown
Bruce Momjian wrote: Here is my new idea. (I will keep throwing out ideas until I hit on a good one.) The bgwriter is going to have to check before every write to determine if the file is already recorded as needing fsync during checkpoint. My idea is to have that checking happen during the

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-03 Thread Tom Lane
Kevin Brown writes: Instead, have each backend maintain its own separate list in shared memory. The only readers of a given list would be the backend it belongs to and the bgwriter, and the only time bgwriter attempts to read the list is at checkpoint time. The sum total

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-02-01 Thread Bruce Momjian
Tom Lane wrote: What I've suggested before is that the bgwriter process can keep track of all files that it's written to since the last checkpoint, and fsync them during checkpoint (this would likely require giving the checkpoint task to the bgwriter instead of launching a separate process for

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-01-30 Thread Tom Lane
Bruce Momjian writes: The trick is to somehow record all files modified since the last checkpoint, and open/fsync/close each one. My idea is to stat() each file in each directory and compare the modify time to determine if the file has been modified since the last
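
The stat()-based idea quoted above could be prototyped roughly as follows. This is only a sketch under assumptions: the function names, the use of `os.walk()` to visit "each file in each directory", and the representation of the last checkpoint as a Unix timestamp are all choices made for illustration, not anything taken from PostgreSQL.

```python
import os


def files_modified_since(directory, last_checkpoint):
    """Return paths under directory whose mtime is >= last_checkpoint."""
    modified = []
    for root, _dirs, names in os.walk(directory):
        for name in sorted(names):
            path = os.path.join(root, name)
            # stat() every file and compare its modification time
            # against the time of the previous checkpoint.
            if os.stat(path).st_mtime >= last_checkpoint:
                modified.append(path)
    return modified


def fsync_modified(directory, last_checkpoint):
    """open/fsync/close each file modified since the last checkpoint."""
    paths = files_modified_since(directory, last_checkpoint)
    for path in paths:
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    return paths
```

A scan like this has to stat() every file at every checkpoint, and it is only as reliable as the file system's modification timestamps.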

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-01-30 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian writes: The trick is to somehow record all files modified since the last checkpoint, and open/fsync/close each one. My idea is to stat() each file in each directory and compare the modify time to determine if the file has been modified

Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint

2004-01-30 Thread Tom Lane
Bruce Momjian writes: Any ideas on how to record the modified files without generating tons of output or locking contention? What I've suggested before is that the bgwriter process can keep track of all files that it's written to since the last checkpoint, and fsync them
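
The bookkeeping suggested above, where the process doing the writes remembers which files it has touched and fsyncs them at checkpoint time, might look something like this in miniature. The class name `PendingSyncs` and its methods are hypothetical, and a single in-process set stands in for whatever structure a real background writer would use.

```python
import os


class PendingSyncs:
    """Remember which files were written since the last checkpoint."""

    def __init__(self):
        self._dirty = set()

    def write(self, path, offset, data):
        """Write data at offset and record that path needs an fsync."""
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.pwrite(fd, data, offset)  # plain write, no sync yet
        finally:
            os.close(fd)
        self._dirty.add(path)

    def checkpoint(self):
        """fsync every recorded file, clear the set, return the paths."""
        synced = sorted(self._dirty)
        for path in synced:
            fd = os.open(path, os.O_RDWR)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
        self._dirty.clear()
        return synced
```

Because the set lives inside one process, recording a write needs no locking here; writes issued by other processes would have to be reported to this process by some other means, which the sketch does not attempt.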