Manfred Spraul <[EMAIL PROTECTED]> writes:
> The checkpoint code uses sync() right now. Actually sync();sleep(2);sync().
> Win32 has no sync() call, therefore it will use fsyncs. Perhaps platforms with
> deferred errors on close must use fsync, too. Hopefully parallel fsyncs -
> sequential fsyncs
Randolf Richardson <[EMAIL PROTECTED]> writes:
> "[EMAIL PROTECTED] (Greg Stark)" stated in
> comp.databases.postgresql.hackers:
>> The traditional Unix filesystems certainly don't return errors at close.
> Why shouldn't the close() function return an error? If an invalid
> file handle wa
"[EMAIL PROTECTED] (Greg Stark)" stated in
comp.databases.postgresql.hackers:
> Tom Lane <[EMAIL PROTECTED]> writes:
>
>> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
>> > FreeBSD 4.7/4.9 and the UFS filesystem
>>
>> Hm, okay, I'm pretty sure that that combination wouldn't report ENOSPC
Greg Stark wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
That means
open();
write();
sync();
could succeed, but the data is not stored on disk, correct?
That would be true on any filesystem. Unless you throw an fsync() call in.
The checkpoint code uses sync() right now. Ac
Manfred Spraul <[EMAIL PROTECTED]> writes:
> That means
> open();
> write();
> sync();
>
> could succeed, but the data is not stored on disk, correct?
That would be true on any filesystem. Unless you throw an fsync() call in.
With sync replaced by fsync then any filesystem ought to
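
For illustration, the open(); write(); fsync(); close() sequence discussed above can be sketched in C with every return value checked, including close(), which the thread notes may report deferred errors on some filesystems (AFS quota failures are the example given). This is only a sketch under stated assumptions: write_durably is a hypothetical helper name, not a PostgreSQL function, and treating a zero-byte write as ENOSPC is a choice made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write len bytes to path and
 * force them to disk. Returns 0 on success, -1 on failure with errno set.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted before writing: retry */
        if (n <= 0) {
            /* Assumption for this sketch: a zero-byte write means no space. */
            int saved = (n == 0) ? ENOSPC : errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        left -= (size_t) n;
    }

    /* fsync() reports whether this file's data reached stable storage. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can itself fail, so its result is checked too. */
    if (close(fd) != 0)
        return -1;

    return 0;
}
```

A caller would treat any -1 return as "the data may not be on disk" and inspect errno (for example ENOSPC) before deciding how to proceed.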
-On [20040125 03:52], Tom Lane ([EMAIL PROTECTED]) wrote:
>Hm, okay, I'm pretty sure that that combination wouldn't report ENOSPC
>at close().
>From Tru64's write(2):
[ENOSPC]
[XSH4.2] No free space is left on the file system containing the
file.
[Tru64 UNIX] An attempt was ma
Christoph Haller <[EMAIL PROTECTED]> writes:
> Tom was referring to close(), not fclose().
> I once had an awful time searching for a memory leak caused
> by a typo using close instead of fclose.
> So adding checks for both is probably a good idea.
Already done.
regard
>
> Tom Lane wrote:
> > I said:
> > > If there wasn't disk space enough to hold the clog page, the checkpoint
> > > attempt should have failed. So it may be that allowing a short read in
> > > slru.c would be patching the symptom of a bug that is really elsewhere.
> >
> > After more staring at t
Tom Lane wrote:
Okay ... Chris was kind enough to let me examine the WAL logs and
postmaster stderr log for his recent problem, and I believe that
I have now achieved a full understanding of what happened. The true
bug is indeed somewhere else than slru.c, and we would not have found
it if slru.c
Just for the record, the Canaveral you are thinking about is derived
from the Spanish word "Cañaveral", which is a place where "cañas" grow
(canes or stems, according to my dictionary -- some sort of vegetal
living form anyway). I suppose Cape Kennedy was filled with those
plants and that's what t
On Mon, Jan 26, 2004 at 02:52:58PM +0900, Michael Glaesemann wrote:
> I don't know if the 'canaveral' prompt had anything to do with it
> (maybe it was just the subject line), but I kept thinking of shuttle
> disasters, o-rings, and plane crashes reading through this. I won't
> claim to underst
Excellent analysis. Thanks. Are there any other cases like this?
---
Tom Lane wrote:
> Okay ... Chris was kind enough to let me examine the WAL logs and
> postmaster stderr log for his recent problem, and I believe that
>
Tom Lane wrote:
> I said:
> > If there wasn't disk space enough to hold the clog page, the checkpoint
> > attempt should have failed. So it may be that allowing a short read in
> > slru.c would be patching the symptom of a bug that is really elsewhere.
>
> After more staring at the code, I have a
Awesome Tom :)
I'm glad I happened to have all the data required on hand to fully analyze
the problem. Let's hope this makes this failure condition go away for all
future postgresql users :)
Chris
On Mon, 26 Jan 2004, Tom Lane wrote:
> Okay ... Chris was kind enough to let me examine the WAL l
Tom,
I don't know if the 'canaveral' prompt had anything to do with it
(maybe it was just the subject line), but I kept thinking of shuttle
disasters, o-rings, and plane crashes reading through this. I won't
claim to understand everything in huge detail, but from this newbie's
point of view, w
Okay ... Chris was kind enough to let me examine the WAL logs and
postmaster stderr log for his recent problem, and I believe that
I have now achieved a full understanding of what happened. The true
bug is indeed somewhere else than slru.c, and we would not have found
it if slru.c had had less-par
Greg Stark wrote:
I do know that AFS returns quota failures on close. This was unusual enough
that when AFS was deployed at school unix tools failed left and right over
precisely this issue. Though it mostly just meant they returned the wrong exit
status.
That means
open();
write();
sync(
> That request to look at your WAL files is still open ...
I've sent you it privately - let me know how it goes.
Chris
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Tom Lane <[EMAIL PROTECTED]> writes:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> > FreeBSD 4.7/4.9 and the UFS filesystem
>
> Hm, okay, I'm pretty sure that that combination wouldn't report ENOSPC
> at close(). We need to fix the code to check close's return value,
> probably, but it
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> FreeBSD 4.7/4.9 and the UFS filesystem
Hm, okay, I'm pretty sure that that combination wouldn't report ENOSPC
at close(). We need to fix the code to check close's return value,
probably, but it seems we still lack a clear explanation of what
h
After more staring at the code, I have a theory. SlruPhysicalWritePage
and SlruPhysicalReadPage are coded on the assumption that close() can
never return any interesting failure. However, it now occurs to me that
there are some filesystem implementations wherein ENOSPC could be
returned at close(
I said:
> If there wasn't disk space enough to hold the clog page, the checkpoint
> attempt should have failed. So it may be that allowing a short read in
> slru.c would be patching the symptom of a bug that is really elsewhere.
After more staring at the code, I have a theory. SlruPhysicalWriteP
Gavin Sherry <[EMAIL PROTECTED]> writes:
> It seems that by adding the following to SlruPhysicalReadPage() we can
> recover in a reasonable way here. Instead of:
> [ add non-error check to lseek() ]
But it's not the lseek() that's gonna fail. What we'll actually see,
and did see in Chris' report,
On Fri, 23 Jan 2004, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Tom's answer will be undoubtedly better ...
>
> Nope, I think you got all the relevant points.
>
> The only thing I'd add after having had more time to think about it is
> that this seems very much like the problem
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom's answer will be undoubtedly better ...
Nope, I think you got all the relevant points.
The only thing I'd add after having had more time to think about it is
that this seems very much like the problem we noticed recently with
recovery-from-WAL being
Tom Lane wrote:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> > Are you interested in real backtraces, any of the old data directory,
> > etc. to debug the problem?
>
> If you could recompile with debug support and get a backtrace from the
> panic, it would be helpful. I suspect what w
On Fri, Jan 23, 2004 at 04:21:04PM -0500, Tom Lane wrote:
> But the clog access code evidently got confused by being asked to read
> a page that didn't exist in the file. I'm not sure yet how that
> sequence of events occurred, which is why I asked Chris for a stack
> trace.
There was a very sim
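
As a rough illustration of the "short read" idea mentioned in this thread (being asked to read a page that does not exist in the file), here is a C sketch that reads one 8K page and zero-fills whatever the file does not supply. Everything in it is an assumption for the example: read_page_zero_fill and PAGE_SIZE_BYTES are hypothetical names, and this is not the slru.c code. As noted in the thread, tolerating a short read may only patch the symptom of a bug that is really elsewhere.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SIZE_BYTES ((size_t) 8192)   /* assumed page size ("8K") */

/*
 * Hypothetical helper: read page number pageno from fd into buf, which must
 * hold PAGE_SIZE_BYTES bytes. Bytes past end of file are set to zero.
 * Returns the number of bytes that really came from the file, or -1 on error.
 */
ssize_t read_page_zero_fill(int fd, long pageno, char *buf)
{
    off_t offset = (off_t) pageno * (off_t) PAGE_SIZE_BYTES;
    size_t got = 0;

    /* Seeking past end of file succeeds; the read() below then returns 0. */
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return -1;

    while (got < PAGE_SIZE_BYTES) {
        ssize_t n = read(fd, buf + got, PAGE_SIZE_BYTES - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry */
            return -1;
        }
        if (n == 0)
            break;                  /* end of file: short read */
        got += (size_t) n;
    }

    /* Zero-fill the part of the page the file did not provide. */
    memset(buf + got, 0, PAGE_SIZE_BYTES - got);
    return (ssize_t) got;
}
```

A return value smaller than PAGE_SIZE_BYTES tells the caller the page was incomplete on disk, so the condition can still be logged rather than silently ignored.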
Rod Taylor <[EMAIL PROTECTED]> writes:
> Granted, running out of diskspace is a bad idea, but can (has?)
> something be put into place to prevent manual intervention from being
> required in restarting the database?
See subsequent discussion. I do want to modify the code to avoid this
problem in
On Fri, Jan 23, 2004 at 05:58:33PM -0300, Martín Marqués wrote:
> Tom, could you give a small insight on what occurred here, why those 8k of zeros
> fixed it, and what is a "WAL replay"?
If I may ...
- the disk filled up
- Postgres registered something in WAL that required some commit status
(
Martín Marqués <[EMAIL PROTECTED]> writes:
> Tom, could you give a small insight on what occurred here, why those
> 8k of zeros fixed it, and what is a "WAL replay"?
I think what happened is that there was insufficient space to write out
a new page of t
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> Are you interested in real backtraces, any of the old data directory,
> etc. to debug the problem?
If you could recompile with debug support and get a backtrace from the
panic, it would be helpful. I suspect what we need to do is make the
clo
On Fri, 2004-01-23 at 16:00, Tom Lane wrote:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> > Now I can start it up! Thanks!
>
> > What should I do now?
>
> Go home and get some sleep ;-). If the WAL replay succeeded, you're up
> and running, nothing else to do.
Granted, running out o
> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 23, 2004 1:01 PM
> To: Christopher Kings-Lynne
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Disaster!
>
>
> Christopher Kings-Lynne <[EMAIL PROTECTED]>
What should I do now?
Go home and get some sleep ;-). If the WAL replay succeeded, you're up
and running, nothing else to do.
Cool, thanks heaps Tom.
Are you interested in real backtraces, any of the old data directory,
etc. to debug the problem?
Obviously it ran out of disk space, but surely
Quoting Tom Lane <[EMAIL PROTECTED]>:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> > Now I can start it up! Thanks!
>
> > What should I do now?
>
> Go home and get some sleep ;-). If the WAL replay succeeded, you're up
> and running, nothing else to do.
Tom, could you gi
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> Now I can start it up! Thanks!
> What should I do now?
Go home and get some sleep ;-). If the WAL replay succeeded, you're up
and running, nothing else to do.
regards, tom lane
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>:
> > I'd suggest extending that file with 8K of zeroes (might need more than
> > that, but probably not).
>
> How do I do that? Sorry - I'm not sure of the quickest way, and I'm
> reading man pages as we speak!
# dd if=/dev/zero o
I'd suggest extending that file with 8K of zeroes (might need more than
that, but probably not).
OK, I've done
dd if=/dev/zero of=zeros count=16
Then cat zeros >> 000D
Now I can start it up! Thanks!
What should I do now?
Chris
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
>> I'd suggest extending that file with 8K of zeroes (might need more than
>> that, but probably not).
> How do I do that? Sorry - I'm not sure of the quickest way, and I'm
> reading man pages as we speak!
Something like "dd if=/dev/zero bs=8k
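
For readers who would rather see the "extend that file with 8K of zeroes" step as code than as a dd invocation, a minimal C sketch follows. It is an illustration only: append_zero_page and ZERO_PAGE_BYTES are hypothetical names, and 8192 is simply the "8K" figure from the thread. The file must already exist, since the sketch deliberately does not create it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_PAGE_BYTES ((size_t) 8192)   /* the "8K" suggested in the thread */

/*
 * Hypothetical helper: append ZERO_PAGE_BYTES zero bytes to an existing file.
 * Returns 0 on success, -1 on failure with errno set.
 */
int append_zero_page(const char *path)
{
    char zeros[8192];
    memset(zeros, 0, sizeof zeros);

    /* O_APPEND makes every write land at the current end of the file. */
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < ZERO_PAGE_BYTES) {
        ssize_t n = write(fd, zeros + done, ZERO_PAGE_BYTES - done);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted: retry */
        if (n <= 0) {
            /* Assumption for this sketch: a zero-byte write means no space. */
            int saved = (n == 0) ? ENOSPC : errno;
            close(fd);
            errno = saved;
            return -1;
        }
        done += (size_t) n;
    }

    /* Make sure the new page is on disk, and check close() as well. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd) == 0 ? 0 : -1;
}
```

As with the dd approach, this would be run against the affected file only while the server is stopped.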
I'd suggest extending that file with 8K of zeroes (might need more than
that, but probably not).
How do I do that? Sorry - I'm not sure of the quickest way, and I'm
reading man pages as we speak!
Thanks Tom,
Chris
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> We ran out of disk space on our main server, and now I've freed up
> space, we cannot start postgres!
> Jan 23 12:18:51 canaveral postgres[563]: [7-1] PANIC: could not access
> status of transaction 14286850
> Jan 23 12:18:51 canaveral postg
> -----Original Message-----
> From: Christopher Kings-Lynne [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 23, 2004 12:29 PM
> To: PostgreSQL-development
> Cc: Tom Lane
> Subject: [HACKERS] Disaster!
>
>
> We ran out of disk space on our main server, and now
pg_clog information:
# cd pg_clog
# ls -al
total 3602
drwx-- 2 pgsql pgsql 512 Jan 23 03:49 .
drwx-- 6 pgsql pgsql 512 Jan 23 12:30 ..
-rw--- 1 pgsql pgsql 262144 Jan 18 19:43
-rw--- 1 pgsql pgsql 262144 Jan 18 19:43 0001
-rw--- 1 pgsql pgsql 262144 Ja
We ran out of disk space on our main server, and now I've freed up
space, we cannot start postgres!
Jan 23 12:18:50 canaveral postgres[563]: [2-1] LOG: checkpoint record
is at 2/96500B94
Jan 23 12:18:50 canaveral postgres[563]: [3-1] LOG: redo record is at
2/964BD23C; undo record is at 0/0; s
44 matches