Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Peter Brant
Move to Linux. :-) In our case, everything but the database servers were already Linux so it was an easy choice. Things have been rock solid since then. Once things get stuck, I don't think there is an alternative besides "stop -m immediate". However, since the problem is caused by an idle back

Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Peter Brant
That might be one cause (or it might otherwise exacerbate the problem), but it isn't the only cause. We weren't running anti-virus software and neither is Thomas. Unfortunately with the last go around, we collectively ran out of ideas before an underlying cause could be identified. Pete >>> Tom

Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Peter Brant
The same problem exists in 8.1 too. See this thread http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php Tom and Magnus tracked down a cause, but I don't think a fix was ever implemented. FWIW, we were bitten by the fsync problem which you noticed too. Unfortunately we were never ab

Re: [BUGS] [Win32] Problem with rename()

2006-06-17 Thread Peter Brant
>>> On 16.06.2006 at 23:21:21, in message <[EMAIL PROTECTED]>, Bruce Momjian wrote: > Yea. Were you using WAL archiving? We will have a fix in 8.1.5 to > prevent multiple archivers from starting. Perhaps that was a cause. > Not at the time, no. The rename in question was just a regular WAL s

Re: [BUGS] [Win32] Problem with rename()

2006-06-16 Thread Peter Brant
Really? If there was a patch, I missed it. My recollection is that there was general agreement about this particular problem (see, for example, http://archives.postgresql.org/pgsql-bugs/2006-04/msg00189.php ), but things kind of trailed off after that without a resolution. As far as the complete

Re: [BUGS] BUG #2371: database crashes with semctl failed

2006-05-02 Thread Peter Brant
Test server has SP1. This bug has only bitten us twice (and never in a testing environment) so it's hard to say (from our experience). The successful pgbench runs are definitely good to see though. Pete >>> "Magnus Hagander" <[EMAIL PROTECTED]> 05/02/06 10:14 am >>> Great news. One question though

Re: [BUGS] BUG #2371: database crashes with semctl failed

2006-05-01 Thread Peter Brant
With the patch applied, I let an in-house stress test run for several hours and it completed without incident. I also ran two runs of pgbench with 50 connections x 1000 transactions and one run of 50 connections x 5000 transactions. All completed successfully. (Test server is a dual Xeon with Hyp

Re: [BUGS] BUG #2371: database crashes with semctl failed

2006-04-25 Thread Peter Brant
Sure. I should note that we're moving to Linux for our production servers so our interest in the Windows port is waning (at least for the time being). In particular, the stuck WAL segment rename problem has occasionally been rather a pain in the neck. As long as we still have Windows test server

Re: [BUGS] [Win32] Problem with rename()

2006-04-21 Thread Peter Brant
This is probably somewhat superfluous, but we had another one of these incidents last night whose details confirm your explanation here. [2006-04-21 00:22:19.500 ] 2452 LOG: could not rename file "pg_xlog/0001011A004C" to "pg_xlog/0001011A0071", continuing to try the autovac

Re: Permission denied on fsync / Win32 (was [BUGS] right

2006-04-19 Thread Peter Brant
I'm not sure that's the whole story. "Server #3" had backends with handles to the old relfilenode. It didn't have any fsync errors and the old relfilenode was apparently successfully deleted (or at least it wasn't visible in the file system anymore). That's the part of the morning's investigatio

Re: Permission denied on fsync / Win32 (was [BUGS] right

2006-04-19 Thread Peter Brant
Here's the evidence from this morning. I have to admit I'm not really sure what to make of it though. Pete The fsync / Permission denied errors occurred on 2 of 3 active servers for the 7 am CLUSTER cycle. Server #1 (with fsync errors): - Both old and new relfilenodes are still visible with a

Re: Permission denied on fsync / Win32 (was [BUGS] right

2006-04-18 Thread Peter Brant
It happens often enough and the episodes last long enough that grabbing a handle dump while this is going on should be easily done. Regarding the Win32 error code, backend/storage/file/fd.c calls _commit(). http://msdn2.microsoft.com/en-us/library/17618685(VS.80).aspx It looks like it is alread

Re: [BUGS] [Win32] Problem with rename()

2006-04-18 Thread Peter Brant
Does that also explain why an attempt to make a new connection just hangs? One other thing regarding that is that the connection attempt seems to kinda, sorta succeed. It never makes it as far as a command prompt, but on the "stop -m immediate", psql does print the "HINT: In a moment you should be a

Re: [BUGS] [Win32] Problem with rename()

2006-04-18 Thread Peter Brant
It's definitely possible. Both failures occurred around the end of the business day as update traffic would have been coasting to a stop. The middle tier never closes a connection unless it's forced to (e.g. as a result of a query error, connection going away, etc.) Pete >>> Tom Lane <[EMAIL

Re: [BUGS] [Win32] Problem with rename()

2006-04-18 Thread Peter Brant
They are local. Pete >>> "Harald Armin Massa" <[EMAIL PROTECTED]> 04/18/06 4:35 pm >>> "G" - is that really a LOCAL drive at that server, or rather some NAS or similar?

Re: [BUGS] [Win32] Problem with rename()

2006-04-18 Thread Peter Brant
Unfortunately, it's not that simple. It would be straightforward to track down if it were. In response to other questions: It's Postgres 8.1.3 running on Windows 2003 Server. No anti-virus software is installed. The servers are essentially bare except for the OS and Postgres. We have "handle

[BUGS] [Win32] Problem with rename()

2006-04-17 Thread Peter Brant
Hi all, In the last couple of days, we've been bitten (a couple of times, on different servers) by an apparent glitch or bad interaction in the Windows implementation of rename(). The relevant log message is: [2006-04-17 16:49:22.583 ] 2252 LOG: could not rename file "pg_xlog/0001010A00

Re: Permission denied on fsync / Win32 (was [BUGS] right

2006-04-17 Thread Peter Brant
The error messages refer to the old relfilenode (in 3 out of 3 occurrences today). Pete >>> Tom Lane <[EMAIL PROTECTED]> 04/14/06 2:41 am >>> OK ... but what's still unclear is whether the failures are occurring against the old relfilenode (the one just removed by the CLUSTER) or the new one just

Re: Permission denied on fsync / Win32 (was [BUGS] right

2006-04-14 Thread Peter Brant
Apparently we got lucky on all four servers with the latest cycle, so nothing to report. Load (both reading and writing) is quite light today so perhaps the bug is only triggered under a higher load. It seems the problem typically doesn't show up on weekends either (when load is also much lighter

Re: Permission denied on fsync / Win32 (was [BUGS] right

2006-04-13 Thread Peter Brant
The culprit is CLUSTER. There is a batch file which runs CLUSTER against six relatively small (60k rows between them) tables at 7am, 1pm, and 9pm. Following is the list of dates and hours when the "Permission denied" errors showed up. They match up to a tee (although the error apparently sometime

Re: [BUGS] right sibling is not next child

2006-04-13 Thread Peter Brant
Sounds good. There is nothing sensitive in DbTranImageStatus_pkey so if you decide you want it after all, it's there for the asking. Pete >>> Tom Lane <[EMAIL PROTECTED]> 04/13/06 3:30 am >>> Oh, never mind ... I've sussed it.

Permission denied on fsync / Win32 (was [BUGS] right sibling is not next child)

2006-04-13 Thread Peter Brant
It turns out we've been getting rather huge numbers of "Permission denied" errors relating to fsync so perhaps it wasn't really a precursor to the crash as I'd previously thought. I've pasted in a complete list following this email covering the time span from 3/20 to 4/6. The number in the firs

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Peter Brant
The middle tier transaction log indicates this record was inserted into the county database at 2006-03-31 21:00:32.94. It would have hit the central databases sometime thereafter (more or less immediately if all was well). The Panel table contains some running statistics which are updated frequen

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Peter Brant
Per the DBAs, there hadn't been any recent crashes before last Thursday. A "vacuum analyze verbose" discovered the problem early Thursday morning. After the PANIC, the database never came back up (the heap_clean_redo: no block / full_page_writes = off problem). One thing that seems strange to me

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Peter Brant
I can't find any duplicates?!? The query select starelid, staattnum, ctid, xmin, xmax, cmin, cmax from pg_statistic p1 where (select count(*) from pg_statistic p2 where p1.starelid = p2.starelid and p1.staattnum = p2.staattnum) > 1 doesn't turn up anything. Nor does dumping select starelid,

Re: [BUGS] right sibling is not next child

2006-04-12 Thread Peter Brant
| S02 WA| R | Warrant issued| t | 10 | t | F | S04 (19 rows) bigbird=# >>> Tom Lane <[EMAIL PROTECTED]> 04/12/06 5:00 am >>> "Peter Brant" <[EMAIL PROTECTED]> writes: >

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Peter Brant
Also, when I tried to run a database-wide VACUUM ANALYZE VERBOSE it actually doesn't even get to Panel and errors out with: INFO: analyzing "public.MaintCode" INFO: "MaintCode": scanned 1 of 1 pages, containing 19 live rows and 0 dead rows; 19 rows in sample, 19 estimated total rows ERROR: dupl

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Peter Brant
The index data isn't sensitive, but I should ask for permission nonetheless. I'll send over the '-f' output tomorrow morning. Pete *** * PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1 * * File: 180571 * Options used:

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Peter Brant
2 = 635 (gdb) print rightsib No symbol "rightsib" in current context. (gdb) print nextoffset $3 = 87 (gdb) print leftsib $4 = 636 (gdb) print rightsib No symbol "rightsib" in current context. (gdb) continue Continuing. Program exited with code 03. (gdb) Pete >>> Tom

Re: [BUGS] right sibling is not next child

2006-04-11 Thread Peter Brant
Sorry about the delay in responding. We had a bit of difficulty with the test machine. Kevin is also on vacation this week. The problem is repeatable with a VACUUM. I've found the offending block. A (partial) pg_filedump of that block is pasted in below. I'm a little lost as to what the next

Re: [BUGS] BUG #2371: database crashes with semctl failed error

2006-04-10 Thread Peter Brant
Hi all, We were bitten by this same bug over the weekend (PG 8.1.3 / Windows Server 2003). The exact error was: FATAL: semctl(170688872, 6, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately. The start of the errors corresponded to a nightly "vacuum analyze"