[GENERAL] Upgrade from 9.1 to 9.2 fails due to unlogged table?

2013-04-05 Thread Gordon Shannon
I have a 9.1.3 instance (Red Hat 5) with some unlogged tables. I did the following steps: 1. pg_basebackup to create a base.tar. 2. Used the base.tar plus the WALs required by the backup to restore the db to another 9.1.3 server. This went fine, except that at the end of the recovery I got an error.
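
Unlogged tables are not WAL-logged, so their contents cannot be reconstructed by a base-backup-plus-WAL restore. A quick way to list them before taking the backup (a sketch; pg_class.relpersistence exists from 9.1):

    -- list all unlogged tables in the database
    SELECT n.nspname, c.relname
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relpersistence = 'u'   -- 'u' = unlogged
      AND c.relkind = 'r';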

Re: [GENERAL] Upgrade from 9.1 to 9.2 fails due to unlogged table?

2013-04-05 Thread Gordon Shannon
I repeated the entire process, and I have a few clarifications. When I said the db seemed fine after the restore, I was wrong. I could do a \d on an unlogged table, but selecting count(*) from any of them resulted in an error like "could not open file "base/16388/15963587": No such file or directory".
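
This is consistent with the unlogged tables' heap files never making it into the base backup. A hedged way to see which on-disk file a relation maps to (pg_relation_filepath is available from 9.0; the table name here is hypothetical):

    -- show the file path behind a relation, relative to the data directory
    SELECT pg_relation_filepath('my_unlogged_table');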

Re: [GENERAL] could not access status of transaction 1118722281

2011-04-09 Thread Gordon Shannon
Turns out this was most likely the pg_upgrade bug. In our case, I was able to dump and recreate the table in question. Since then, this has been made public: http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix

[GENERAL] could not access status of transaction 1118722281

2011-04-04 Thread Gordon Shannon
Running 9.0.2 on CentOS. I just discovered this in my production error log. Starting about 45 minutes ago, I got 70 of these within 2 seconds:

    28652 2011-04-04 21:47:29 EDT [33] WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "pg_toast_49338181" page 16820
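
To act on a warning like this, it helps to know which table owns the TOAST relation named in the log. A sketch (the TOAST relation name is taken from the warning above):

    -- find the table that owns this TOAST relation
    SELECT c.oid::regclass AS owning_table
    FROM pg_class c
    WHERE c.reltoastrelid = 'pg_toast.pg_toast_49338181'::regclass;

A plain VACUUM of the owning table then rechecks the pages; the warnings themselves are typically emitted by vacuum as it repairs the flag.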

Re: [GENERAL] walreceiver getting bad data?

2011-01-06 Thread Gordon Shannon
It's 9.0.2 on CentOS.

[GENERAL] walreceiver getting bad data?

2011-01-05 Thread Gordon Shannon
I'm getting an error like this in the log a couple of times a day on my hot standby server. Any suggestions?

    23964 2011-01-04 05:23:00 EST [47] LOG: invalid record length at 6E53/46E8A010
    23535 2011-01-04 05:23:00 EST [2] FATAL: terminating walreceiver process due to administrator command
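
On a 9.0 standby you can at least confirm that WAL is still being received and replayed after the walreceiver restarts. A minimal check, run on the standby (these are the 9.0-era function names; later releases renamed them to pg_last_wal_*):

    -- last WAL location received vs. last location replayed
    SELECT pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location();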

Re: [GENERAL] seg fault crashed the postmaster

2011-01-04 Thread Gordon Shannon
I'm putting this on this thread, since it could be related to the issue. I'm now seeing this in the log on the HSB/SR server. It's happened about 4 times in the past 2 days.

    23964 2011-01-04 05:23:00 EST [47] LOG: invalid record length at 6E53/46E8A010
    23535 2011-01-04 05:23:00 EST [2] FATAL: terminating walreceiver process due to administrator command

Re: [GENERAL] seg fault crashed the postmaster

2010-12-31 Thread Gordon Shannon
Unfortunately it's now impossible to say how many were updated, as they get deleted by another process later. I may be able to restore part of a dump from 2 days ago on another machine and get some counts from that, assuming I have the disk space. I'll work on that.

Re: [GENERAL] seg fault crashed the postmaster

2010-12-31 Thread Gordon Shannon
The number of matching rows on these queries is anything from 0 to 1. I don't think I can tell how many would have matched on the ones that crashed, although I suspect it would have been toward the 1 end. I've been trying to get a reproducible test case with no luck so far.

Re: [GENERAL] seg fault crashed the postmaster

2010-12-31 Thread Gordon Shannon
Maybe it doesn't work from gmail. I'll try uploading from here. http://postgresql.1045698.n5.nabble.com/file/n3323933/plan.txt (plan.txt)

Re: [GENERAL] seg fault crashed the postmaster

2010-12-31 Thread Gordon Shannon
Yes, that query does take 30 or 90 secs. I'm pretty sure it was blocking on its twin update running concurrently. However, I'm not really sure how to identify what "transaction 1283585646" was. Enclosed is the query plan -- 21000 lines. -gordon

> I tried to replicate the problem here without success

Re: [GENERAL] seg fault crashed the postmaster

2010-12-31 Thread Gordon Shannon
Sorry, I left that out. Yeah, I wondered that too, since these tables do not use TOAST.

    CREATE TYPE message_status_enum AS ENUM ('V', 'X', 'S', 'R', 'U', 'D');

On Fri, Dec 31, 2010 at 12:38 PM, Tom Lane wrote:
> Hmmm ...

Re: [GENERAL] seg fault crashed the postmaster

2010-12-31 Thread Gordon Shannon
Here is the DDL for the tables in question. There are foreign keys to other tables that I omitted. http://postgresql.1045698.n5.nabble.com/file/n3323804/parts.sql (parts.sql)

Re: [GENERAL] seg fault crashed the postmaster

2010-12-31 Thread Gordon Shannon
Interesting. That's exactly what we have been doing -- trying to update the same rows in multiple txns. For us to proceed in production, I will take steps to ensure we stop doing that, as it's just an app bug really. The table in question -- v_messages -- is an empty base table with 76 partitions.
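
One way to ensure two transactions don't update the same rows at once is to take an explicit row lock up front, so a second writer blocks rather than interleaving. A hedged sketch (the table name comes from the thread; the key column and values are hypothetical):

    BEGIN;
    -- serialize concurrent writers on the target row(s)
    SELECT 1 FROM v_messages WHERE message_id = 12345 FOR UPDATE;
    UPDATE v_messages SET status = 'X' WHERE message_id = 12345;
    COMMIT;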

Re: [GENERAL] seg fault crashed the postmaster

2010-12-30 Thread Gordon Shannon
    #15 0x005f7d14 in PostgresMain (argc=, argv=, username=) at postgres.c:3929
    #16 0x005c7ce5 in ServerLoop () at postmaster.c:3555
    #17 0x005c89ec in PostmasterMain (argc=5, argv=0x1b31ea00) at postmaster.c:1092
    #18 0x005725fe in main (argc=5, argv=) at main.c:188

Re: [GENERAL] seg fault crashed the postmaster

2010-12-30 Thread Gordon Shannon
I'd love to send you a stack trace. Any suggestions on how to get one? It has since happened again, on the same update command, so I'm guessing I can repeat it.

[GENERAL] seg fault crashed the postmaster

2010-12-30 Thread Gordon Shannon
Hi. Running CentOS; we just upgraded our production db from 8.4.4 to 9.0.2 last night. About 20 hours later, an update statement seg faulted and crashed the server. This is a typical update that has worked fine for a long time.

    20898 datafeed (58628) 2010-12-30 19:28:14 EST [103] LOG: process 2089

Re: [GENERAL] Need help understanding vacuum verbose output

2010-08-06 Thread Gordon Shannon
> That last message prints tups_vacuumed, but those other ones are counting
> all the removed item pointers. So apparently Gordon had a whole lot of
> pre-existing DEAD item pointers. I wonder why ...

Perhaps this will help. Here's the entire test. Start with a newly loaded table with 5,063,4

Re: [GENERAL] Need help understanding vacuum verbose output

2010-08-06 Thread Gordon Shannon
Yes, and also from the original post:

> 3 INFO: scanned index "authors_archive_pkey" to remove 45878 row versions
> 4 DETAIL: CPU 0.05s/0.34u sec elapsed 0.41 sec.
> 5 INFO: "authors_archive": removed 45878 row versions in 396 pages
> 6 DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.

Re: [GENERAL] Need help understanding vacuum verbose output

2010-08-06 Thread Gordon Shannon
OK, so if it knew that all vacuumable tuples could be found in 492 pages, and it scanned only those pages, then how could it be that it reports 16558 removable tuples from those 492 pages, when it has already reported earlier that it removed 45878 tuples -- a number we know in fact to be correct?

[GENERAL] Need help understanding vacuum verbose output

2010-08-05 Thread Gordon Shannon
Hi. Running 8.4.4 on CentOS. A couple of these numbers don't make sense to me. (I added line numbers for reference.)

    1 vacuum verbose authors_archive;
    2 INFO: vacuuming "public.authors_archive"
    3 INFO: scanned index "authors_archive_pkey" to remove 45878 row versions
    4 DETAIL: CPU 0.05s/0.34u sec elapsed 0.41 sec.
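
When vacuum's numbers look inconsistent, the stats collector's view of the same table is a useful cross-check. A sketch (these pg_stat_user_tables columns exist in 8.4):

    -- compare vacuum's counts with the stats collector's view
    SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'authors_archive';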

Re: [GENERAL] Help writing a query to predict auto analyze

2010-05-19 Thread Gordon Shannon
alvherre wrote:
> n_live_tup and n_dead_tup correspond to the current numbers, whereas
> "last analysis tuples" are the values from back when the previous
> analyze ran. These counters keep moving per updates, deletes, and
> inserts; they are not static.

OK. Do you know how I can get those values?

Re: [GENERAL] Help writing a query to predict auto analyze

2010-05-19 Thread Gordon Shannon
alvherre wrote:
> Excerpts from Gordon Shannon's message of Wed May 19 11:49:45 -0400 2010:
>> at: last analysis tuples = pg_class.reltuples
>>
>> I'm the least confident about the last one -- tuples as of last analyze.
>> Can anyone confirm or correct these?
>
> In 8.4 it's number

[GENERAL] Help writing a query to predict auto analyze

2010-05-19 Thread Gordon Shannon
In an effort to fine-tune my table storage parameters so tables are analyzed at the optimal time, I have written a query to show how soon my tables will be auto-analyzed. But my results do not jibe with what I see autovacuum doing; i.e., there are tables that are millions of rows past the threshold.
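
Using the approximations settled on later in this thread (current counts from pg_stat_user_tables; tuples-as-of-last-analyze approximated by pg_class.reltuples), a prediction query might look roughly like this. This is a sketch, not the query from the post, and it ignores per-table reloptions overrides:

    -- estimated analyze threshold vs. estimated changes since last analyze
    SELECT s.relname,
           current_setting('autovacuum_analyze_threshold')::integer
             + current_setting('autovacuum_analyze_scale_factor')::float8 * c.reltuples
             AS analyze_threshold,
           (s.n_live_tup + s.n_dead_tup) - c.reltuples AS est_changes_since_analyze
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid
    ORDER BY 3 DESC;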

Re: [GENERAL] Crazy looking actual row count from explain analyze

2010-05-10 Thread Gordon Shannon
Tom Lane wrote:
> My first suspicion is that those are unvacuumed dead rows ... what's
> your vacuuming policy on this database?

Ah, I didn't know that number included dead tuples. That probably explains it. pg_stat_user_tables says the table has 370,269 dead tuples.

[GENERAL] Crazy looking actual row count from explain analyze

2010-05-10 Thread Gordon Shannon
Running 8.4.3, I have a table with 43 million rows. Two of the columns are (topic_id int not null) and (status message_status_enum not null), where message_status_enum is defined as

    CREATE TYPE message_status_enum AS ENUM ('V', 'X', 'S', 'R', 'U', 'D');

Among the indexes there is this: "m_2010
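
As the reply above notes, the inflated "actual rows" figure turned out to be unvacuumed dead rows. A quick way to check whether a table is carrying many dead tuples (a sketch; the table name is hypothetical):

    -- dead tuples currently tracked for the table
    SELECT n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'messages';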

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
Sounds like you're on it. Just wanted to share one additional piece, in case it helps. Just before the ALTER INDEX SET TABLESPACE was issued, there were some writes to the table in question inside a serializable transaction. The transaction committed at 11:11:58 EDT.
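
For reference, the shape of the sequence described is roughly this (a hypothetical reconstruction; all object names are made up, only the operations come from the thread):

    -- writes inside a serializable transaction...
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE some_table SET some_col = 'x' WHERE id = 1;
    COMMIT;

    -- ...followed shortly afterward by moving an index on that table
    ALTER INDEX some_table_idx SET TABLESPACE other_tablespace;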

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
On Sun, May 2, 2010 at 12:52 PM, Tom Lane wrote:
> Gordon Shannon writes:
>> Bingo. Yes it is reasonable. It was 25 seconds between my altering the
>> index in question and the server crash.
>
> Sounds like we have a smoking gun. Could you show all your non-default settings?

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
On Sun, May 2, 2010 at 12:10 PM, Tom Lane wrote:
> No, this would be a pg_database row with that OID. But it looks like
> you found the relevant index anyway.

Yup, realized that on second reading.

>> These commands worked fine on the master, yet this seems suspiciously
>> relevant.

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
On Sun, May 2, 2010 at 11:02 AM, Tom Lane wrote:
> Hmm ... AFAICS the only way to get that message when the incoming TID's
> offsetNumber is only 2 is for the index page to be completely empty
> (not zeroes, else PageAddItem's sanity check would have triggered,
> but valid and empty).

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
I just ran into the same problem. Both servers are running 8.4.3, and the standby server had been running for 2 days, processing many thousands of logs successfully. Here's my error:

    4158 2010-05-02 11:12:09 EDT [26445] LOG: restored log file "00013C7700C3" from archive

Re: [GENERAL] unexplained autovacuum to prevent wraparound

2010-03-14 Thread Gordon Shannon
That looks like the fix for this, thanks! I will try to upgrade soon. -- Gordon

On Sun, Mar 14, 2010 at 7:43 AM, Alvaro Herrera wrote:
> Gordon Shannon wrote:
>> Ah, now I see what you meant. Forgive me, I thought you were referring
>> to the pg_autovacuum table in

Re: [GENERAL] unexplained autovacuum to prevent wraparound

2010-03-13 Thread Gordon Shannon
Gordon Shannon wrote:
> This is 8.4, there is no pg_autovacuum table. I set it like this:
>
> alter table foo set (autovacuum_analyze_scale_factor=0.01);
>
> On Fri, Mar 12, 2010 at 4:31 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:

Re: [GENERAL] unexplained autovacuum to prevent wraparound

2010-03-12 Thread Gordon Shannon
Thanks, but I do want 1%.

On Fri, Mar 12, 2010 at 5:19 PM, Joshua D. Drake wrote:
> On Fri, 2010-03-12 at 16:45 -0700, Gordon Shannon wrote:
>> This is 8.4, there is no pg_autovacuum table. I set it like this:
>>
>> alter table foo set (autovacuum_analyze_scale_factor

Re: [GENERAL] unexplained autovacuum to prevent wraparound

2010-03-12 Thread Gordon Shannon
This is 8.4, there is no pg_autovacuum table. I set it like this:

    alter table foo set (autovacuum_analyze_scale_factor = 0.01);

On Fri, Mar 12, 2010 at 4:31 PM, Alvaro Herrera wrote:
> Gordon Shannon wrote:
>> One possibly interesting thing is that this seems to have started
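
To confirm that a per-table override like this took effect, it can be read back from pg_class.reloptions. A sketch (8.4+; "foo" as in the example above):

    -- show per-table autovacuum/storage options
    SELECT relname, reloptions
    FROM pg_class
    WHERE relname = 'foo';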

[GENERAL] unexplained autovacuum to prevent wraparound

2010-03-12 Thread Gordon Shannon
It appears to me that in my 8.4.0 system, autovacuum is running to prevent wraparound contrary to the documentation. I have it set so that a table's relfrozenxid has to reach an age of 1.5 billion before that kicks in:

    > show autovacuum_freeze_max_age; 15
    > show vacuum_freeze_table_age; 13
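
A way to see which tables are approaching the anti-wraparound trigger is to compare each table's relfrozenxid age against autovacuum_freeze_max_age (a sketch; works on 8.4):

    -- tables closest to a forced anti-wraparound vacuum
    SELECT relname, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY age(relfrozenxid) DESC
    LIMIT 10;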

[GENERAL] Got could not truncate directory "pg_multixact/offsets": apparent wraparound

2009-09-03 Thread Gordon Shannon
Hello, running 8.4 on CentOS. Been running production for 6 months. Never saw this message until tonight:

    LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound

In case it helps, output of pg_controldata:

    Latest checkpoint's NextMultiXactId: 145725622

Re: [GENERAL] Getting "insufficient data left in message" on copy with binary

2009-09-03 Thread Gordon Shannon
Tom Lane wrote:
> Gordon Shannon writes:
>> ERROR: insufficient data left in message
>> CONTEXT: COPY mytable, line 1, column provider_id
>>
>> Anybody seen this?
>
> No. Can you extract a self-contained test case?

Got it.

[GENERAL] Getting "insufficient data left in message" on copy with binary

2009-09-03 Thread Gordon Shannon
Hello, I'm running 8.4 on Linux/CentOS. I am doing a "copy (select ) to '/absolute/path/to/file.dat' with binary". That works fine. But when I load that file into a table...

    copy mytable (id, mlid, parent_mlid, author_id, date_id, time_id, content_type_id, provider_id, is_duplica
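
Binary COPY is much stricter than text COPY: the column list and types on the COPY FROM side must line up exactly with what was written, or the server can fail mid-tuple with errors like the one in this thread. A minimal round-trip sketch (hypothetical table and columns):

    -- write a binary dump of selected columns...
    COPY (SELECT id, provider_id FROM mytable) TO '/tmp/m.dat' WITH BINARY;
    -- ...and load it back: the target column list must match in number,
    -- order, and type, since the binary format is type-sensitive
    COPY mytable_copy (id, provider_id) FROM '/tmp/m.dat' WITH BINARY;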

Re: [GENERAL] How to capture an interactive psql session in a log file?

2009-04-02 Thread Gordon Shannon
That does the trick, awesome! I do think it would be great if psql had a "stderr" capture in addition to stdout. Thanks!

hubert depesz lubaczewski wrote:
> On Thu, Apr 02, 2009 at 10:55:10PM -0700, Gordon Shannon wrote:
>> Has anyone solved this issue before?

[GENERAL] How to capture an interactive psql session in a log file?

2009-04-02 Thread Gordon Shannon
What I'm trying to do doesn't seem like it should be that difficult or unusual, but I can't seem to find the right combination of commands to make it happen. I want a log file that captures everything from an interactive psql session. Running 8.3.7 with bash shell on Linux.
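
One approach along the lines of what the follow-up above accepted is to pipe psql's query output through tee from inside the session; this captures stdout but, as noted above, not stderr. A sketch (the log file name is arbitrary):

    -- inside an interactive psql session: send query output to both
    -- the screen and a file
    \o | tee -a psql-session.log
    select now();
    -- stop capturing
    \o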