> -----Original Message-----
> From: Mikheev, Vadim [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, April 04, 2001 3:37 AM
> To: 'Tom Lane'
> Subject: RE: [BUGS] Loosing files after backend crash 
> 
> 
> 1. Indices could be recreated with REINDEX or pg_class could 
> be queried
> with seq scan (something like where relname like 
> '%seq_i___data_buffer%')...
> Konstantin?

Well, bad news. After a few more tries to crash the backend, the whole
postmaster crashed and didn't rise back.
It fails to start up reporting "Apr  4 18:53:05 wale postgres[71618]: [9]
FATAL 2:  XLogWrite: write request is past end of log" to syslog.
And the last line of errlog sounds like "/usr/local/pgsql/bin/postmaster:
Startup proc 72905 exited with status 512 - abort"
I wanted to ask, if I need to re-initdb or there are some other ways to fix
the problem?
If I need to re-init, can I preserve the database in it's current state, to
continue my investigation from the point I was interrupted?

I hope I will be able to answer your questions after I heal the postmaster.

> 3. Could you help us reproduce this bug, Konstantin?
> What exactly did you do after sequence creation?

Here's the script which creates the sequence and the temp table:

---------------------8<---------------------
begin transaction;
drop sequence _seq_i___data_buffer;
create sequence _seq_i___data_buffer;

CREATE TEMPORARY TABLE __data_buffer (
   buff_id int4 UNIQUE NOT NULL default NEXTVAL( '_seq_i___data_buffer' ),
   rule_id int4 NOT NULL,
   _value decimal( 18, 0) NOT NULL,
   _count decimal( 18, 0) NOT NULL,
   value_time timestamp NOT NULL
  );

insert into __data_buffer
(buff_id     , rule_id     , _value                 , _count
, value_time                  )
(
... [689 UNION'd selects]
);
commit;
---------------------8<---------------------
After that I run my function.

Shall I send you it's code? (It's 23 Kbytes big).

> Does your function reference temp table you've mentioned?
Yes. And it also creates three more temporary tables (Let's call them A, B
and C).
Actually, temp table 'A' is populated with the values from the external
temptable ('__data_buffer').
And the 'B' temp table is populated with a query from temptable A.
Then table C is populated with a huge query, which joins many tables and
table 'B' among them.
But tables 'A' and the external one are not referenced in this huge query.
Well, this very query crashes the postmaster.
Our team is playing with this query to locate the reasons for this failure.
We will report you the results of our investigation.

If you want to have a look at the query, here it is.
'__vars_info' is what I referenced as temptable C.
'__rule_data_with_tis' is what I referenced as temptable 'B'. 
All other tables are not temporary.
int1 is my self-written type. To prevent from blaming my type I can say,
that replacing it with, say, float gives the same results.
datediff() is my own function (MS SQL analog, works well alone).

---------------------8<---------------------
  INSERT INTO __vars_info ( var_id, min_old_ti, max_old_ti, lifetime,
timeinterval )
   ( SELECT variable.var_id,
      CASE
       WHEN year = 1 THEN MIN( datediff(''year'', basetime,  timebegin))
       WHEN month = 1 THEN MIN( datediff(''month'', basetime,  timebegin))
       WHEN week = 1 THEN MIN( datediff(''week'', basetime,  timebegin))
       WHEN day = 1 THEN MIN( datediff(''day'', basetime,  timebegin))
       WHEN hour = 1 THEN MIN( datediff(''hour'', basetime,  timebegin))
       WHEN five_minute = 1 THEN MIN( datediff(''minute'', basetime,
timebegin) / 5)
      END as min_old_ti,
      CASE
       WHEN year = 1 THEN MAX( datediff(''year'', basetime,  timebegin))
       WHEN month = 1 THEN MAX( datediff(''month'', basetime,  timebegin))
       WHEN week = 1 THEN MAX( datediff(''week'', basetime,  timebegin))
       WHEN day = 1 THEN MAX( datediff(''day'', basetime,  timebegin))
       WHEN hour = 1 THEN MAX( datediff(''hour'', basetime,  timebegin))
       WHEN five_minute = 1 THEN MAX( datediff(''minute'', basetime,
timebegin) / 5)
      END as max_old_ti,
      lifetime, timeinterval
     FROM
      variable
       LEFT JOIN
      ( SELECT var_id, timebegin FROM var_value WHERE var_id IN
       ( SELECT DISTINCT var_id FROM variable WHERE vset_id IN
        ( SELECT DISTINCT vset_id FROM vset_to_rule WHERE rule_id IN
         ( SELECT DISTINCT rule_id FROM __rule_data_with_tis ) ) ) )
      AS sel_var_value
       ON variable.var_id = sel_var_value.var_id,
      ( SELECT ti_id, (year)::int1 AS year, (month)::int1 AS month,
         (week)::int1 AS week, (day)::int1 AS day,
         (hour)::int1 AS hour, (five_minute)::int1 AS five_minute
        FROM timeinterval )
      AS timeinterval
     WHERE
      variable.ti_id = timeinterval.ti_id AND
      variable.isactive = 1 AND variable.vset_id IN
       ( SELECT DISTINCT vset_id FROM vset_to_rule
         WHERE vset_id IN ( SELECT DISTINCT vset_id FROM variable_set where
isactive = 1 ) AND
               rule_id IN ( SELECT DISTINCT rule_id FROM
__rule_data_with_tis ) )
     GROUP BY variable.var_id, lifetime, timeinterval, year::int4,
month::int4, week::int4, day::int4, hour::int4, five_minute::int4)
     ;
---------------------8<---------------------

> What cause crash? Maybe crash is related somehow...
See above.

> Could you try to reproduce failure with wal_debug = 1 and
> post me postmaster' log?
I'll do it after I succeed in bringing the postmaster up.

Regards, 
Konstantin Solodovnikov.

P.S. Almost forgot:
Here's what psql tells about our sequence:

---------------------8<---------------------
Netflow_Test=# \d seq_i___data_buffer
                                  Table "seq_i___data_buffer"
 Attribute  |           Type           |                       Modifier
------------+--------------------------+------------------------------------
-------------------
 buff_id    | integer                  | not null default
nextval('seq_i___data_buffer'::text)
 ^^^^^^^
 buff_id    | integer                  |
 ^^^^^^^
 rule_id    | integer                  | not null
 _value     | numeric(18,0)            | not null
 _count     | numeric(18,0)            | not null
 value_time | timestamp with time zone | not null
Index: pg_temp.96430.1
       ^^^^^^^^^^^^^
---------------------8<---------------------

It shows almost the same structure as __data_buffer should have.
Except for the fact, that it has 'buff_id' doubled :)

K.S.

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Reply via email to