Running a non-journaled file system in a cluster is always a bad idea,
as your recovery time is always affected by file system startup tasks.
Running UFS in logging mode was usually a pretty big performance hit.
Why not VxFS?



-----Original Message-----
From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of
Aleksandr Nepomnyashchiy
Sent: Tuesday, March 31, 2009 6:07 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SUMMARY: filesystem corruption after the cluster
node reboot

Many thanks to Tom Stephens for his help in troubleshooting.

What happened:
Both fs1 and fs2 became corrupted after the node crash. Most probably
VCS tried to fsck both; it succeeded with fs1 (size ~4 GB) but did not
complete within the timeout period on fs2 (size ~100 GB). So the fsck
of fs2 was killed and left nothing in the engine_A.log.


Suggested actions (a rough command sketch for both follows below):
A) Implement UFS logging on both fs1 and fs2 - this should eliminate the
file system corruption and the need for fsck (I will definitely
implement this).
B) Increase the "OnlineTimeout" value for the "Mount" type from the
default of 300 seconds (consider this carefully, as it can cause
problems of its own).
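A sketch of what both changes might look like, assuming Solaris UFS and the
standard VCS command-line tools (the Mount agent's MountOpt attribute is passed
through to mount -o; adjust resource names and the timeout value to your
environment):

    # A) Enable UFS logging by adding the "logging" mount option to each
    #    Mount resource (takes effect the next time the filesystem is mounted)
    haconf -makerw
    hares -modify mnt_fs1 MountOpt "logging"
    hares -modify mnt_fs2 MountOpt "logging"

    # B) Raise the online timeout for all Mount-type resources,
    #    e.g. from the default of 300 seconds to 900
    hatype -modify Mount OnlineTimeout 900
    haconf -dump -makero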


P.S. I was considering adding "-y" to FsckOpt, but it doesn't make any
difference - the online script adds the "-y" option to fsck regardless
of whether you specify it in FsckOpt. This is the case for online
script version 2.9 from 02/13/01 18:15:47.
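To see what FsckOpt is currently set to, a quick check (a sketch using the
standard hares query; the "%" prefix in the modify example is the usual way to
keep hares from parsing the leading dash as its own option):

    # Show the current FsckOpt value for both Mount resources
    hares -display mnt_fs1 -attribute FsckOpt
    hares -display mnt_fs2 -attribute FsckOpt

    # Setting it explicitly would look like this, though per the note
    # above the 2.9 online script passes -y to fsck either way
    hares -modify mnt_fs1 FsckOpt %-y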

========================  Please see the original post below  ========================

Dear VCS gurus,
Please help me to understand why only 1 of the 2 mount points came up
after the crash.

I can see in the log that fs1 was fsck-ed by VCS and brought online.
Was fsck even attempted on fs2? And if not, why not?

VCS is 2.0, both fs1 and fs2 are "ufs", and nothing is set in FsckOpt.


============== engine_A.log from the healthy node =========
TAG_E 2009/03/26 18:25:55 (node_d) VCS:13001:Resource(mnt_fs1): Output
of the completed operation (online)
mount: the state of /dev/vx/dsk/mydg/fs1 is not okay
       and it was attempted to be mounted read/write
mount: Please run fsck and try again
** /dev/vx/rdsk/mydg/fs1
** Last Mounted on /mount/fs1
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups

FILE SYSTEM STATE IN SUPERBLOCK IS WRONG; FIX?  yes

7324 files, 2158506 used, 1773622 free (4910 frags, 221089 blocks,
0.1% fragmentation)
TAG_E 2009/03/26 18:25:55 VCS:10298:Resource mnt_fs1 (Owner: unknown,
Group: srvgrA) is online on node_d (VCS initiated)
TAG_E 2009/03/26 18:30:07 (node_d) VCS:13003:Resource(mnt_fs2): Output
of the timedout operation (online)
mount: the state of /dev/vx/dsk/mydg/fs2 is not okay
       and it was attempted to be mounted read/write
mount: Please run fsck and try again
TAG_B 2009/03/26 18:30:07 (node_d) VCS:13012:Resource(mnt_fs2): online
procedure did not complete within the expected time.
TAG_D 2009/03/26 18:30:07 (node_d) VCS:13065:Agent is calling clean
for resource(mnt_fs2) because online did not complete within the
expected time.



Thank you,
Aleksandr
_______________________________________________
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
