[OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Andrew Deason
On Fri, 28 Jan 2011 12:10:38 -0500 Jeff Blaine wrote: > The last time we brought our fileservers down (cleanly, according to > "shutdown" info via bos status), it struck me as odd that salvages > were needed once it came up. I sort of brushed it off. As in, it salvaged everything automatically

[OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Andrew Deason
On Fri, 28 Jan 2011 13:17:31 -0500 Jeff Blaine wrote: > Examples from FileLog.old: > > Fri Jan 28 10:02:48 2011 VAttachVolume: volume /vicepf/V2023864046.vol > needs to be salvaged; not attached. This just says that the fileserver didn't clear the "I'm using this volume" flag in the header; we

[OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Andrew Deason
On Fri, 28 Jan 2011 13:52:02 -0500 Derrick Brashear wrote: > did shutdown perchance take 30min? BosLog would still indicate a force kill after 30 mins. What are all of the BosLog entries mentioning the fileserver? (assuming bosserver hasn't been restarted enough times to rotate that away) -- A

[OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-31 Thread Andrew Deason
On Mon, 31 Jan 2011 11:54:24 -0500 Steve Simmons wrote: > I haven't read the code, but by observing the logfiles during a > shutdown time it appears that fs shutdown breaks callbacks in a > single-threaded manner per partition. This could probably be > parallelized; simple thought experiments say

[OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-31 Thread Andrew Deason
On Mon, 31 Jan 2011 11:54:24 -0500 Steve Simmons wrote: > > Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15 > > Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15 > > Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15 > > Wed Jan 26 12:58:19 2011: bos shutdown: fileserver faile

[OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-02-01 Thread Andrew Deason
On Tue, 01 Feb 2011 12:04:08 -0800 Patricia O'Reilly wrote: > From what you have described it sounds to me like you need the patch > that Andrew referenced earlier that allows you to configure an > -offline-timeout and -offline-shutdown-timeout option on your > fileservers. We have had similar pr

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Jeff Blaine
On 1/28/2011 12:33 PM, Andrew Deason wrote: On Fri, 28 Jan 2011 12:10:38 -0500 Jeff Blaine wrote: The last time we brought our fileservers down (cleanly, according to "shutdown" info via bos status), it struck me as odd that salvages were needed once it came up. I sort of brushed it off. As

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Jeff Blaine
Examples from FileLog.old: Fri Jan 28 10:02:48 2011 VAttachVolume: volume /vicepf/V2023864046.vol needs to be salvaged; not attached. Fri Jan 28 10:02:49 2011 VAttachVolume: volume salvage flag is ON for /vicepa//V2023886583.vol; volume needs salvage Examples from SalvageLog old pretty much
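For anyone who wants to count how many volumes came up flagged, here is a minimal sketch that pulls the affected partitions and volume header numbers out of FileLog lines like the two quoted above. It assumes both message formats look exactly like those two samples and treats the number in the `V*.vol` header file name as the volume ID; the function name `volumes_needing_salvage` is made up for this example and is not part of OpenAFS.

```python
import re

# The two FileLog lines quoted above, copied verbatim.
FILELOG = """\
Fri Jan 28 10:02:48 2011 VAttachVolume: volume /vicepf/V2023864046.vol needs to be salvaged; not attached.
Fri Jan 28 10:02:49 2011 VAttachVolume: volume salvage flag is ON for /vicepa//V2023886583.vol; volume needs salvage
"""

# Matches a volume header path such as /vicepf/V2023864046.vol.
# One or more slashes are tolerated, since the second sample has "//".
HEADER_RE = re.compile(r"(/vicep[a-z]+)/+V(\d+)\.vol")


def volumes_needing_salvage(text):
    """Return (partition, header number) pairs for VAttachVolume salvage complaints."""
    found = []
    for line in text.splitlines():
        # Only look at VAttachVolume lines that mention salvage (assumed filter).
        if "VAttachVolume" not in line or "salvage" not in line:
            continue
        match = HEADER_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


if __name__ == "__main__":
    for partition, number in volumes_needing_salvage(FILELOG):
        print(partition, number)
```

Run against a real FileLog.old, the same function would give a quick list to compare with what the salvager later reports, provided the log lines really follow these two shapes.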

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Jeff Blaine
Do you have the FileLog from that shutdown? No, it was cycled out by me salvaging :| And there isn't anything in play that would cause an old version of the vice partition or something weird like that, is there? (ZFS snapshots, liveupgrade misconfiguration, etc) No. _

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Derrick Brashear
did shutdown perchance take 30min? Derrick On Jan 28, 2011, at 1:50 PM, Jeff Blaine wrote: >> Do you have the FileLog from that shutdown? > > No, it was cycled out by me salvaging :| > >> And there isn't anything in play that would cause an old version of the >> vice partition or something w

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Jeff Blaine
On 1/28/2011 1:52 PM, Derrick Brashear wrote: did shutdown perchance take 30min? Yes. I found this in BosLog.old just now: Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15 Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15 Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15 W

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-28 Thread Derrick Brashear
On Fri, Jan 28, 2011 at 1:58 PM, Jeff Blaine wrote: > On 1/28/2011 1:52 PM, Derrick Brashear wrote: >> >> did shutdown perchance take 30min? > > Yes.  I found this in BosLog.old just now: > > Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15 > Wed Jan 26 12:28:13 2011: upclientbin exited o

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-31 Thread Steve Simmons
On Jan 28, 2011, at 1:58 PM, Jeff Blaine wrote: > On 1/28/2011 1:52 PM, Derrick Brashear wrote: >> did shutdown perchance take 30min? > > Yes. I found this in BosLog.old just now: > > Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15 > Wed Jan 26 12:28:13 2011: upclientbin exited on si

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-31 Thread Stephen Joyce
On Mon, 31 Jan 2011, Steve Simmons wrote: We have seen similar issues. It occurs when there is a given vice partition where lots of clients have registered callbacks but those clients are no longer accessible. Not all the clients have responded when the 1800 second timer goes off, and the file

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-31 Thread Steve Simmons
On Jan 31, 2011, at 12:17 PM, Stephen Joyce wrote: > On Mon, 31 Jan 2011, Steve Simmons wrote: > >> We have seen similar issues. It occurs when there is a given vice partition >> where lots of clients have registered callbacks but those clients are no >> longer accessible. Not all the clients

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-31 Thread Steve Simmons
On Jan 31, 2011, at 12:36 PM, Andrew Deason wrote: > On Mon, 31 Jan 2011 11:54:24 -0500 > Steve Simmons wrote: > >>> Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15 >>> Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15 >>> Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15 >

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-01-31 Thread Jeffrey Altman
On 1/31/2011 12:17 PM, Stephen Joyce wrote: > On Mon, 31 Jan 2011, Steve Simmons wrote: >> We have about 235,000 volumes spread across 40 vice partitions. Our >> 'fix' is a combination of lengthening that timeout to 3600 seconds >> and keeping our vice partitions no longer than 2TB. Active partit

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-02-01 Thread Jeff Blaine
Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15 Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15 Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15 Wed Jan 26 12:58:19 2011: bos shutdown: fileserver failed to shutdown within 1800 seconds Wed Jan 26 12:58:37 2011: fs:file exit
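To check whether a shutdown ran into the bosserver timeout, one rough approach is to measure the gap between the first logged process exit and the "failed to shutdown" message. The sketch below does that for the four complete BosLog lines quoted above. It assumes the first exit line is a fair proxy for when the shutdown began and that every line starts with a timestamp in this exact format; the helper names are invented for this example.

```python
from datetime import datetime

# The four complete BosLog lines quoted above, copied verbatim.
BOSLOG = """\
Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15
Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15
Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15
Wed Jan 26 12:58:19 2011: bos shutdown: fileserver failed to shutdown within 1800 seconds
"""


def parse_line(line):
    """Split a BosLog line into (timestamp, message)."""
    # The first ": " separates the timestamp from the message; the colons
    # inside HH:MM:SS are not followed by a space, so they are skipped.
    stamp, _, message = line.partition(": ")
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"), message


def seconds_until_timeout(text):
    """Seconds from the first logged entry to the shutdown-failure message, or None."""
    entries = [parse_line(line) for line in text.splitlines() if line.strip()]
    if not entries:
        return None
    start = entries[0][0]
    for when, message in entries:
        if "failed to shutdown within" in message:
            return int((when - start).total_seconds())
    return None


if __name__ == "__main__":
    print(seconds_until_timeout(BOSLOG))
```

A result at or just above the configured timeout suggests the fileserver was killed rather than exiting on its own, which would fit volumes being left flagged as in use.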

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-02-01 Thread Patricia O'Reilly
From what you have described it sounds to me like you need the patch that Andrew referenced earlier that allows you to configure an -offline-timeout and -offline-shutdown-timeout option on your fileservers. We have had similar problems at our site and will be releasing that patch into produc

Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge

2011-02-07 Thread Steve Simmons
On Feb 1, 2011, at 3:58 PM, Andrew Deason wrote: > On Tue, 01 Feb 2011 12:04:08 -0800 > Patricia O'Reilly wrote: > >> From what you have described it sounds to me like you need the patch >> that Andrew referenced earlier that allows you to configure an >> -offline-timeout and -offline-shutdown-