[OpenAFS] Salvager running repeatedly

2010-03-24 Thread Atro Tossavainen
I have a situation on a new OpenAFS 1.4.11 server (sunx86_510; OpenAFS compiled with Sun Studio 12) where the fileserver keeps salvaging the partition repeatedly and does not actually start the fileserver. It is a namei fileserver on a ZFS partition. The salvage logs do not indicate any failures

Re: [OpenAFS] Salvager running repeatedly

2010-03-24 Thread Derrick Brashear
On Wed, Mar 24, 2010 at 4:43 AM, Atro Tossavainen atro.tossavainen+open...@helsinki.fi wrote: I have a situation on a new OpenAFS 1.4.11 server (sunx86_510; OpenAFS compiled with Sun Studio 12) where the fileserver keeps salvaging the partition repeatedly and does not actually start the

Re: [OpenAFS] Salvager running repeatedly

2010-03-24 Thread Atro Tossavainen
So there's no FileLog.old that keeps being refreshed with say a registration failure? Because this sounds like that problem. It was that problem. fileserver coredumped at startup regardless of what it tried to do (even a manual fileserver -help caused a core dump). Upgrading from 1.4.11 to

Re: [OpenAFS] Salvager running repeatedly

2010-03-24 Thread Derrick Brashear
On Wed, Mar 24, 2010 at 9:17 AM, Atro Tossavainen atro.tossavainen+open...@helsinki.fi wrote: So there's no FileLog.old that keeps being refreshed with say a registration failure? Because this sounds like that problem. It was that problem. fileserver coredumped at startup regardless of what

Re: [OpenAFS] Salvager running repeatedly

2010-03-24 Thread Atro Tossavainen
Good to hear you have it running with 1.4.12. We too run 1.4.11 on Solaris (but on sparc) with namei servers on ZFS, and have not seen this. So I am curious as to what may have happened. I must have messed up somehow by assigning the server a new IP address (in addition to its existing one),

[OpenAFS] Re: Salvager running repeatedly

2010-03-24 Thread Andrew Deason
On Wed, 24 Mar 2010 15:17:41 +0200 (EET) Atro Tossavainen atro.tossavainen+open...@helsinki.fi wrote: So there's no FileLog.old that keeps being refreshed with say a registration failure? Because this sounds like that problem. It was that problem. fileserver coredumped at startup

Re: [OpenAFS] Re: Salvager running repeatedly

2010-03-24 Thread Simon Wilkinson
On 24 Mar 2010, at 15:14, Andrew Deason wrote: That's... interesting. Would you be willing to share a core Absolutely not meant as a personal comment, but it's important to remember fileserver cores may contain your cell-wide key, and you definitely don't want to share them publicly without

Re: [OpenAFS] Re: Salvager running repeatedly

2010-03-24 Thread Dan Hyde
Absolutely not meant as a personal comment, but it's important to remember fileserver cores may contain your cell-wide key, and you definitely don't want to share them publicly without verifying that they don't. And ensure your efforts to remove the KeyFile content do not accidentally

[OpenAFS] Re: Salvager running repeatedly

2010-03-24 Thread Andrew Deason
On Wed, 24 Mar 2010 11:29:48 -0400 Dan Hyde d...@umich.edu wrote: Absolutely not meant as a personal comment, but it's important to remember fileserver cores may contain your cell-wide key, and you definitely don't want to share them publicly without verifying that they don't. Augh,

Re: [OpenAFS] Performance issue with many volumes in a single /vicep?

2010-03-24 Thread Steve Simmons
On Mar 17, 2010, at 5:48 PM, Steven Jenkins wrote: Could you provide filesystem information? (e.g., what filesystem, what parameters given/used by mkfs, etc) That information is often quite significant. So smart of me to drop the note and then leave for vacation. Selected file system values

Re: [OpenAFS] Performance issue with many volumes in a single /vicep?

2010-03-24 Thread Steve Simmons
On Mar 18, 2010, at 2:37 AM, Tom Keiser wrote: On Wed, Mar 17, 2010 at 7:41 PM, Derrick Brashear sha...@gmail.com wrote: On Wed, Mar 17, 2010 at 12:50 PM, Steve Simmons s...@umich.edu wrote: We've been seeing issues for a while that seem to relate to the number of volumes in a single vice

Re: [OpenAFS] Performance issue with many volumes in a single /vicep?

2010-03-24 Thread Steve Simmons
On Mar 18, 2010, at 6:43 AM, Jeffrey Altman wrote: In the 1.4 series, the volume hash table size is just 128 which would produce (assuming even distributions) average hash chains of 160 to 220 volumes per bucket given the number of volumes you describe. This is quite long. In the 1.5
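
The arithmetic behind the hash-chain figures quoted above can be checked with a rough sketch. It assumes the even distribution the message itself assumes; the function names are illustrative only and are not OpenAFS code.

```python
# Rough sketch of the hash-chain arithmetic, assuming volumes are spread
# evenly across buckets. Names here are hypothetical, not OpenAFS APIs.

VOLUME_HASH_BUCKETS_1_4 = 128  # table size quoted for the 1.4 series


def average_chain_length(num_volumes: int, num_buckets: int) -> float:
    """Average number of volumes per bucket under an even distribution."""
    return num_volumes / num_buckets


def implied_volume_count(chain_length: int, num_buckets: int) -> int:
    """Total volumes implied by a given average chain length."""
    return chain_length * num_buckets


# Chains of 160 to 220 volumes per bucket correspond to these server totals.
low = implied_volume_count(160, VOLUME_HASH_BUCKETS_1_4)
high = implied_volume_count(220, VOLUME_HASH_BUCKETS_1_4)
print(low, high)
```

Under that assumption, a lookup walks on average about half a chain for a hit and a full chain for a miss, which is why chains of this length are called long.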

Re: [OpenAFS] Performance issue with many volumes in a single /vicep?

2010-03-24 Thread Russ Allbery
Steve Simmons s...@umich.edu writes: Our estimate too. But before drilling down, it seemed worth checking if anyone else has a similar server - ext3 with 14,000 or more volumes in a single vice partition - and has seen a difference. Note, tho, that it's not #inodes or total disk usage in the

Re: [OpenAFS] Performance issue with many volumes in a single /vicep?

2010-03-24 Thread Steve Simmons
On Mar 24, 2010, at 4:38 PM, Russ Allbery wrote: Steve Simmons s...@umich.edu writes: Our estimate too. But before drilling down, it seemed worth checking if anyone else has a similar server - ext3 with 14,000 or more volumes in a single vice partition - and has seen a difference. Note,

Re: [OpenAFS] Performance issue with many volumes in a single /vicep?

2010-03-24 Thread Tom Keiser
On Wed, Mar 24, 2010 at 4:32 PM, Steve Simmons s...@umich.edu wrote: On Mar 18, 2010, at 2:37 AM, Tom Keiser wrote: On Wed, Mar 17, 2010 at 7:41 PM, Derrick Brashear sha...@gmail.com wrote: On Wed, Mar 17, 2010 at 12:50 PM, Steve Simmons s...@umich.edu wrote: We've been seeing issues for a

[OpenAFS] Re: Performance issue with many volumes in a single /vicep?

2010-03-24 Thread Andrew Deason
On Wed, 24 Mar 2010 23:43:32 -0400 Tom Keiser tkei...@sinenomine.net wrote: What I was trying to say is if the observed performance regression involves either the volserver, or the salvager, then it could involve partition lock contention. However, this will only come into play if you're

Re: [OpenAFS] Re: Performance issue with many volumes in a single /vicep?

2010-03-24 Thread Tom Keiser
On Thu, Mar 25, 2010 at 12:39 AM, Andrew Deason adea...@sinenomine.net wrote: On Wed, 24 Mar 2010 23:43:32 -0400 Tom Keiser tkei...@sinenomine.net wrote: What I was trying to say is if the observed performance regression involves either the volserver, or the salvager, then it could involve