Re: [OpenAFS] vos convertROtoRW requires salvage ?

2008-04-03 Thread Hartmut Reuter
John Tang Boyland wrote: As people on the list may know, I am in the process of recovering from complete fileserver failure (lesson: don't use inode servers with Solaris 10 x86). In what follows, filip is an inode Solaris 10 x86 fileserver that cannot attach any of its volumes. eastside is a

Re: [OpenAFS] vos convertROtoRW requires salvage ?

2008-04-03 Thread Jason Edgecombe
Hartmut Reuter wrote: No, normally the volume does not go off-line. The only problem I know of is that the new RW-volume being on another server than the old one is not automatically seen by the clients. The reason is that the old dead fileserver one didn't send a callback to the clients (as

Re: Fwd: [OpenAFS] best practice for salvage

2008-04-03 Thread John Hascall
At my last *cough* site, we ran with fast-restart. Because of the cruft that would sometimes get left behind in volumes due to things like crappy fortran compilers, I would run a salvage on each server every 2-3 months. As there were rarely any real errors, it ran pretty quickly and would

Re: Fwd: [OpenAFS] best practice for salvage

2008-04-03 Thread Andrew Bacchi
Thanks, Esther. I can always count on you for good advice. I usually run salvage by hand once or twice a year, but my gut says run it more often. I'll write a script that runs on odd months and call it from either linux-cron or afs-cron. One drawback of afs-cron is it only knows a weekly time
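
The odd-month schedule mentioned above could be sketched roughly as follows. This is only an illustration, not the script from the thread: the host name fs1.example.org, the function names, and the choice to print the commands instead of running them are all assumptions. The `bos salvage -server ... -all` form should be checked against the bos_salvage man page for the installed release, and authentication options are left out.

```python
import datetime
import shlex


def is_odd_month(day: datetime.date) -> bool:
    """Return True for January, March, May, and so on."""
    return day.month % 2 == 1


def salvage_command(server: str) -> list[str]:
    # Assumed invocation: salvage every partition on one file server.
    # Verify the flags against your bos_salvage man page before use.
    return ["bos", "salvage", "-server", server, "-all"]


def plan(day: datetime.date, servers: list[str]) -> list[list[str]]:
    """Return the commands to run on `day`; empty in even months."""
    if not is_odd_month(day):
        return []
    return [salvage_command(server) for server in servers]


if __name__ == "__main__":
    # Dry run: only print what would be executed.
    # fs1.example.org is a hypothetical host name.
    for cmd in plan(datetime.date.today(), ["fs1.example.org"]):
        print(shlex.join(cmd))
```

Called from a monthly cron entry, the month check keeps the even months quiet without relying on the scheduler to express "every other month".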

Fwd: [OpenAFS] best practice for salvage

2008-04-03 Thread Esther Filderman
On Wed, Apr 2, 2008 at 1:43 PM, Andrew Bacchi [EMAIL PROTECTED] wrote: I'm considering running a weekly salvage on all file servers from BosConfig. Is this too often? Any reason not to? What are others doing? Thanks. At my last *cough* site, we ran with fast-restart. Because of the cruft

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread John Tang Boyland
] There may be patches in 1.4.7-pre2 that might help. I tried it, but got no better results: @(#) OpenAFS 1.4.7pre2 built 2008-04-03 04/03/2008 08:22:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f) 04/03/2008 08:22:21 Starting salvage of file system partition /vicepa 04/03/2008

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Jeffrey Altman
John Tang Boyland wrote: ] There may be patches in 1.4.7-pre2 that might help. 04/03/2008 08:24:41 SALVAGING VOLUME 536870912. 04/03/2008 08:24:41 Part of the header (Volume information) is corrupted 04/03/2008 08:24:41 totalInodes 165 04/03/2008 08:24:41 Salvage volume group core dumped!

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Derrick Brashear
On Thu, Apr 3, 2008 at 11:17 AM, Jeffrey Altman [EMAIL PROTECTED] wrote: John Tang Boyland wrote: ] There may be patches in 1.4.7-pre2 that might help. 04/03/2008 08:24:41 SALVAGING VOLUME 536870912. 04/03/2008 08:24:41 Part of the header (Volume information) is corrupted

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Robert Banz
Just curious, What makes you think running salvage is a good thing? I had gotten to the point where I would avoid running it like the plague -- using tools such as fast-restart -- and in the time I was running fast-restart, which included some rather nasty power events which took things

Re: Fwd: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman
Esther Filderman wrote: Because of the cruft that would sometimes get left behind in volumes due to things like crappy fortran compilers Anything left behind by an application accessing an AFS volume would be stored in the volume. Salvager would not be involved in cleaning it up. Salvager

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Chas Williams (CONTRACTOR)
In message [EMAIL PROTECTED], Robert Banz writes: What makes you think running salvage is a good thing? I had gotten to the point where I would avoid running it like the plague -- using running salvage once in a while is a good way to clean up .__afs files.

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Christopher D. Clausen
Chas Williams (CONTRACTOR) [EMAIL PROTECTED] wrote: In message [EMAIL PROTECTED], Robert Banz writes: What makes you think running salvage is a good thing? I had gotten to the point where I would avoid running it like the plague -- using running salvage once in a while is a good way to clean

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman
Chas Williams (CONTRACTOR) wrote: In message [EMAIL PROTECTED], Robert Banz writes: What makes you think running salvage is a good thing? I had gotten to the point where I would avoid running it like the plague -- using running salvage once in a while is a good way to clean up .__afs

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Robert Banz
On Apr 3, 2008, at 10:06 AM, Chas Williams (CONTRACTOR) wrote: In message [EMAIL PROTECTED], Robert Banz writes: What makes you think running salvage is a good thing? I had gotten to the point where I would avoid running it like the plague -- using running salvage once in a while is a

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Chas Williams (CONTRACTOR)
In message [EMAIL PROTECTED],Christopher D. Clausen writes: Would a find command execing rm do the same thing? Or does the salvager actually need to be run for a correct cleanup? you could do it with rm but users tend to change their permissions so the script would need to also change
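
For illustration, the "find plus rm" idea discussed above might look like the sketch below. It is an assumption-laden example, not anything posted in the thread: the function names and the dry-run default are made up here, and it does nothing about the permission changes mentioned above. As noted elsewhere in this thread, the file server may still consider a .__afs file in use by some client, so listing the candidates first is the cautious choice.

```python
import os
from typing import Iterator

# Name prefix of the leftover files discussed in the thread.
PREFIX = ".__afs"


def find_leftovers(root: str) -> Iterator[str]:
    """Yield paths under `root` whose file name starts with PREFIX."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(PREFIX):
                yield os.path.join(dirpath, name)


def remove_leftovers(root: str, dry_run: bool = True) -> list[str]:
    """Return the matching paths; delete them only when dry_run is False."""
    found = sorted(find_leftovers(root))
    if not dry_run:
        for path in found:
            # Raises PermissionError if access rights do not allow removal.
            os.remove(path)
    return found
```

Calling `remove_leftovers(path)` with the default `dry_run=True` only reports what would be removed, which gives a chance to review the list before deleting anything.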

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Andrew Bacchi
From what I see of the 1.4.6 SPEC file, fast-restart fileserver is enabled by default. Do I need to start the server with any added options? Is there documentation to read? config_opts="--enable-redhat-buildsys \ %{?_with_bitmap_later:--enable-bitmap-later} \

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Russ Allbery
Jeffrey Altman [EMAIL PROTECTED] writes: What normal successfully completed operation is leaving unreferenced .__afs files behind? Lets fix the bug. Good question. I know we accumulate a ton of them that get cleaned up on each salvage, but I have no idea how to figure out what's

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Derrick Brashear
On Thu, Apr 3, 2008 at 1:33 PM, Andrew Bacchi [EMAIL PROTECTED] wrote: From what I see of the 1.4.6 SPEC file, fast-restart fileserver is enabled by default. Do I need to start the server with any added options? Is there documentation to read? config_opts=--enable-redhat-buildsys \

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Chas Williams (CONTRACTOR)
In message [EMAIL PROTECTED],Jeffrey Altman writes: What normal successfully completed operation is leaving unreferenced .__afs files behind? Lets fix the bug. good idea. i dont know how you fix machines not under your control running older (broken) clients. and the primary cause for

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman
Russ Allbery wrote: Jeffrey Altman [EMAIL PROTECTED] writes: What normal successfully completed operation is leaving unreferenced .__afs files behind? Lets fix the bug. Good question. I know we accumulate a ton of them that get cleaned up on each salvage, but I have no idea how to

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread John Hascall
[EMAIL PROTECTED] writes: In other words, the .__afs files are unnamed files that as far as the file server is concerned are still in use by some client. The reason the files are left behind is because the AFS cache manager that renamed the file did not delete it before it lost contact

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Derrick Brashear
Fileserver has no idea which client has it open(*), so... query who? * Not as such. You could guess. There's no mechanism to query though. And what if the client has gone offline now, but will come back shortly? Or is at a new address? On Thu, Apr 3, 2008 at 2:56 PM, John Hascall [EMAIL

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman
John Hascall wrote: Since the file server has no way of knowing if the file is still in use it can't delete it. Why not? Is there no way for the file server to query the cache manager and ask? The fact that the file is considered temporary is only known to the client. AFS is not like

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread John Tang Boyland
OK I compiled the salvager with debugging and without optimization. filip# /opt/SUNWspro/bin/dbx salvager.debug For information about new features see `help changes' To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc Reading salvager.debug Reading ld.so.1 Reading

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman
John Hascall wrote: Since the file server has no way of knowing if the file is still in use it can't delete it. Why not? Is there no way for the file server to query the cache manager and ask? The fact that the file is considered temporary is only known to the client. And to

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Derrick Brashear
On Thu, Apr 3, 2008 at 3:32 PM, John Hascall [EMAIL PROTECTED] wrote: Since the file server has no way of knowing if the file is still in use it can't delete it. Why not? Is there no way for the file server to query the cache manager and ask? The fact that the file is

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Jeffrey Altman
John Tang Boyland wrote: OK I compiled the salvager with debugging and without optimization. filip# /opt/SUNWspro/bin/dbx salvager.debug For information about new features see `help changes' To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc Reading salvager.debug

[OpenAFS] problems with root volumes

2008-04-03 Thread Steve Gaarder
A couple weeks ago, one of my AFS servers lost its disks. I recovered the root partition from backup. I did not have a backup of the /vicepa partition, no problem, I thought, since it only contained RO replicas. I just generated a new one. Well, it turns out that it also contained the RW

Re: [OpenAFS] kstart for windows ?

2008-04-03 Thread Jeffrey Altman
Hans Melgers wrote: Hello, I was wondering if there are ways to make a windows machine get tokens automatically, similar to Russ's kstart utility for *nix? Or am i missing a cool feature in MIT KfW ? I need it for a win server to sync some files to afs every night. Anybody here who has

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman
Robert Banz wrote: That wouldn't work, because the file could have been open()'d by two different cache managers, unlinked by one, but should still be able to be written to. That doesn't work. Eventually the cache manager on the machine on which the unlink() was executed is going to call

[OpenAFS] kstart for windows ?

2008-04-03 Thread Hans Melgers
Hello, I was wondering if there are ways to make a windows machine get tokens automatically, similar to Russ's kstart utility for *nix? Or am i missing a cool feature in MIT KfW ? I need it for a win server to sync some files to afs every night. Anybody here who has done this before ?

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Robert Banz
The way I would have implemented this functionality would be for the file to be moved into the local client's cache and removed from the file server since the file has now been unlinked and can therefore not be referenced by other clients. It would then be the client's responsibility to clean

Re: [OpenAFS] kstart for windows ?

2008-04-03 Thread Russ Allbery
Hans Melgers [EMAIL PROTECTED] writes: I was wondering if there are ways to make a windows machine get tokens automatically, similar to Russ's kstart utility for *nix? Or am i missing a cool feature in MIT KfW ? I need it for a win server to sync some files to afs every night. Anybody here

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Hartmut Reuter
Jeffrey Altman wrote: John Tang Boyland wrote: OK I compiled the salvager with debugging and without optimization. filip# /opt/SUNWspro/bin/dbx salvager.debug For information about new features see `help changes' To remove this message, put `dbxenv suppress_startup_message 7.5' in your

Re: [OpenAFS] kstart for windows ?

2008-04-03 Thread Christopher D. Clausen
Hans Melgers [EMAIL PROTECTED] wrote: I was wondering if there are ways to make a windows machine get tokens automatically, similar to Russ's kstart utility for *nix? Or am i missing a cool feature in MIT KfW ? I need it for a win server to sync some files to afs every night. Anybody here

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread John Tang Boyland
] (dbx) up ] Current function is DistilVnodeEssence ] 3175 assert(class == vLarge); ] (dbx) list 3170,3180 ] 3170 vep->type = vnode->type; ] 3171 vep->author = vnode->author; ] 3172 vep->owner = vnode->owner; ] 3173

Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Robert Banz
On Apr 3, 2008, at 1:11 PM, Jeffrey Altman wrote: Robert Banz wrote: That wouldn't work, because the file could have been open()'d by two different cache managers, unlinked by one, but should still be able to be written to. That doesn't work. Eventually the cache manager on the machine

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Jeffrey Altman
Hartmut Reuter wrote: So what is the value of 'class' if not vLarge? As you can see from that line above it's vSmall: [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c So there might be really some thing wrong with the

Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Jeffrey Altman
John Tang Boyland wrote: (dbx) print class class = 1 If it's useful, here's some more info: (dbx) print *vnode *vnode = { type = 2U cloned = 1U modeBits = 493U linkCount= 2 length = 8192U uniquifier = 1U