Erez Zadok wrote on 12/5/07 12:59 PM:
> Dave, thanks for the report.  It seems that somewhere when branch-management
> is used, a dput/iput is being missed.
>
> I'd like to be able to reproduce this on my end if possible.  I've got
> machines I can dedicate for testing that could take hours/days to manifest.
> Do the URLs you provided include all the info I might need to reproduce the
> panics on my end?
Sorry for the delay in replying; I've had a busy couple of weeks here, and
since this wasn't in production yet it took a back seat to a couple of server
upgrades.

Basically, here's how the setup actually works now.  I have two servers,
dm-stage01 and dm-stage02.  dm-stage02 is the one the users actually get
access to; it only has read access to the NetApp shelf hosting the back-end
filesystem.  dm-stage01 has write access to the NetApp, and [EMAIL PROTECTED]
has its passwordless ssh key in the authorized_keys file on dm-stage02.  So
dm-stage01 logs into dm-stage02 and pulls over the files that have been
uploaded since the last pass.  Virus scanning is then done on dm-stage01,
which propagates the changes to the NetApp once the scanning completes.
dm-stage02 is set up with a chroot jail containing only the unionfs mount
point, and the users are jailed there at login by the scponly shell.

I've got a number of helper scripts written up to make this process a bit
easier.  A few of these might duplicate stuff in the unionfs tools, but I
never managed to get those to compile, and it was all stuff that was easily
scriptable anyway.  My scripts can be found at
http://people.mozilla.com/~justdave/stage-sbin-20071214.tar.gz

The /etc/sysconfig/ftpstage-config that's sourced from the top of the
scripts contains this:

    STAGE_BASE=/mnt/netapp/stage/archive.mozilla.org
    STAGE_WORKSPACE=/mnt/unionfs/stage
    STAGE_MOUNTPOINT=/data/ftp

dm-stage01 runs stage-sync as a cron job every 5 minutes.  Most of the
unionfs-mangling is done on dm-stage02 via the other scripts, invoked via
ssh from dm-stage01.

I can usually trigger the crash with some regularity by taking a directory
of 792MB / 124 files from the current production staging box (which lets
people touch the NetApp directly -- what we're trying to get away from),
repeatedly rsyncing it into dm-stage02 (a.k.a. stage-new), and then rsyncing
an empty directory over top of it with the --delete option.
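For reference, the pull -> scan -> publish cycle described above looks
roughly like this as a script.  This is only a simplified sketch, not the
real stage-sync: the front-end host reference, scratch directory, and
clamscan call are illustrative stand-ins, and it defaults to dry-run so it
just prints what it would do.

```shell
#!/bin/sh
# Simplified sketch of the stage-sync cycle (pull -> scan -> publish).
# NOT the real script: FRONTEND, SCRATCH, and the scanner are stand-ins.
# Defaults to dry-run so it only prints the commands it would execute.

# Normally these come from /etc/sysconfig/ftpstage-config:
STAGE_BASE=/mnt/netapp/stage/archive.mozilla.org
STAGE_WORKSPACE=/mnt/unionfs/stage

FRONTEND=dm-stage02              # read-only box the users upload to
SCRATCH=/var/tmp/stage-incoming  # local scratch dir on dm-stage01

DRY_RUN=${DRY_RUN:-1}            # this sketch defaults to dry-run

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Pull the files uploaded since the last pass from the front end.
run rsync -av -e ssh "$FRONTEND:$STAGE_WORKSPACE/" "$SCRATCH/"

# 2. Virus-scan what we pulled (scanner command is a stand-in).
run clamscan -r "$SCRATCH/"

# 3. Propagate the scanned files to the NetApp back end.
run rsync -av "$SCRATCH/" "$STAGE_BASE/"
```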
Basically, I did this:

    while [ 1 ]; do
        rsync -av -e ssh /pub/mozilla.org/firefox/releases/granparadiso/ \
            [EMAIL PROTECTED]:/pub/mozilla.org/firefox/nightly/experimental/stage-migration-loadtest
        sleep 600
        rsync -av -e ssh --delete /root/empty/ \
            [EMAIL PROTECTED]:/pub/mozilla.org/firefox/nightly/experimental/stage-migration-loadtest
        sleep 600
    done

and let it run until it crashed.

If you need any more info to help reproduce it, let me know... I'm hoping to
have this live by the end of the year, and with the other server upgrades out
of the way this now has my full attention until I get it resolved. :)

Thanks again for the help!

-- 
Dave Miller                                   http://www.justdave.net/
System Administrator, Mozilla Corporation     http://www.mozilla.com/
Project Leader, Bugzilla Bug Tracking System  http://www.bugzilla.org/

_______________________________________________
unionfs mailing list: http://unionfs.filesystems.org/
unionfs@mail.fsl.cs.sunysb.edu
http://www.fsl.cs.sunysb.edu/mailman/listinfo/unionfs