Erez Zadok wrote on 12/5/07 12:59 PM:
> Dave, thanks for the report.  It seems that somewhere when branch-management
> is used, a dput/iput is being missed.
> 
> I'd like to be able to reproduce this on my end if possible.  I've got
> machines I can dedicate for testing that could take hours/days to manifest.
> Do the URLs you provided include all the info I might need to reproduce the
> panics on my end?

Sorry for the delay in replying, I've had a busy couple of weeks here,
and since this wasn't in production yet it took a back seat to a couple
of server upgrades.

Basically, here's how the setup actually is now.  I have two servers,
dm-stage01 and dm-stage02.  dm-stage02 is the one the users actually get
access to.  It only has read access to the NetApp shelf hosting the
back-end filesystem.  dm-stage01 has write access to the NetApp, and
[EMAIL PROTECTED] has its passwordless ssh key in the authorized_keys file
on dm-stage02.  So dm-stage01 logs into dm-stage02 and pulls over the
files that have been uploaded since the last pass.  Virus scanning is
then done on dm-stage01, which then propagates the changes to the NetApp
after the virus scanning completes.
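
As a rough illustration only (this is not one of the actual scripts from
the tarball), one pull/scan/propagate pass on dm-stage01 might look like
the sketch below.  The scanner command (clamscan), the upload path on
dm-stage02 and the local scratch directory are all placeholders, since
none of them are named in this message; only STAGE_BASE is taken from the
config quoted further down.

```shell
#!/bin/sh
# Hypothetical sketch of one sync pass, as run from dm-stage01.
# Everything marked "assumption" is a placeholder, not the real setup.
STAGE_BASE=${STAGE_BASE:-/mnt/netapp/stage/archive.mozilla.org}
UPLOAD_HOST=${UPLOAD_HOST:-dm-stage02}
REMOTE_UPLOADS=${REMOTE_UPLOADS:-/srv/uploads}    # assumption: upload dir on dm-stage02
SCAN_DIR=${SCAN_DIR:-/var/tmp/stage-incoming}     # assumption: local scratch dir
SCAN_CMD=${SCAN_CMD:-clamscan -r --quiet}         # assumption: scanner is not named

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

stage_pass() {
    # 1. Pull newly uploaded files over from the user-facing box.
    run rsync -a -e ssh "$UPLOAD_HOST:$REMOTE_UPLOADS/" "$SCAN_DIR/" || return 1
    # 2. Scan them locally; a non-zero exit stops the pass.
    #    SCAN_CMD is left unquoted on purpose so its options split into words.
    run $SCAN_CMD "$SCAN_DIR" || return 1
    # 3. Only after a clean scan, propagate to the NetApp.
    run rsync -a "$SCAN_DIR/" "$STAGE_BASE/"
}
```

Setting DRY_RUN=1 before calling stage_pass just prints the three
commands, which is a cheap way to check the paths before letting it
touch anything.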

dm-stage02 is set up with a chroot jail containing only the unionfs
mount point, and the users are jailed there at login by the scponly shell.

I've got a number of helper scripts written up to make this process a
bit easier.  A few of these might duplicate stuff in the unionfs tools,
but I never managed to get those to compile, and it was all stuff that
was easily scriptable anyway.  My scripts can be found at
http://people.mozilla.com/~justdave/stage-sbin-20071214.tar.gz

The /etc/sysconfig/ftpstage-config that's sourced from the top of the
scripts contains this:

STAGE_BASE=/mnt/netapp/stage/archive.mozilla.org
STAGE_WORKSPACE=/mnt/unionfs/stage
STAGE_MOUNTPOINT=/data/ftp
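
For anyone reading along without the tarball, the sketch below shows one
way these three values could map onto a unionfs mount.  The roles are
inferred from the variable names only (a writable branch under
STAGE_WORKSPACE stacked over a read-only STAGE_BASE, mounted at
STAGE_MOUNTPOINT), and the branch name "current" is made up; the scripts
in the tarball are authoritative.  The function only prints the command
rather than running it.

```shell
#!/bin/sh
# Values copied from the ftpstage-config quoted above.
STAGE_BASE=/mnt/netapp/stage/archive.mozilla.org
STAGE_WORKSPACE=/mnt/unionfs/stage
STAGE_MOUNTPOINT=/data/ftp

# Assumption: one writable branch under STAGE_WORKSPACE sits on top of the
# read-only NetApp tree.  "current" is a hypothetical branch name.
union_mount_cmd() {
    rw_branch="$STAGE_WORKSPACE/${1:-current}"
    # unionfs takes a colon-separated branch list, highest priority first.
    echo "mount -t unionfs -o dirs=$rw_branch=rw:$STAGE_BASE=ro unionfs $STAGE_MOUNTPOINT"
}
```

Calling union_mount_cmd with no argument prints the command for the
hypothetical "current" branch; passing a name selects a different one.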

dm-stage01 runs stage-sync as a cron job every 5 minutes.  Most of the
unionfs-mangling is done on dm-stage02 by the other scripts, which
dm-stage01 invokes over ssh.

I can usually trigger the crash with some regularity by taking a
directory of 792MB / 124 files from the current production staging box
(which lets people touch the NetApp directly -- what we're trying to get
away from) and repeatedly rsyncing it into dm-stage02 (a.k.a. stage-new)
and then rsyncing an empty directory over top of it with the --delete option.

Basically, I did this:

while [ 1 ]; do
    rsync -av -e ssh /pub/mozilla.org/firefox/releases/granparadiso/ \
        [EMAIL PROTECTED]:/pub/mozilla.org/firefox/nightly/experimental/stage-migration-loadtest
    sleep 600
    rsync -av -e ssh --delete /root/empty/ \
        [EMAIL PROTECTED]:/pub/mozilla.org/firefox/nightly/experimental/stage-migration-loadtest
    sleep 600
done

And let it run until it crashed.

If you need any more info to help reproduce it, let me know...  hoping
to have this live by the end of the year, and with the other server
upgrades out of the way this now has my full attention until I get it
resolved. :)

Thanks again for the help!

-- 
Dave Miller                                   http://www.justdave.net/
System Administrator, Mozilla Corporation      http://www.mozilla.com/
Project Leader, Bugzilla Bug Tracking System  http://www.bugzilla.org/
_______________________________________________
unionfs mailing list: http://unionfs.filesystems.org/
unionfs@mail.fsl.cs.sunysb.edu
http://www.fsl.cs.sunysb.edu/mailman/listinfo/unionfs
