https://bugzilla.wikimedia.org/show_bug.cgi?id=71431
Bryan Davis <bda...@wikimedia.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
           Assignee|wikibugs-l@lists.wikimedia. |bda...@wikimedia.org
                   |org                         |

--- Comment #6 from Bryan Davis <bda...@wikimedia.org> ---
(In reply to Antoine "hashar" Musso from comment #4)
> It seems the root cause of the issue was the LDAP being
> upgraded/unreacheable intermittently over the past few days. As a result,
> when puppet run it considers that the mwdeploy/l10nupdate (among others)
> users do not exist and thus create a local copy of them. Whenever LDAP
> comes back, we end up with files having conflicting UID. That most probably
> confuse rsync.

This was an issue across several hosts in the beta cluster, but it turned out
to be unrelated to the disk space issues on rsync01.

> Bryan deleted the local users yesterday. He also cleaned up some all
> 'common' directories which were left around thus reclaiming a huge amount of
> disk space.

This was the real problem.

When I originally added scap deployment to beta I found that the primary disks
for all of the hosts that needed copies of MediaWiki were too small to
comfortably contain a full sync. I added secondary LVS mounts to all of these
hosts on /srv (or made /srv a symlink to /mnt/srv if LVS was already attached
on /mnt). Then I created a symlink from /usr/local/apache/common-local to
/srv/common-local where the synced tree from deployment-bastion would be
stored.

Recently Ori dove into operations/puppet and started working on cleaning up
the legacy file paths (/a/common, /usr/local/apache) and replacing them with
more modern locations. /usr/local/apache/common and
/usr/local/apache/common-local (former was a symlink to the latter) were
replaced with /srv/mediawiki. When these changes hit beta, things mostly just
worked because puppet and scap worked together to create the right content in
the right place.

A side effect of this change finally bit us on rsync01. There was no puppet
code added to clean up the old /srv/common-local sync target. This left ~3G of
files on each scap target host. For the deployment-mediawiki* hosts this was
not a big deal. The secondary disk on those hosts is 68G, leaving lots of space
for the new copy of everything. On deployment-rsync01, however, /srv is an 8.5G
partition, so 3G is a significant chunk of the available drive space.

I have deleted /srv/common-local from all of the hosts in beta.
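
As a rough illustration of the cleanup described above, here is a minimal
Python sketch, not the puppet or scap code that was actually used. It measures
a leftover sync target, expresses it as a share of the partition, and removes
it only when asked to. The function names (tree_size, stale_share,
remove_legacy_target) are hypothetical, and the sizes in the example are the
approximate figures quoted in this comment (~3G leftover, 8.5G and 68G
partitions).

```python
import os
import shutil


def tree_size(path):
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def stale_share(leftover, partition):
    """Fraction of a partition occupied by a leftover tree (same units for both)."""
    return leftover / partition


def remove_legacy_target(path, dry_run=True):
    """Delete a leftover sync target and return the bytes freed.

    With dry_run=True nothing is deleted and the return value is the number
    of bytes that would be freed. Symlinks and missing paths are left alone.
    """
    if os.path.islink(path) or not os.path.isdir(path):
        return 0
    freed = tree_size(path)
    if not dry_run:
        shutil.rmtree(path)
    return freed


if __name__ == "__main__":
    # Approximate figures from this comment, in gigabytes.
    print(f"rsync01:    {stale_share(3.0, 8.5):.0%} of /srv")
    print(f"mediawiki*: {stale_share(3.0, 68.0):.1%} of /srv")
```

On a real host one would call remove_legacy_target("/srv/common-local") with
the default dry_run=True first and only pass dry_run=False once the reported
size looks right. A puppet resource that removes the directory would be the
more durable fix, since it also covers hosts built later.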