https://bugzilla.wikimedia.org/show_bug.cgi?id=71431

Bryan Davis <bda...@wikimedia.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
           Assignee|wikibugs-l@lists.wikimedia. |bda...@wikimedia.org
                   |org                         |

--- Comment #6 from Bryan Davis <bda...@wikimedia.org> ---
(In reply to Antoine "hashar" Musso from comment #4)
> It seems the root cause of the issue was the LDAP being
> upgraded/unreacheable intermittently over the past few days.  As a result,
> when puppet run it considers that the mwdeploy/l10nupdate (among others)
> users do not exist and thus create a local copy of them.  Whenever LDAP
> comes back, we end up with files having conflicting UID.  That most probably
> confuse rsync.

This was an issue across several hosts in the beta cluster, but it turned out
to be unrelated to the disk space issues on rsync01.

> Bryan deleted the local users yesterday.  He also cleaned up some all
> 'common' directories which were left around thus reclaiming a huge amount of
> disk space.

This was the real problem. When I originally added scap deployment to beta I
found that the primary disks for all of the hosts that needed copies of
MediaWiki were too small to comfortably contain a full sync. I added secondary
LVM mounts to all of these hosts on /srv (or made /srv a symlink to /mnt/srv if
LVM was already attached on /mnt). Then I created a symlink from
/usr/local/apache/common-local to /srv/common-local where the synced tree from
deployment-bastion would be stored.

Recently Ori dove into operations/puppet and started working on cleaning up the
legacy file paths (/a/common, /usr/local/apache) and replacing them with more
modern locations. /usr/local/apache/common and /usr/local/apache/common-local
(the former was a symlink to the latter) were replaced with /srv/mediawiki. When
these changes hit beta, things mostly just worked because puppet and scap
worked together to create the right content in the right place.

A side effect of this change finally bit us on rsync01. There was no puppet
code added to clean up the old /srv/common-local sync target. This left ~3G of
files on each scap target host. For the deployment-mediawiki* hosts this was
not a big deal. The secondary disk on those hosts is 68G, leaving lots of space
for the new copy of everything. On deployment-rsync01 however, /srv is an 8.5G
partition, so 3G is a significant chunk of the available drive space.
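
To put those numbers side by side, here is a rough calculation of how much of
/srv the leftover tree takes up on each class of host. It uses only the
approximate sizes quoted above; the function name stale_share is made up for
this illustration.

```python
# Approximate sizes in gigabytes, taken from the figures quoted above.
STALE_TREE_GB = 3.0


def stale_share(partition_gb: float, stale_gb: float = STALE_TREE_GB) -> float:
    """Return the fraction of a partition occupied by the stale tree."""
    return stale_gb / partition_gb


for host, size_gb in [("deployment-mediawiki*", 68.0), ("deployment-rsync01", 8.5)]:
    print(f"{host}: {stale_share(size_gb):.0%} of /srv")
```

That works out to about 4% of /srv on the deployment-mediawiki* hosts versus
about 35% on deployment-rsync01.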

I have deleted /srv/common-local from all of the hosts in beta.
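
For anyone who needs to repeat this kind of cleanup, a minimal sketch of the
per-host step might look like the following. It is not the command that was
actually run; the helper names (tree_size_bytes, remove_stale_tree) and the
dry-run default are assumptions made for illustration, and it assumes the old
sync target is a real directory at /srv/common-local.

```python
import os
import shutil
from pathlib import Path


def tree_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def remove_stale_tree(root: Path, dry_run: bool = True) -> int:
    """Return the size of root in bytes and delete it unless dry_run is set.

    Returns 0 without touching anything if root is missing or is a symlink.
    """
    if root.is_symlink() or not root.is_dir():
        return 0
    size = tree_size_bytes(root)
    if not dry_run:
        shutil.rmtree(root)
    return size


if __name__ == "__main__":
    stale = Path("/srv/common-local")  # old sync target described above
    found = remove_stale_tree(stale, dry_run=True)
    print(f"{stale}: {found / 1e9:.2f} GB would be reclaimed")
```

Defaulting to a dry run means the size can be checked on each host before
anything is deleted.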
