<optimism>What's the worst that could happen?</optimism>

The big news this week is that NFS is replacing Gluster as the provider of shared storage for the tools project.
In doing so, we're going to gain a great deal of extra stability and quite a bit of performance. In addition, there is now an automatic time-travel snapshot feature that lets users look at the filesystem as it was when each snapshot was taken: once an hour for the past three hours, once a day for the past three days, and two weekly snapshots (on Sundays).

That said, NFS is robust but does not scale as well as we would like in the longer term. We will keep investigating clustered storage solutions for the future, with an eye to returning to clustered storage once we find a solution that is (a) no less robust than NFS, (b) no less reliable than the current storage and (c) at least as good from a performance standpoint.

Technically, this storage is already available to all projects, but the configuration necessary to /replace/ Gluster with it would generally require a per-project outage. (The switch involves copying the contents of the previous store to the new one and substituting one mount point for another, which running processes generally cannot cope with.)

In practice, the tools project will be the first transition "victim", with an outage tomorrow to make the switchover. Sadly, the process is necessarily disruptive, and currently running processes will be affected (see below for details).

As secondary news, thanks to the unwavering efforts of Asher, the database replication is now at a point where we are actually selecting what, exactly, is going to be replicated. Things are progressing there at a fair clip and we're still sticking pretty close to schedule. More news on that topic next week.
=== Planned outage ===

When: Tuesday April 23 at 18:00 UTC
Duration: 1 hour
Impact:
* Jobs running on the grid engine will be stopped, and execution nodes will be temporarily disabled;
* The login server will be restarted during the window, ending active sessions;
* The web service will be unavailable during the maintenance window; and
* Running processes not scheduled through the grid engine will be killed.

Recovery plan: In case of unplanned failure during the maintenance window, configuration will be rolled back to the current version (that is, the Gluster-based project storage will remain in place) and a new window will be planned after postmortem.

-- Marc

_______________________________________________
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette