[libreoffice-website] Minutes from the Tue Jan 16 infra call

2018-01-16 Thread Guilhem Moulin
Participants
 1. davido
 2. guilhem
 3. Brett
 4. Christian

Agenda
 * Upgrade ancient Gerrit version 2.11.8 to 2.13.9 (used by OpenStack
   and Wikimedia for years now, without any issue)
   - Q: according to my notes 2.11.8 was released on 2016-03-09 and
 2.13.9 on 2017-07-03?  Are there known vulnerabilities in 2.11.x?  Or
 is it about getting bug fixes and the shiniest new software?
 . No known vulnerability, but there are a bunch of new features,
   especially the inline edit feature
   - David: dedup scripts should keep working with 2.13.x
   - David: see old redmine ticket Norbert filed about migration
 . do you mean my comments in 
https://redmine.documentfoundation.org/issues/1587#note-4 ?
 . I meant this comment from Norbert:
https://redmine.documentfoundation.org/issues/1587#note-8
   - Cloph: difficult to test everything as OAuth needs proper DNS setup
   - Cloph: can't copy the database to a test VM and grant access to
 everyone as we have private repos
   - Cloph: release-wise, it would be ideal to do that (switching the
 live instance) in March or so (after 6.0.1)
   - Q: Is Norbert coming to FOSDEM? Would be ideal time to brainstorm
 there
   - Roadmap:
 - Set up staging gerrit instance:
 - Synchronize production gerrit content to gerrit-test:
 - Simulate upgrade process:
   . Stop gerrit
   . Perform database and git repository backup
   . Update gerrit version
   . Update all external plugins (gerrit-oauth-provider)
   . Run init command in batch mode, all used internal plugins
 should be updated (double-check)
   . Run reindex command
   . Start gerrit
   . Verify that gerrit still works as expected
 → this is the (very) hard part, as the test instance cannot have
   all features enabled, and one never anticipates every possible
   user mistake that will have to be dealt with.
 - Schedule gerrit upgrade in production
   - AI guilhem/cloph: create redmine ticket to follow progress
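The simulated upgrade steps above could be sketched roughly as follows. This is a hypothetical outline, not the production procedure: the site path, war location, and database name are placeholders, and the exact commands should be checked against the Gerrit 2.13 release notes before running anything.

```shell
#!/bin/sh -e
# Hypothetical sketch of the staging (gerrit-test) upgrade run.
# GERRIT_SITE and NEW_WAR are placeholder paths, not the real layout.
GERRIT_SITE=/srv/gerrit-test
NEW_WAR=/tmp/gerrit-2.13.9.war

"$GERRIT_SITE"/bin/gerrit.sh stop                    # stop gerrit
pg_dump gerrit_test > /tmp/gerrit-db.sql             # database backup
tar -C "$GERRIT_SITE" -czf /tmp/gerrit-git.tgz git   # git repository backup

# update gerrit itself and external plugins (gerrit-oauth-provider)
cp "$NEW_WAR" "$GERRIT_SITE"/bin/gerrit.war
cp /tmp/gerrit-oauth-provider.jar "$GERRIT_SITE"/plugins/

# init in batch mode should also update the bundled internal plugins
# (double-check that), then reindex and restart
java -jar "$GERRIT_SITE"/bin/gerrit.war init --batch -d "$GERRIT_SITE"
java -jar "$GERRIT_SITE"/bin/gerrit.war reindex -d "$GERRIT_SITE"
"$GERRIT_SITE"/bin/gerrit.sh start
```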
 * Gerrit: added `git gc` to a weekly cronjob so crap doesn't accumulate
   and slow down the repo
   - Q: is the frequency suitable?  Also pass --aggressive (cloph: no)
 and/or --prune=?
   - Cloph: slowness might be caused by gerrit keeping FDs open
 * Network issues (hypers, gustl) seem fixed since manitu plugged to a
   new switch last week (Wed Jan 10)
   - Need to keep an eye on the RX FCS counter (gustl) and the link
 speed (hypers)
 * Saltstack:
   - mail.postfix state is now ready, see mail/README for the config
 options (and the pillar for usage examples: antares, vm150, vm194,
 vm202, etc.)
   - Proposal: more aggressive control for SSH and sudo access:
 . ACL for SSH access already in place (user must be in ssh-login
   Unix group, which is assigned — and possibly trimmed — with the
   ssh salt state)
 . Also limit the authorized_keys(5) list to the keys that are found
   in pillar?  would avoid eg, leaving your key in
   ~root/.ssh/authorized_keys during a migration and forgetting
   about it afterwards → OK
 . Also assign — and possibly trim — the list of sudo group members
   in salt? → OK
 group_map:
   sudo: [ superdev1, superdev2 ]
   adm: other-username
 . Cloph: beware of shared users (eg, tdf@ci); yaml-foo to share ssh
   keys
 . These would provide a clear overview (in pillar) of who has
   access to what; the same could be done using NSS and pam_ldap(5).
 * Backup
   - right now rsnapshot-based (using `ssh -lroot` from berta as rsync's
 remote shell)
 . do we really want to open a root shell to each host from berta?
   → Nope :-)
 . for rsync we could at least add restriction on the ssh-key
   (remount fs read-only, and use `rsync --server --sender …` as
   forcecommand)
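Such a key restriction could look like the following authorized_keys entry on each backed-up host. This is only a sketch: the exact rsync server arguments depend on the rsnapshot invocation and must be captured from a real run, and the key material shown is a placeholder.

```shell
# ~root/.ssh/authorized_keys on the backed-up host -- hypothetical.
# The forced command pins this key to rsync's read-only server mode,
# whatever the client asks for; the other options disable PTY
# allocation and forwarding.
command="rsync --server --sender -logDtpre.iLsfx . /",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAA... backup@berta
```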
   - databases are downloaded in full each time, using pg_dumpall(1) or
 mysqldump(1) and compressed locally
 . large databases clutter disk IO and network bandwidth (even though
   we're far from saturating the link since the upgrade to the new
   switch, it's wasteful); for instance the bugzilla PostgreSQL
   database is currently 44.9GiB (20.3GiB after gzip compression),
   and takes around 95min to transfer at a sustained 5MiB/s transfer
   rate *on the public interface*
   → AI guilhem: add a private interface on br1 to all VMs (brought
 that before, didn't do it yet)
 . Q: do we know what is the bottleneck? local disk IO? local
   compression (zstd to the rescue)? network (probably not)? berta's
   disk IO (probably not)?
   → cloph: it's single-threaded compression maxing out CPU thread
 . full backups are wasteful, especially with large databases.  Does
   anyone have experience with PostgreSQL continuous archiving
   (PITR)?
   https://www.postgresql.org/docs/9.6/static/continuous-archiving.html
   → AI guilhem: deploy that on some 
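On the single-threaded-compression bottleneck cloph identified, switching the dump pipeline to multi-threaded zstd is one possible fix. A minimal sketch follows; the pg_dumpall line is illustrative only (it assumes a local superuser and that zstd is installed), and the round-trip below just demonstrates the pipeline shape on dummy data.

```shell
# Illustrative only: stream the dump through zstd using all CPU
# threads (-T0) instead of single-threaded gzip.
#   pg_dumpall -U postgres | zstd -T0 -3 > bugzilla.sql.zst
# Round-trip demo of the same pipeline shape on dummy data:
printf 'CREATE TABLE t (id int);\n' | zstd -T0 -3 | zstd -dc
```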

[libreoffice-website] Re: Infra call on Tue, Jan 16 at 17:30 UTC

2018-01-16 Thread Christian Lohmaier
Hi *,

On Mon, Jan 15, 2018 at 11:58 PM, Guilhem Moulin
 wrote:
> On Tue, 09 Jan 2018 at 18:47:41 +0100, Guilhem Moulin wrote:
>> The next infra call will take place at `date -d 'Tue Jan 16 17:30:00 UTC 
>> 2018'`
>> (18:30:00 Berlin time).
>>
>> See https://pad.documentfoundation.org/p/infra for details; agenda TBA.
>
> Reminder: that's tomorrow!

Sending such a statement just before midnight is kinda misleading :-)

Tuesday is *today* :-)

ciao
Christian

-- 
To unsubscribe e-mail to: website+unsubscr...@global.libreoffice.org
Problems? https://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: https://wiki.documentfoundation.org/Netiquette
List archive: https://listarchives.libreoffice.org/global/website/
All messages sent to this list will be publicly archived and cannot be deleted