https://bugzilla.wikimedia.org/show_bug.cgi?id=49757
--- Comment #5 from Ori Livneh <o...@wikimedia.org> ---

Some notes about how things are currently configured:

MediaWiki can report errors to a remote host via UDP. The MediaWiki instances on the production cluster are configured to log to a host named 'fluorine'. This is done by specifying its address as the value of $wmfUdp2logDest in CommonSettings.php (in operations/mediawiki-config.git). The MediaWiki instances that power the beta cluster set $wmfUdp2logDest to 'deployment-bastion' (<https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000390>), a Labs instance that plays the role of fluorine. It writes log data to files in /home/wikipedia/logs; exceptions and fatals are logged to exception.log and fatal.log in that directory, respectively.

When I first started looking at these logs, I didn't want to mess with the file-based logging, since it's an important service that developers rely on. So I submitted a patch to have fluorine stream the log data, as it receives it, to another host (vanadium), in addition to writing it to disk. On vanadium I have a script that generates the Ganglia graphs at <http://ur1.ca/edq1f>.

Yesterday I submitted change Ia0cc8de43 and Ryan merged it. That change reproduces the setup described above (i.e., the duplication of the log stream to two destinations, fluorine and vanadium) on the beta cluster. It does so by having deployment-bastion forward a copy of the log data to a new instance, deployment-fluoride (<https://wikitech.wikimedia.org/wiki/Nova_Resource:I-0000084c>).

So the TL;DR is that there is an instance on the beta cluster (deployment-fluoride) that receives a live stream of the errors and fatals being generated by the beta cluster MediaWikis, and we're free to use it as a sandbox for trying out different ways of capturing and representing this data.
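For anyone unfamiliar with the fan-out described above (append each incoming datagram to a per-channel log file, and forward a copy to a second host), a rough sketch in Python follows. This is purely illustrative: the ports, addresses, and file name are assumptions, not the actual udp2log configuration.

```python
import socket

# Illustrative sketch of the fluorine/deployment-bastion behavior:
# receive UDP log datagrams, keep the file-based logging intact, and
# duplicate the stream to a second host. All addresses are stand-ins.
LISTEN_ADDR = ("127.0.0.1", 18420)    # assumed udp2log listen port
FORWARD_ADDR = ("127.0.0.1", 18421)   # stand-in for the second host
LOG_PATH = "exception.log"            # per-channel file, as on fluorine

def relay(max_packets=None):
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.bind(LISTEN_ADDR)
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    handled = 0
    with open(LOG_PATH, "ab") as log:
        while max_packets is None or handled < max_packets:
            data, _ = recv.recvfrom(65535)
            log.write(data)                   # existing file-based logging
            send.sendto(data, FORWARD_ADDR)   # copy of the stream
            handled += 1
```

The point is that the duplication is transparent to whatever consumes the files, which is why the original patch could add vanadium as a destination without disturbing developers' workflows.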
I've only taken some initial steps: taking the stream of exceptions and fatals (which follow an idiosyncratic format that is not easy to analyze) and transforming each error report into a JSON document. This is the work done in Ia0cc8de43 (<https://gerrit.wikimedia.org/r/#/c/75560/>). Or "half-done", I should say, since I've discovered a couple of bugs that I haven't yet had a chance to fix.

The nice thing about JSON is that most modern languages have modules for handling it in their standard library. So the status quo is that, pending a couple of bugfixes, there will shortly be a streaming JSON service on deployment-fluoride that publishes MediaWiki error and exception reports as machine-readable objects. In this state, the logs are quite easy to pipe into a data store or a visualization framework.

We still have to figure out what exactly we want to do, though, and then spec out a solution, ideally using solid off-the-shelf software where it exists. Some ideas to get the ball rolling:

* https://getsentry.com/welcome/ (packaged as a paid service, but the software is open source)
* http://logstash.net/
* We could also build our own custom UI for spelunking the data.
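To make the log-to-JSON transformation mentioned above concrete, here is a minimal sketch. The actual exception.log format isn't reproduced in this comment, so the line shape and the field names in the output are assumptions for illustration, not what Ia0cc8de43 implements.

```python
import json
import re

# Assumed (simplified) error-report line shape, e.g.:
#   2013-07-25 12:00:00 enwiki: [abc123] /w/index.php MWException: message
# The real format is idiosyncratic and richer than this.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<wiki>\S+): \[(?P<id>[^\]]+)\] (?P<url>\S+) "
    r"(?P<type>\w+): (?P<message>.*)$"
)

def report_to_json(line):
    """Transform one error-report line into a JSON document, or return
    None if the line doesn't match the assumed format."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return json.dumps(m.groupdict(), sort_keys=True)
```

Once each report is a JSON document per line, a consumer in any language can json-decode the stream, which is what makes it easy to pipe into a data store or a dashboard.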