>
> Is there any *public* list of which exceptions/errors they are.

Well, the Ganglia graphs distinguish different types of errors
(out-of-memory fatals, time limit fatals, miscellaneous fatals, exceptions,
catchable fatals, and query errors). At present there is nothing that is
more granular than that, private or public. The error log we consult is an
undifferentiated stream of text.

However, it is an area of our code that could easily welcome contributions
from the community. Hashar enabled error logging for the beta cluster, so
labs is now a viable development environment for a generic error-processing
solution.

Relevant code exists in two locations:

https://git.wikimedia.org/blob/operations%2Fpuppet.git/9792c164d10f9f9f209227ec52a9cd9dce2cbf79/modules%2Feventlogging%2Ffiles%2Fmwerrors.py
(this is the script that is emitting stats to Ganglia)

and

https://git.wikimedia.org/tree/mediawiki%2Ftools%2Ffluoride.git (set of
regexps to parse the data even further; not currently used anywhere.)
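
To give a rough idea of what bucketing the log stream into the six
categories above could look like, here is a minimal sketch. The patterns
and category names are my own illustrative guesses at typical PHP log
wording; they are not the regexps in mwerrors.py or fluoride, and
classify() and count_errors() are hypothetical helpers.

```python
import re
from collections import Counter

# Hypothetical patterns, checked in order; the first match wins.
# These are illustrative guesses, NOT the regexps from mwerrors.py or fluoride.
PATTERNS = [
    ("oom_fatal", re.compile(r"Fatal error: Allowed memory size of \d+ bytes exhausted")),
    ("timeout_fatal", re.compile(r"Fatal error: Maximum execution time of \d+ seconds? exceeded")),
    ("catchable_fatal", re.compile(r"Catchable fatal error:")),
    ("misc_fatal", re.compile(r"Fatal error:")),
    ("query_error", re.compile(r"database query error", re.IGNORECASE)),
    ("exception", re.compile(r"\bexception\b", re.IGNORECASE)),
]


def classify(line):
    """Return the name of the first matching category, or 'unknown'."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "unknown"


def count_errors(lines):
    """Tally categories over an iterable of log lines."""
    return Counter(classify(line) for line in lines)
```

Specific patterns come before the generic "Fatal error:" one, so
out-of-memory and time-limit fatals are not swallowed by the
miscellaneous bucket.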

I've been working on this in my spare time, but I'd be happy to provide
mentorship, code review & deployment for interested contributors. If
someone competent (a category which explicitly includes you, Brian!) wants
to take over and "own" this problem, that's cool with me too.

There's a lot we could do in this area. It should be possible to
probabilistically trace an error to the commit(s) that introduced it.
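
One naive way to approach that tracing, purely as a sketch: weight each
commit deployed before the error first appeared by how recently it went
out, boost commits that touched a file named in the stack trace, and
normalize the weights into probabilities. Everything here (the Deploy
record, blame(), the half-life and boost values) is hypothetical and
made up for illustration; no such data feed or tool exists yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deploy:
    """Hypothetical record of a deployed commit."""
    commit: str
    deployed_at: float   # Unix timestamp of the deployment
    files: frozenset     # paths touched by the commit


def blame(first_seen, trace_files, deploys, half_life=3600.0, file_boost=5.0):
    """Return {commit: probability} for commits deployed before first_seen.

    The weight halves for every `half_life` seconds between deployment and
    the first occurrence of the error, and is multiplied by `file_boost`
    if the commit touched any file in the stack trace. Both constants are
    arbitrary assumptions.
    """
    trace = set(trace_files)
    scores = {}
    for deploy in deploys:
        age = first_seen - deploy.deployed_at
        if age < 0:
            continue  # deployed after the error first appeared
        weight = 0.5 ** (age / half_life)
        if deploy.files & trace:
            weight *= file_boost
        scores[deploy.commit] = weight
    total = sum(scores.values())
    return {c: w / total for c, w in scores.items()} if total else {}
```

A real solution would need the deployment history and parsed stack
traces as inputs, which is exactly the kind of thing the error log does
not give us in structured form today.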
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
