Ottomata added a comment.

So, we need to be really careful here. This MVP as of yet has zero buy-in from anyone in ops. In addition, both @ori and @eevans point out that EventLogging already does everything that this MVP encompasses, minus the HTTP service part. Now it is time for me to chime in too, woowee!
> Could you explain how you arrived at the figure of 50k requests per second,
> which you project for this service?

This is just an arbitrary goal, some number we came up with. I'd like to be able to encourage developers to use EventBus for everything they can think of. We've scaled EventLogging to about 10k/second by using Kafka, but that is only on a single node.

EventLogging is horizontally scalable. Need more throughput? Add more partitions and processors. In addition, the EventLogging processors are doing more than just validating JSON messages. They are parsing the JSON data out of encoded query strings via regexes, wrapping the incoming event data with generic metadata, anonymizing IP addresses using a shared rotating salt key from etcd, sending invalid events off to Kafka as EventErrors, etc. etc.

> We also need proper code review and versioning for core schemas, and wikis
> don't really support code review. We could consider storing pointers to
> schemas (URLs) instead of the actual schemas in git, but this adds complexity
> without much apparent benefit:

I think this is true, especially for the 'production' use case of EventBus. EventLogging was originally designed for analytics use cases, some of which are short-lived one-offs (A/B testing, whatever). Making quick changes via a wiki is awesome for this. Having more control over changes to production schemas sounds like a good idea.

> However, if we assume that we need an additional class of in-tree schemas,
> then the inverse is also true; it would be just as trivial to implement
> reading from the filesystem.

Agreed, it would be trivial to add filesystem-based schemas to EventLogging. In fact, this is sort of already done, via the cached schema system. Schemas needed for unit testing are hardcoded into the source and manually inserted into the in-memory schema cache. We could do the same thing with a filesystem tree of schemas: preload them into the in-memory cache.
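A minimal Python sketch of the preloading idea, assuming a hypothetical on-disk layout of `<root>/<SchemaName>/<revision>.json` and a cache keyed by a `(name, revision)` pair. The names `SCHEMA_CACHE`, `preload_schemas`, `get_schema` and the `fetch_remote` callback are illustrative only and are not EventLogging's actual API; its real cache logic lives in `server/eventlogging/schema.py`.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical in-memory cache keyed by (schema name, revision).
SchemaKey = Tuple[str, int]
SCHEMA_CACHE: Dict[SchemaKey, dict] = {}


def preload_schemas(root: Path) -> int:
    """Load every <root>/<Name>/<revision>.json file into SCHEMA_CACHE.

    The directory layout is an assumption made for this sketch.
    Returns the number of schemas loaded.
    """
    count = 0
    for path in sorted(root.glob("*/*.json")):
        name, revision = path.parent.name, int(path.stem)
        SCHEMA_CACHE[(name, revision)] = json.loads(path.read_text())
        count += 1
    return count


def get_schema(name: str, revision: int,
               fetch_remote: Callable[[str, int], dict]) -> dict:
    """Return a schema, consulting the cache before any remote fetch."""
    key = (name, revision)
    if key not in SCHEMA_CACHE:
        # Only schemas that were not preloaded trigger a remote request
        # (in EventLogging's case, a request to meta).
        SCHEMA_CACHE[key] = fetch_remote(name, revision)
    return SCHEMA_CACHE[key]
```

Any schema found on disk is served straight from memory, so `fetch_remote` is only ever called for schemas that were not preloaded, and each of those is fetched at most once.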
When asked to validate an event against a schema from the filesystem, EventLogging wouldn't even bother trying to reach out to meta, since the schema would already be in the in-memory cache. See: https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/eventlogging/schema.py#L64

> > Provides a pluggable, composable architecture with support for a wide
> > range of readers/writers
>
> How would this be an advantage for the EventBus portion? Many third-party
> users will actually only want a minimal event bus, and EL doesn't seem to
> help with this from what I have seen.

It does, no? EventLogging is already a usable extension with third-party MediaWiki installations. Kafka isn't needed to use EventLogging at all.

> See https://phabricator.wikimedia.org/T88459#1604768. tl;dr: It's not
> necessarily clear that saving very little code (see above) for EL schema
> fetching outweighs the cost of additional hardware.

As mentioned here <https://phabricator.wikimedia.org/T88459#1597109>, here <https://phabricator.wikimedia.org/T88459#1614048>, here <https://phabricator.wikimedia.org/T88459#1617315>, and here <https://phabricator.wikimedia.org/T114443#1703643>, comparing the performance now is interesting, but provides little insight into how this system will perform in the real world with more features. EventLogging is doing more than just accepting a JSON message and validating it against a schema. In any case, this system will need to be horizontally scalable.

As noted, the production use case will be much lower volume than the analytics one. The performance of all the solutions we evaluated in https://phabricator.wikimedia.org/T88459 is suitable for the production use case, especially since they are all horizontally scalable.
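To make concrete the point that EventLogging does more than accept a JSON message and validate it, here is a minimal Python sketch of one processing step: regex-parse a raw line, decode the percent-encoded JSON payload, check it, anonymize the client IP with a salt, and wrap the payload in generic metadata, or emit an error record instead. Everything here is an assumption for illustration, not EventLogging's code: the raw line format, the `EventError` dict shape, the metadata field names (`uuid`, `dt`, `clientIpHash`), the required-keys check standing in for real JSON Schema validation, and HMAC-SHA256 as the keyed hash. Producing the results to Kafka is omitted.

```python
import hashlib
import hmac
import json
import re
import uuid
from datetime import datetime, timezone
from urllib.parse import unquote

# Hypothetical raw line format: "?<percent-encoded JSON>; <client IP>".
RAW_LINE = re.compile(r"^\?(?P<query>\S+);\s+(?P<ip>\S+)$")


def anonymize_ip(ip: str, salt: bytes) -> str:
    """Return a keyed hash of the client IP.

    The salt is passed in directly; fetching and rotating a shared salt
    (from etcd, in EventLogging's case) is out of scope for this sketch.
    """
    return hmac.new(salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def error_event(reason: str, raw: str) -> dict:
    """Build a hypothetical error record for an invalid input line."""
    return {"type": "EventError", "reason": reason, "raw": raw}


def process_line(line: str, salt: bytes, required: set) -> dict:
    """Turn one raw line into a wrapped event, or an error record."""
    match = RAW_LINE.match(line)
    if match is None:
        return error_event("unparseable line", line)

    # Parse the JSON payload out of the percent-encoded query string.
    try:
        event = json.loads(unquote(match.group("query")))
    except json.JSONDecodeError as exc:
        return error_event(f"invalid JSON: {exc.msg}", line)
    if not isinstance(event, dict):
        return error_event("payload is not a JSON object", line)

    # Simplified stand-in for JSON Schema validation: required keys only.
    missing = required - set(event)
    if missing:
        return error_event("missing fields: " + ", ".join(sorted(missing)), line)

    # Wrap the payload with generic metadata; the raw IP is never stored.
    return {
        "type": "Event",
        "uuid": uuid.uuid4().hex,
        "dt": datetime.now(timezone.utc).isoformat(),
        "clientIpHash": anonymize_ip(match.group("ip"), salt),
        "event": event,
    }
```

Because each line is handled independently and the only shared state is the salt, many copies of a step like this can run side by side, for example one per Kafka partition, which is the sense in which adding partitions and processors adds throughput.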
TASK DETAIL
https://phabricator.wikimedia.org/T114443

To: Ottomata
Cc: mark, MZMcBride, Krinkle, EBernhardson, bd808, Joe, dr0ptp4kt, madhuvishy, Nuria, ori, faidon, aaron, GWicke, mobrovac, Eevans, Ottomata, Matanya, Aklapper, JAllemandou, jkroll, Smalyshev, Hardikj, Wikidata-bugs, Jdouglas, RobH, aude, Deskana, Manybubbles, daniel, JanZerebecki, RobLa-WMF, Jay8g, Ltrlg, fgiunchedi, Dzahn, jeremyb, Legoktm, chasemp, Krenair
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs