Ottomata added a comment.

So, we need to be really careful here.  This MVP as yet has zero buy-in from 
anyone in ops.  In addition, both @ori and @eevans point out that EventLogging 
already does everything that this MVP encompasses, minus the HTTP service 
part.  Now it is time for me to chime in too, woowee!

> Could you explain how you arrived at the figure of 50k requests per second, 
> which you project for this service?


This is just an arbitrary goal, some number we came up with.  I'd like to be 
able to encourage developers to use EventBus for everything they can think of.

We've scaled EventLogging to about 10k / second by using Kafka, but that is 
only on a single node.  EventLogging is horizontally scalable.  Need more 
throughput?  Add more partitions and processors.
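
To make the scaling point concrete, here is a rough sketch (not actual 
EventLogging code; the topic name, group id, and broker address are all made 
up) of how processor instances sharing a Kafka consumer group split up a 
topic's partitions:

    # Rough sketch, not EventLogging code: every processor instance joins the
    # same consumer group, so Kafka balances the topic's partitions across
    # them.  Adding partitions and instances adds throughput.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        'eventlogging-raw',                  # hypothetical raw-events topic
        group_id='eventlogging-processors',  # made-up shared group id
        bootstrap_servers=['localhost:9092'],
    )

    for message in consumer:
        # This instance only sees the partitions assigned to it.
        print(message.partition, message.value)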

In addition, the EventLogging processors are doing more than just validating 
JSON messages.  They are parsing the JSON data out of encoded query strings via 
regexes, wrapping the incoming event data with generic metadata, anonymizing IP 
addresses using a shared rotating salt key from etcd, sending invalid events 
off to Kafka as EventErrors, etc. etc.
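
For a sense of that extra work, here is a loose, simplified sketch (not the 
actual processor code; the log format, field names, and salt handling are 
invented for illustration) of the kind of thing a processor does per event:

    import hashlib
    import json
    import re
    import time
    from urllib.parse import unquote_plus

    # Loose sketch only: pull '?'-encoded JSON out of a raw log line with a
    # regex, wrap it with generic metadata, and anonymize the client IP with
    # a salt (in production the salt rotates and is fetched from etcd).
    RAW_LINE_RE = re.compile(r'\?(?P<qson>\S+)\s+(?P<ip>\S+)$')

    def process_line(line, salt):
        match = RAW_LINE_RE.search(line)
        if match is None:
            return None  # a real processor would emit an EventError to Kafka
        event = json.loads(unquote_plus(match.group('qson')))
        event['recvFrom'] = 'cp1001.example'   # made-up receiving host
        event['timestamp'] = int(time.time())
        event['clientIp'] = hashlib.sha1(
            (salt + match.group('ip')).encode('utf-8')).hexdigest()
        return event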

> We also need proper code review and versioning for core schemas, and wikis 
> don't really support code review. We could consider storing pointers to 
> schemas (URLs) instead of the actual schemas in git, but this adds complexity 
> without much apparent benefit:


I think this is true, especially for the 'production' use case of EventBus.  
EventLogging was originally designed for analytics use cases, some of which are 
short-lived one-offs (A/B testing, whatever).  Making quick changes via a wiki 
is awesome for this.  Having more control over changes to production schemas 
sounds like a good idea.

> However, if we assume that we need an additional class of in-tree schemas, 
> then the inverse is also true; It would be just as trivial to implement 
> reading from the filesystem.


Agreed, it would be trivial to add filesystem-based schemas to EventLogging.  
In fact, this is sort of already done, via the cached schema system.  Schemas 
needed for unit testing are hardcoded into the source and manually inserted 
into the in-memory schema cache.  We could do the same thing with a filesystem 
tree of schemas: preload them into the in-memory cache.  When asked to validate 
an event against one of these schemas, EventLogging wouldn't even bother trying 
to reach out to meta, since the schema would already be in the in-memory cache.  
See: 
https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/eventlogging/schema.py#L64
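
A hedged sketch of what that preloading could look like, assuming a local 
directory of .json schema files (the path, layout, and revision handling are 
all invented here):

    import json
    import os

    # Sketch only: walk a local schema tree and preload every schema into an
    # in-memory cache keyed by (name, revision), so validation never has to
    # reach out to meta for these schemas.
    schema_cache = {}

    def preload_schemas(root):
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if not filename.endswith('.json'):
                    continue
                name = os.path.splitext(filename)[0]
                with open(os.path.join(dirpath, filename)) as f:
                    schema = json.load(f)
                # Revision might come from the file itself, a directory
                # name, or a git tag; default to 1 for this sketch.
                schema_cache[(name, schema.get('revision', 1))] = schema

    preload_schemas('/srv/event-schemas')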

> > Provides a pluggable, composable architecture with support for a wide 
> > range of readers/writers
>
> How would this be an advantage for the EventBus portion? Many third-party 
> users will actually only want a minimal event bus, and EL doesn't seem to 
> help with this from what I have seen.

It does, no? EventLogging is already a usable extension for third-party 
MediaWiki installations.  Kafka isn't needed to use EventLogging at all.
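
As a toy illustration of the pluggable reader/writer idea (this is not 
EventLogging's actual handler API; the registry and scheme names are made up), 
picking writers by URI scheme is what lets a small install compose, say, a 
file or stdout output and skip Kafka entirely:

    from urllib.parse import urlparse

    # Toy illustration only, not EventLogging's real plugin API: writers are
    # registered per URI scheme and looked up at runtime, so outputs can be
    # swapped without touching the rest of the pipeline.
    WRITERS = {}

    def register_writer(scheme):
        def decorator(func):
            WRITERS[scheme] = func
            return func
        return decorator

    @register_writer('file')
    def file_writer(uri, event):
        with open(urlparse(uri).path, 'a') as f:
            f.write(event + '\n')

    @register_writer('stdout')
    def stdout_writer(uri, event):
        print(event)

    def write(uri, event):
        WRITERS[urlparse(uri).scheme](uri, event)

    write('stdout://', '{"schema": "Test", "revision": 1, "event": {}}')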

> See https://phabricator.wikimedia.org/T88459#1604768. tl;dr: It's not 
> necessarily clear that saving very little code (see above) for EL schema 
> fetching outweighs the cost of additional hardware.


As mentioned here <https://phabricator.wikimedia.org/T88459#1597109>, here 
<https://phabricator.wikimedia.org/T88459#1614048>, here 
<https://phabricator.wikimedia.org/T88459#1617315>, and here 
<https://phabricator.wikimedia.org/T114443#1703643>, comparing the performance 
now is interesting, but provides little insight as to how this system will 
perform in the real world with more features.  EventLogging is doing more than 
just accepting a JSON message and validating it against a schema.  In any case, 
this system will need to be horizontally scalable.  As noted, the production 
use case will be much lower volume than the analytics one.  The performance of 
all the solutions we evaluated in https://phabricator.wikimedia.org/T88459 is 
suitable for the production use case, especially since they are all 
horizontally scalable.

