GWicke added a project: Analytics.
GWicke added a comment.

In https://phabricator.wikimedia.org/T84923#961622, @bd808 wrote:

> > can support large delays (order of days) for individual consumers
>
> Do you have a strong use case to support this need?


Yes. Hosts can go down for multiple days, and if the event stream is used for 
something like reliable purges, then it will be necessary to replay the missed 
events or throw away the entire cache.

There can also be bugs in consumers, which need to be fixed by restarting the 
processing from a clean snapshot.
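Both cases reduce to the same mechanism: events stay retained for some window, 
and a consumer re-reads from an earlier offset. A minimal sketch of that idea, 
using an in-memory list as a stand-in for a retained log partition (the class 
and names here are illustrative, not part of any actual implementation):

```python
class ReplayableLog:
    """In-memory stand-in for a retained event log (e.g. a Kafka partition)."""

    def __init__(self, max_retained):
        self.base = 0            # offset of the oldest event still retained
        self.events = []
        self.max_retained = max_retained

    def append(self, event):
        self.events.append(event)
        # Drop the oldest event once the retention window is exceeded
        # (count-based here for simplicity; Kafka uses time/size).
        if len(self.events) > self.max_retained:
            self.events.pop(0)
            self.base += 1

    def read_from(self, offset):
        """Replay everything still retained from `offset` onward."""
        start = max(offset, self.base)
        return self.events[start - self.base:]


# A consumer that was down (or buggy) resumes from its last good offset:
log = ReplayableLog(max_retained=1000)
for i in range(10):
    log.append(f"purge:{i}")

committed = 4                     # last offset the consumer processed
missed = log.read_from(committed)
print(len(missed))                # → 6 (offsets 4..9 are replayed)
```

As long as the outage (or the time to fix the consumer bug) is shorter than 
the retention window, no events are lost and no cache has to be discarded.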

From what I hear, https://phabricator.wikimedia.org/tag/analytics/ would love 
to get even longer event traces. @halfak mentioned a back-of-the-envelope 
calculation that basically all the primary events he lists in his proposal 
<https://meta.wikimedia.org/wiki/Research:MediaWiki_events:_a_generalized_public_event_datasource>
 since the beginning of Wikipedia might fit into 200 GB.
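That order of magnitude is plausible; a quick sketch of the arithmetic, where 
both inputs are my own assumptions (roughly a billion primary events, at 
roughly 200 bytes per serialized event), not numbers from the proposal:

```python
# Back-of-the-envelope check; both numbers are assumptions, not measurements:
# on the order of 1e9 primary events since the beginning of Wikipedia,
# at roughly 200 bytes per serialized event.
events = 1_000_000_000
bytes_per_event = 200
total_gb = events * bytes_per_event / 1e9  # decimal gigabytes
print(total_gb)  # → 200.0
```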

For comparison, I think we currently have several days' worth of buffer for our 
traffic logs in Kafka, which helps to avoid loss if a consumer has issues. 
That is a much higher volume at up to 150k messages/s, while we are looking at 
low hundreds per second for edit-related events.
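For reference, that buffering window in Kafka is just a per-topic retention 
setting; a hypothetical week-long, size-capped window would look something 
like the fragment below (the values are illustrative assumptions, not WMF's 
actual configuration):

```
# Per-topic overrides (Kafka); values are illustrative assumptions.
# Retain events for 7 days regardless of consumer progress:
retention.ms=604800000
# Optionally also cap the retained size per partition (~200 GiB here):
retention.bytes=214748364800
```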


TASK DETAIL
  https://phabricator.wikimedia.org/T84923



To: GWicke
Cc: GWicke, aaron, JanZerebecki, mobrovac, Halfak, yuvipanda, Hardikj, daniel, 
Krenair, bd808, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, RobH, aude, 
Manybubbles, mark, RobLa-WMF, Joe, QChris, chasemp



_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
