GWicke added a project: Analytics.
GWicke added a comment.

In https://phabricator.wikimedia.org/T84923#961622, @bd808 wrote:
> > can support large delays (order of days) for individual consumers
>
> Do you have a strong use case to support this need?

Yes. Hosts can go down for multiple days, and if the event stream is used to do something like reliable purges, then it will be necessary to replay those events or throw away the entire cache. There can also be bugs in consumers, which need to be fixed by restarting the processing from a clean snapshot.

From what I hear, https://phabricator.wikimedia.org/tag/analytics/ would love to get even longer event traces. @halfak mentioned a back-of-the-envelope calculation that basically all the primary events he lists in his proposal <https://meta.wikimedia.org/wiki/Research:MediaWiki_events:_a_generalized_public_event_datasource> since the beginning of Wikipedia might fit into 200G.

For comparison, I think we currently keep several days' worth of buffer for our traffic logs in Kafka, which helps to avoid loss if a consumer has issues. That is a much higher volume, at up to 150k messages/s, while we are looking at low hundreds of messages per second for edit-related events.

TASK DETAIL
https://phabricator.wikimedia.org/T84923

_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
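[Editor's aside: the buffer-size argument above can be sketched with rough numbers. A minimal sketch, assuming "low hundreds" means roughly 200 events/s and that a serialized event averages about 300 bytes; both figures are illustrative assumptions, not measurements from the thread.]

```python
# Back-of-envelope sizing for a Kafka retention window holding
# edit-related events. Rate and event size are assumed values.

def retention_bytes(msgs_per_sec: int, avg_event_bytes: int, days: int) -> int:
    """Approximate on-disk size of a retention window of `days` days."""
    seconds_per_day = 86_400
    return msgs_per_sec * avg_event_bytes * seconds_per_day * days

# A week-long buffer at ~200 events/s and ~300 bytes/event:
week_buffer = retention_bytes(200, 300, 7)
print(f"7-day buffer: {week_buffer / 1e9:.1f} GB")  # ~36.3 GB
```

Under these assumptions, even a week of retention for edit-related events is tens of gigabytes, which is small next to the multi-day buffer already kept for the much higher-volume traffic logs.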