Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-26 Thread Dan Andreescu
Also, I followed up and added this to the FAQ: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats/Metrics/FAQ#Why_do_pageviews_API_endpoints_serve_fresh_data_but_edit_API_endpoints_serve_monthly_data On Mon, Mar 26, 2018 at 10:46 AM, Dan Andreescu wrote: > (I ask this >>> because today

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-26 Thread Dan Andreescu
> > (I ask this >> because today we have a lot of interest in append-only logs, like in >> Dat, Secure Scuttlebutt, and of course blockchains—systems where >> information cannot be repudiated after it's published. If Wikipedia >> rejects append-only logs and allows official history to be changed, >

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-26 Thread Joseph Allemandou
Hi Ahmed, In my opinion the discrepancy of 126 edits is due to complex delete/restore patterns. The notion of 'fixed' is not super clear to me here :) About the data being updated monthly because of a full history scan, you're mostly right. Here is a summary of my view on it: - user and page tables mai

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-23 Thread Ahmed Fasih
Thank you Joseph and Neil, that is so helpful! Is it possible the 126 edits discrepancy for February 28th will be corrected the next time the data regeneration/denormalization is run, at the end of the month, to generate the daily data for the REST API? I ask not because 126 edits (<0.1% error!)

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-23 Thread Joseph Allemandou
Hi Ahmed and Neil, Super interesting project you have, Ahmed :) Thanks, Neil, for the very precise answer you gave to Ahmed's question! Some comments about the number disparity below: > >> https://quarry.wmflabs.org/query/25783 > > >> >> and I see that Quarry reports 168668 while the REST API reports 169754 >>

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-23 Thread Neil Patel Quinn
On 23 March 2018 at 07:02, Ahmed Fasih wrote: > Neil, thank you so much for your insightful comments! > No problem. It's always a good feeling when you know the answer to someone else's question :) > I was able to use Quarry to get the number of edits on English > Wikipedia yesterday, so I can

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-22 Thread Ahmed Fasih
Neil, thank you so much for your insightful comments! I was able to use Quarry to get the number of edits on English Wikipedia yesterday, so I can indeed get recent data from it—hooray!!! I also used it to cross-check against the REST API for February 28th: https://quarry.wmflabs.org/query/25783

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-22 Thread Dan Andreescu
really good summary of the situation, Neil, I'm bookmarking this and will re-use it when people ask :) On Thu, Mar 22, 2018 at 7:07 AM, Neil Patel Quinn wrote: > On 22 March 2018 at 13:41, Neil Patel Quinn wrote: > >> >> Both the edit data and pageview data that you're talking about come from >

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-22 Thread Neil Patel Quinn
On 22 March 2018 at 13:41, Neil Patel Quinn wrote: > > Both the edit data and pageview data that you're talking about come from > the Hadoop-based Analytics Data Lake. However, > because of limitations in the underlying MediaWiki applica

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-22 Thread Neil Patel Quinn
Hello Ahmed, nice to meet you! As a data analyst who constantly works with the edit data, I would love to have it updated daily too. But there are serious infrastructural limitations that make that very difficult. Both the edit data and pageview data that you're talking about come from the Hadoop

[Analytics] Latency of hourly vs daily endpoints?

2018-03-21 Thread Ahmed Fasih
Hello! I have some questions about the latency of some Wikipedia REST endpoints from https://wikimedia.org/api/rest_v1 I see that I can get very recent pageviews data, e.g. https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/en.wikipedia/all-access/all-agents/hourly/2018032100/20180323