[Analytics] Re: Mediacounts fields

2022-11-04 Thread Connie Chen
Miriam wrote a query to find images used more than N times on a wiki, which are probably placeholders or icons (https://gitlab.wikimedia.org/repos/structured-data/image-suggestions/-/blob/main/image_suggestions/cassandra.py#L167). And here's the query to calculate this threshold. (
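The idea behind that filter can be sketched in a few lines. This is not the linked query; it is a minimal illustration assuming the input is a list of (page_id, image_name) pairs, such as rows from an imagelinks-style table, and that the threshold N has already been chosen. The function name and input shape are hypothetical.

```python
from collections import defaultdict


def flag_overused_images(image_links, threshold):
    """Return images linked from more than `threshold` distinct pages.

    image_links: iterable of (page_id, image_name) pairs (hypothetical shape).
    Images used on very many pages are likely placeholders or icons.
    """
    pages_per_image = defaultdict(set)
    for page_id, image in image_links:
        # Count each page once, even if it embeds the same image repeatedly.
        pages_per_image[image].add(page_id)
    return {img for img, pages in pages_per_image.items() if len(pages) > threshold}


links = [(1, "Flag.svg"), (2, "Flag.svg"), (3, "Flag.svg"), (1, "Photo.jpg")]
print(flag_overused_images(links, 2))
```

Counting distinct pages rather than raw link rows keeps a single page that repeats one image from inflating its count.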

2022-11-04 Thread Neil Shah-Quinn
I believe Connie Chen and Isaac Johnson did some work on distinguishing "real images" from icons as part of the image suggestion analytics (T292316). I don't know the details, but perhaps one of them could chime in. - Neil Shah-Quinn senior data

2022-11-04 Thread Dan Andreescu
Hm, you know, maybe it's not such a great idea to show all these small files in the mediarequests/top endpoint. I imagine everyone trying to use it would run into the same problems you're having. Maybe we can brainstorm together on a way to filter out results you might not want. If that top 1000 list

2022-11-04 Thread Michele Mauri via Analytics
Hi! Yes, I already tested both of those approaches. I used the mediarequests API (https://wikimedia.org/api/rest_v1/metrics/mediarequests/top/en.wikipedia.org/image/2022/05/all-days), but since it only returns the top 1000 files, most of them are icons, buttons, etc. While I’d like to focus on the
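A client-side workaround can be sketched against that endpoint. This assumes the response shape `items[0]["files"]`, with each entry carrying `file_path`, `requests`, and `rank` (check this against a live response), and uses a hypothetical heuristic that treats `.svg` and `.gif` files as icon-like. That heuristic will also drop legitimate SVG maps and diagrams, and because the endpoint caps results at 1000 entries, filtering only shrinks the list; it cannot surface files ranked below the cap.

```python
import json
import urllib.request

# Endpoint pattern from the thread; path segments are filled in by fetch_top.
TOP_URL = ("https://wikimedia.org/api/rest_v1/metrics/mediarequests/top/"
           "{referer}/{media_type}/{year}/{month}/{day}")

# Hypothetical heuristic: extensions that on wikis are mostly icons/UI graphics.
ICON_LIKE_EXTENSIONS = (".svg", ".gif")


def fetch_top(referer, media_type, year, month, day, user_agent):
    """Fetch the top-requested files; Wikimedia APIs expect a User-Agent."""
    url = TOP_URL.format(referer=referer, media_type=media_type,
                         year=year, month=month, day=day)
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def drop_icon_like(payload, extensions=ICON_LIKE_EXTENSIONS):
    """Keep only files whose path does not end in an icon-like extension."""
    files = payload["items"][0]["files"]
    return [f for f in files if not f["file_path"].lower().endswith(extensions)]
```

Usage might look like `drop_icon_like(fetch_top("en.wikipedia.org", "image", "2022", "05", "all-days", "my-research-script/0.1"))`, where the User-Agent string is a placeholder.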

2022-11-04 Thread Dan Andreescu
I see. In practice, the mediaviewer instrumentation also had some inaccuracies. For example, the code pre-fetched certain images when opening a gallery even if the viewer never ended up looking at them. I think they adjusted the instrumentation to account for that, but I don't remember the