Nuria added a comment.
And, forgot to say, THIS IS SUPER USEFUL, thanks!
TASK DETAIL
https://phabricator.wikimedia.org/T249654
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Isaac, Nuria
Cc: Akuckartz, calbon, Addshore, Lydia_Pintscher, Nuria
Nuria added a comment.
> No transclusion: in line with current estimates, 41% of articles had no
templates that support transclusion whatsoever
This is "overall articles for all projects", correct?
> Overlapping with the other categories, 54% of articles additional
Nuria added a subscriber: calbon.
Nuria added a comment.
cc @calbon so he is aware of this research
TASK DETAIL
https://phabricator.wikimedia.org/T249654
To: Isaac, Nuria
Cc: calbon, Addshore
Nuria added a comment.
Then I suggest something like: https://wmcs-edits.wmflabs.org/#wmcs-edits
this dashboard is public and powered by data that is extracted from hive.
TASK DETAIL
https://phabricator.wikimedia.org/T154601
Nuria added a comment.
@Addshore you can query wikidata_entity from the SQL Lab tab in superset.
Superset is not public however so to access you need public authentication
https://superset.wikimedia.org
TASK DETAIL
https://phabricator.wikimedia.org/T154601
Nuria added a comment.
Given that retention is not in puppet, is this a setting that is communicated
to a new node when it joins the cluster by the leader of the partition or
similar?
TASK DETAIL
https://phabricator.wikimedia.org/T253753
Nuria closed subtask T253753: Increase retention for mediawiki.revision-create
on the kafka jumbo cluster as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T244590
To: Nuria
Cc: revi, Mholloway
Nuria closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T253753
To: Ottomata, Nuria
Cc: Nuria, JAllemandou, Ottomata, dcausse, Aklapper, CBogen, 4748kitoko,
dar
Nuria closed this task as "Declined".
Restricted Application removed a subscriber: Liuxinyu970226.
TASK DETAIL
https://phabricator.wikimedia.org/T59379
To: Nuria
Cc: Addshore, Aklapper, QChri
Nuria closed subtask T164020: Use spark to split webrequest on tags as
Declined.
TASK DETAIL
https://phabricator.wikimedia.org/T169798
To: Nuria
Cc: PokestarFan, gerritbot, AndrewSu, Nuria, Aklapper
Nuria closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T236895
To: Ladsgroup, Nuria
Cc: mforns, Milimetric, Ladsgroup, Nuria, JAllemandou, elukey, Addshore,
Nuria added a comment.
I think it will be very helpful to have a design document for this service so
we are all on the same page about what the flink install would do (as there are
other projects currently evaluating flink as well). Can we get a google doc
that goes over the design proposed
Nuria closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T209655
To: JAllemandou, Nuria
Cc: Isaac, Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottom
Nuria closed subtask T243832: Fix hdfs-rsync`prune-empty-dirs` feature as
Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T209655
To: JAllemandou, Nuria
Cc: Isaac, Groceryheist, MGerlach, WMDE
Nuria added a comment.
I think before talking about bytes you need a use case, what is the use case
here? As we mentioned earlier the GLAM folks care about human pageviews (real
eye balls) on media files and pages, both cases are (and will be better)
satisfied by existing analytics APIs
Nuria closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T239565
To: Milimetric, Nuria
Cc: Abit, Ramsey-WMF, kzimmerman, Addshore, matthiasmullie, gsingers,
M
Nuria closed subtask T239565: Create reportupdater reports that execute SDC
requests as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T238878
To: Nuria
Cc: Milimetric, Cparle, nettrom_WMF
Nuria added a comment.
Please see:
https://stats.wikimedia.org/v2/#/wikidata.org/reading/total-page-views/normal|bar|2-year|agent~user*spider|monthly
TASK DETAIL
https://phabricator.wikimedia.org/T174981
Nuria added a comment.
@christophbraun I think it would help to start a ticket describing your use
case in detail. Keep in mind that pageviews (defined as content consumed by
humans) do not really "apply" to wikidata items. The bulk of the activity on
the site around http requests
Nuria added a comment.
Updating WDQS (a relational query engine) with metadata about pageviews (per
definition a timeseries) seems not the best idea from a data modeling
standpoint. The GLAM use case is much better served by an API that returns
pageviews across time, I would put
Nuria added a comment.
> I'd like a clear explanation as the multiple threads have become
complicated to resolve objectively.
Indeed.
@Milimetric we just want to have data for commons, for wikidata there are
plenty metrics available that also measure usage from this table, see:
ht
Nuria added a comment.
@Addshore @matthiasmullie Can we document on
https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage what the "M" in
wbc_entity_usage stands for?
TASK DETAIL
https://phabricator.wikimedia.org/T238878
Nuria added a comment.
> We would still like productionized reports for (3). If that is still
possible, I would love to discuss it more :)
Please coordinate with #product-analytics
<https://phabricator.wikimedia.org/tag/product-analytics/> on those.
I found yet anoth
Nuria added a comment.
@Abit: Sorry it was not clear. This below is the request you sent to
analytics a couple of weeks ago via e-mail; as I mentioned then, we would
rather work on requests via phab tickets than via e-mail.
"
From: Amanda Bittaker
Date: Tue, Nov 19, 2019 at 1:30 AM
Su
Nuria added a comment.
> So, it seems we have 2 completely separate definitions of "structured data":
Well, the intent of these numbers is not to measure "structured data usage"
nor to define that concept. The intent is to measure the impact of the
structure data
Nuria added a comment.
I see 7.9 million wikidata items in that table and 1936 mediainfo items.
TASK DETAIL
https://phabricator.wikimedia.org/T238878
To: Nuria
Cc: Milimetric, Cparle, nettrom_WMF
Nuria added a comment.
> but by and large, it's currently wikidata entities being pulled in via Lua
It is exclusively wikidata items at this time correct?
TASK DETAIL
https://phabricator.wikimedia.org/T238878
Nuria added a comment.
Select for @mpopov to look at
select count(distinct eu_page_id) from mediawiki_page as P JOIN
mediawiki_wbc_entity_usage as W ON (W.eu_page_id = P.page_id and
W.wiki_db=P.wiki_db and W.snapshot=P.snapshot) where W.wiki_db="commonswiki"
and W.snapsho
Nuria added a comment.
Please see my comment on https://phabricator.wikimedia.org/T238878#5726624
Seems like the 7.9 million items are from contributions of wikidata alone.
TASK DETAIL
https://phabricator.wikimedia.org/T239565
Nuria added a comment.
After looking at this for a bit with @Ladsgroup and @mpopov (cc @Abit )
A commons page can have data from wikidata and the wikibase instance on
commons (called WikibaseMediaInfo, MediaInfo for short).
wbc_entity_usage will not have any data from structured
Nuria added a comment.
Adding here the public PDF of the Structured Data on Commons grant proposal:
https://upload.wikimedia.org/wikipedia/foundation/f/f0/Public_Copy_-_Structured_Data_on_Commons_Proposal.pdf
TASK DETAIL
https://phabricator.wikimedia.org/T239565
Nuria added a comment.
@Abit: The queries that report the 7.8 million include, per @matthiasmullie's
comment, both Wikidata items and MediaInfo items. We can help calculate the
percentage of each but from numbers thus far it seems that of those 7.8M items
more than half are Wikidata items
Nuria added a comment.
@abit: Numbers about SDC will be reported in the platform evolution slides.
TASK DETAIL
https://phabricator.wikimedia.org/T239565
To: Milimetric, Nuria
Cc: Abit, Ramsey-WMF
Nuria added a comment.
It looks like we are going to have to report this number at the tuning
session so taking back my comment above, let's proceed.
TASK DETAIL
https://phabricator.wikimedia.org/T239565
Nuria added a comment.
Let's pause this work, as it turns out there is a parallel effort happening;
@Abit to create a ticket for ongoing work
TASK DETAIL
https://phabricator.wikimedia.org/T239565
Nuria added a comment.
Some alternatives: superset can source data from other places than druid and
we have a couple of dashboards on top of some tables in staging. This might not be
the best option as reportupdater produces tsvs rather than inserting data into
staging again. Druid is good
Nuria created this task.
Nuria added projects: Analytics, Wikidata, SDC General, Product-Analytics,
Analytics-Kanban.
TASK DESCRIPTION
These queries: https://phabricator.wikimedia.org/T238878#5692516 should be
productionized as reportupdater reports either from data from the replicas or
from
Nuria added a comment.
Can someone explain why the content table is as large as the revision table?
I thought these tables were not used much other than on commons, but the content table
is big in dewiki for example
TASK DETAIL
https://phabricator.wikimedia.org/T238878
Nuria added a subtask: T239127: Import slots/slots_roles and
wikibase.wbc_entity_usage through scoop .
TASK DETAIL
https://phabricator.wikimedia.org/T238878
To: Nuria
Cc: Cparle, nettrom_WMF, Ladsgroup
Nuria added a comment.
We will be scooping the slot tables to the cluster so these queries can run
in parallel.
TASK DETAIL
https://phabricator.wikimedia.org/T238878
To: Nuria
Cc: Cparle, nettrom_WMF
Nuria added a comment.
So, per my comment above, I think the number of items is actually smaller
than the one @Addshore has computed but more wise folks can correct me if I am
wrong.
TASK DETAIL
https://phabricator.wikimedia.org/T238878
Nuria added a comment.
@Addshore : disclaimer: I know next to nothing about this but how are you
taking into account that the revision is the last one for the page? That is, a
page might have had a structured data item in a prior revision and from its
most current revision that structured
Nuria updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T238878
To: Nuria
Cc: Addshore, kzimmerman, mpopov, Ramsey-WMF, Abit, Nuria, 4748kitoko,
darthmon_wmde, DannyS712, Nandana
Nuria added a subscriber: Addshore.
Nuria added a comment.
I think @Addshore had some information to add here.
TASK DETAIL
https://phabricator.wikimedia.org/T238878
To: Nuria
Cc: Addshore, kzimmerman
Nuria added a comment.
Per T220525 <https://phabricator.wikimedia.org/T220525> it looks like none of
the xml dump files that provide content for analytics contain any data about
structure data in commons files.
Proposed changes (that I am not sure got implemented) to change the
Nuria updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T238878
To: Nuria
Cc: kzimmerman, mpopov, Ramsey-WMF, Abit, Nuria, 4748kitoko, darthmon_wmde,
DannyS712, Nandana, JKSTNK
Nuria created this task.
Nuria added projects: Analytics, SDC General, Product-Analytics.
Restricted Application added a project: Wikidata.
TASK DESCRIPTION
TASK DETAIL
https://phabricator.wikimedia.org/T238878
Nuria added a comment.
The work done by @mpopov
The wbc_entity_usage table is supposed to hold info on Wikidata usage for the
pages. For example, here's a random file I added some structured data to a few
days ago:
https://commons.wikimedia.org/wiki/File:P%C3%B3voa_de_Varzim_-i---i
Nuria added a comment.
Restricted Application added a project: Structured-Data-Backlog.
I see this ticket is resolved, but the dumps on commons have
version="0.10". Since from this ticket I gather that the dumps that contain
those slots are version=11, are those being produ
Nuria added a comment.
yes, you can use
https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/GetHostPropertiesUDF.java
to get the "project/family"
TASK DETAIL
https://phabricator.wikimedia.org/T236
Nuria added a comment.
So this query needs to remove the is_pageview=true line:
https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/WikidataArticlePlaceholderMetrics.scala#L90
TASK DETAIL
https
Nuria added a comment.
Ya, +1 to joseph, Special:blah urls (other than Special:Search) should not
have been counted as pageviews and since a fix in July they no longer are.
TASK DETAIL
https://phabricator.wikimedia.org/T236895
Nuria updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T215413
To: Miriam, Nuria
Cc: Mholloway, Ottomata, Jheald, Cirdan, MoritzMuehlenhoff, CDanis, akosiaris,
SandraF_WMF
Nuria closed subtask T148843: Remove computational bottlenecks in stats machine
via adding a GPU that can be used to train ML models as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T215413
Nuria closed subtask T227905: Public Data Review Needed as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T208567
To: GoranSMilovanovic, Nuria
Cc: GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE
Nuria renamed this task from "Provision sparql endpoint for SDC. Requirements
from Product Team." to "Provision search endpoint for SDC. Requirements from
Product Team.".
TASK DETAIL
https://phabricator.wikimedia.org/T221921
Nuria created this task.
Nuria added projects: Wikidata, Commons, SDC General, Wikidata-Query-Service.
TASK DESCRIPTION
TASK DETAIL
https://phabricator.wikimedia.org/T221921
To: Nuria
Cc: Smalyshev
Nuria added a comment.
@abian : this is still not happening on a recurrent schedule yet.
TASK DETAIL
https://phabricator.wikimedia.org/T209655
To: Nuria
Cc: abian, leila, Ottomata, Nuria
Nuria added subscribers: Fjalapeno, Nuria.
Nuria added a comment.
pinging @Fjalapeno from your comments the other day I understand Wikidata is
going to use cassandra for these use cases at the end? cc @Addshore
TASK DETAIL
https://phabricator.wikimedia.org/T220823
Nuria closed subtask T161731: Create reliable change stream for specific wiki
as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T145712
To: Nuria
Cc: Lucas_Werkmeister_WMDE, Liuxinyu970226
Nuria closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T161731
To: Ottomata, Nuria
Cc: gerritbot, JAllemandou, Pchelolo, Ladsgroup, Nuria, Anomie, Aklapper,
Nuria added a comment.
You would need a reconstruction that is property-aware, the current one knows
only about pages and revisions. So, with different parameters for what the
reconstruction is doing yes, possible.
TASK DETAIL
https://phabricator.wikimedia.org/T217324
Nuria added a comment.
@Smalyshev ah, I see what you mean now, but I am still of the opinion that the
user should report the query that failed. On our end we can run it and retrieve
the stack trace. Our 500 page could include a helpful link to phabricator to
report the query that failed.
Maybe I
Nuria added a comment.
@Smalyshev if we configure the error logger to print requests and stack
traces (however deep) we can have alarming on them which would give us a
measure of errors (maybe we already have this). Relying on users to report
stack traces does not seem like it would give
Nuria added a comment.
@Ramsey-WMF Could we possibly get a bit more structured use cases?
Are those documented somewhere besides this ticket so we can see how this use case fits into the big picture? Is there any UI that goes with this case?
TASK DETAIL
https://phabricator.wikimedia.org/T215967
Nuria moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
Nuria triaged this task as "High" priority.
TASK DETAIL
https://phabricator.wikimedia.org/T215616
WORKBOARD
https://phabricator.wikimedia.org/project/board/11/
Nuria added a comment.
Not actively working on this now.
TASK DETAIL
https://phabricator.wikimedia.org/T189744
To: Smalyshev, Nuria
Cc: Nuria, Jonas, EBernhardson, gerritbot, Lydia_Pintscher, daniel, Aklapper, Smalyshev
Nuria added a comment.
@Samwalton9 we still need to see if urls are url encoded or not and hook publishing to one of the mediawiki events (I think @bmansurov is doing this with @Pchelolo, help?). Once events are flowing and looking OK they can be set to be published to the outside world.
Nuria moved this task from Incoming to Radar on the Analytics board.
Nuria raised the priority of this task from "Normal" to "Needs Triage".
TASK DETAIL
https://phabricator.wikimedia.org/T214706
WORKBOARD
https://phabricator.wikimedia.org/project/board/11/
Nuria added a comment.
@bmansurov I think you also need to consider a couple more things: a list of links can be very lengthy, do we have a limit for how much this field should occupy? Are links url encoded? (We probably want them to be.)
TASK DETAIL
https://phabricator.wikimedia.org/T214706
Nuria added a comment.
@bmansurov ah, I think I understand what you meant now, sorry: if mediawiki cannot generate the diff you are interested in at the time the page is edited you need to consume an event that happens later in the chain, ya, makes sense.
TASK DETAIL
https
Nuria added a comment.
Clarifying: ChangeProp consumes EventBus data just like EventStreams consumes EventBus data. So you cannot "use" changeprop; rather, you will be sending events to EventBus (soon to be called EventGate) and consuming them from elsewhere and in turn exposing them to
Nuria added a comment.
Having missed most of our goals this quarter due to our mw woes, I think this might need to be moved to next quarter (q4?)
TASK DETAIL
https://phabricator.wikimedia.org/T209655
To: Nuria
Cc: Ottomata
Nuria removed a project: Analytics-Legal.
TASK DETAIL
https://phabricator.wikimedia.org/T193728
To: Nuria
Cc: ChristianKl, Alsee, Aklapper, Huji, ArthurPSmith, SimonPoole, Scott_WorldUnivAndSch, Micru, lisong, Lofhi
Nuria added subscribers: tstarling, bd808.
Nuria added a comment.
Pinging @bd808 and @Fjalapeno and @tstarling per above comment.
TASK DETAIL
https://phabricator.wikimedia.org/T209031
To: Nuria
Cc: bd808, tstarling
Nuria added a comment.
Added annotation for this event to wikidata unique devices data on wikistats: http://localhost:5000/dist/#/wikidata.org/reading/unique-devices/normal|line|All|~total
TASK DETAIL
https://phabricator.wikimedia.org/T199517
Nuria added a comment.
Assigned to @mpopov. Again, our apologies that the data sources are hardcoded like this. As I mentioned in our meeting, a better path forward would be using the tags for wdqs to identify the requests: https://github.com/wikimedia/analytics-refinery-source/blob/master
Nuria reassigned this task from Nuria to mpopov.
TASK DETAIL
https://phabricator.wikimedia.org/T204415
To: mpopov, Nuria
Cc: Ottomata, elukey, Nuria, mpopov, chelsyx, Aklapper, Addshore, Smalyshev, Lydia_Pintscher
Nuria added a comment.
Misc is no longer in service, all requests have been migrated to 'text'
TASK DETAIL
https://phabricator.wikimedia.org/T204415
To: Nuria
Cc: Nuria, mpopov, chelsyx, Aklapper, Addshore, Smalyshev
Nuria closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T191022
To: Jonas, Nuria
Cc: Smalyshev, Nuria, gerritbot, JAllemandou, Jonas, Aklapper, Gaboe420, Versusxo, Majestic
Nuria added a parent task: T138207: [Open question] Improve bot identification at scale.
TASK DETAIL
https://phabricator.wikimedia.org/T199517
To: Addshore, Nuria
Cc: Nuria, Aklapper, Lydia_Pintscher, JAllemandou
Nuria reopened this task as "Stalled".
TASK DETAIL
https://phabricator.wikimedia.org/T199517
To: Addshore, Nuria
Cc: Nuria, Aklapper, Lydia_Pintscher, JAllemandou, Addshore, Lahi, Gq86, GoranSMilovanovi
Nuria added a comment.
Yes, please. I listed the issue on the dataset page: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices#Changes_and_Known_Problems_with_Dataset
We do not yet have annotations in wikistats (we will at the end of quarter) but when we do this is a good one
Nuria added a comment.
The bot did not accept cookies and the user agent was changing slightly: in 1000 records when this event is happening, 995 are part of the event and of those about 200 are unique user agents. Still, the IP is the same and the volume of requests so high that I am wondering how
Nuria added a comment.
F23734550: Screen Shot 2018-07-13 at 12.43.07 PM.png
It coincides with a spike of pageviews from Thailand that seems like a bot accessing the desktop site; will investigate a bit as to whether this bot was accepting cookies.
TASK DETAIL
https://phabricator.wikimedia.org
Nuria added a comment.
Ping @Smalyshev: now that you have a reliable stream on the new kafka cluster (that supports time-based consumption), are there any other blockers on your end?
TASK DETAIL
https://phabricator.wikimedia.org/T161731
Nuria closed subtask T187296: Increase kafka event retention to 31 as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T161731
To: Ottomata, Nuria
Cc: gerritbot, JAllemandou, Pchelolo, Ladsgroup, Nur
Nuria closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T187296
To: Ottomata, Nuria
Cc: mforns, elukey, Ottomata, Aklapper, Nuria, Ladsgroup, Pchelolo, JAllemandou, Smalyshev,
Nuria added a comment.
@Jonas: you want all requests to www.wikidata.org to be included, correct? Do you care about requests to the wikidata query service or anything else about the request at hand?
TASK DETAIL
https://phabricator.wikimedia.org/T191022
Nuria added a subscriber: JAllemandou.
Nuria added a comment.
I think notes look good.
@mforns main point that I missed is that we probably also want to remove geolocation from dataset #1, I see that from your sumup you did.
Remaining item is sanitization of sparql queries and on that I think we
Nuria added a comment.
Nice! Thank you for documenting.
TASK DETAIL
https://phabricator.wikimedia.org/T174519
To: Nuria
Cc: Nuria, Liuxinyu970226, Capt_Swing, Ramsey-WMF, SandraF_WMF, Abit, chelsyx, mpopov, debt
Nuria added a comment.
@Smalyshev We like to default to public if possible, the more eyes on the data the more useful it can be.
TASK DETAIL
https://phabricator.wikimedia.org/T143819
To: Nuria
Cc: mforns, PokestarFan
Nuria added a subscriber: mforns.
Nuria added a comment.
@Smalyshev: Take a look at the information we keep on pageview hourly. For long-term keeping we need to remove PII, and we store neither detailed timestamps nor sessionIds, precisely because we want to avoid session reconstruction. So probably if we round
Nuria added a comment.
@Smalyshev Please, 45 minutes with me and @Ottomata would do?
TASK DETAIL
https://phabricator.wikimedia.org/T161731
To: Ottomata, Nuria
Cc: gerritbot, JAllemandou, Pchelolo, Ladsgroup, Nuria
Nuria added a comment.
@Smalyshev Ok, we aim to have the cluster handling all prod traffic by end of next quarter; until then it will be mirroring data, which I think should be sufficient for you to get started on the wdqs consumer? Correct me if I am wrong.
TASK DETAIL
https
Nuria added a comment.
Nice. Can @Smalyshev check whether consuming from these topics as set would work for his purposes?
TASK DETAIL
https://phabricator.wikimedia.org/T161731
To: Ottomata, Nuria
Cc: gerritbot
Nuria added a comment.
I got the same doing:
/home/otto/kafkacat -Q -b kafka-jumbo1003.eqiad.wmnet -t eqiad.mediawiki.revision-create:0:1512687299 -Xdebug=all
TASK DETAIL
https://phabricator.wikimedia.org/T161731
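Editor's note on the kafkacat query above: the timestamp 1512687299 passed in the -t argument reads as epoch seconds. The sketch below shows what that value means under each unit and how the same -t argument would look in milliseconds. It assumes kafkacat's query mode (-Q) expects milliseconds since the epoch, which should be checked against `kafkacat -h`. The helper names describe_epoch and query_target are hypothetical and are not part of kafkacat or of the thread.

```python
from datetime import datetime, timezone


def describe_epoch(ts: int) -> dict:
    """Interpret ts both as epoch seconds and as epoch milliseconds (UTC)."""
    as_seconds = datetime.fromtimestamp(ts, tz=timezone.utc)
    as_millis = datetime.fromtimestamp(ts / 1000, tz=timezone.utc)
    return {
        "as_seconds": as_seconds.isoformat(),
        "as_millis": as_millis.isoformat(),
    }


def query_target(topic: str, partition: int, ts_seconds: int) -> str:
    """Build a topic:partition:timestamp string with the timestamp in ms.

    Assumption: the consumer of this string wants milliseconds.
    """
    return f"{topic}:{partition}:{ts_seconds * 1000}"


if __name__ == "__main__":
    print(describe_epoch(1512687299))
    print(query_target("eqiad.mediawiki.revision-create", 0, 1512687299))
```

Read as seconds, the value falls on 2017-12-07; read as milliseconds, it falls in January 1970, which would be before any retained message.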
Nuria added a comment.
@Ottomata Could @Smalyshev do a test on consuming from the new cluster, though with the understanding it is not yet productionized, to make sure it fits the use cases?
TASK DETAIL
https://phabricator.wikimedia.org
Nuria added a comment.
Are there any docs we can look at with metrics?
TASK DETAIL
https://phabricator.wikimedia.org/T177354
To: chelsyx, Nuria
Cc: Nuria, Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF
Nuria added a comment.
@chelsyx That makes sense, thank you.
I was also trying to make a meta point though: since prior work and statistics exist for commons it will be worth documenting (on meta?) these numbers and why/how they differ from other numbers the community might have access to. I know