[Wikidata] EveryBookItsReader problem with banned books in Wikidata

2023-03-24 Thread Nuria Ferran Ferrer
Hi! We are working on this April's international campaign "Every Book its Reader" https://meta.wikimedia.org/wiki/EveryBooksItsReader_2023 and we wanted to create or modify articles about banned books. We realised that of all the different classes that are related to banned books, we

[Wikidata] Re: Fwd: Upcoming Wikidata & Wikibase office hours on Wednesday, November 9th 2022 at 17:00 UTC (18:00 Berlin) in the Wikidata Telegram group

2022-11-07 Thread Nuria Ferran Ferrer
Dear Lea, do you have a calendar of the office hours for this month? We are thinking of presenting a proposal for the WMF Research Grant and it will help us to present our project and receive feedback. Thanks! Núria Núria Ferran Ferrer Professora de la

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-07-31 Thread Nuria
Nuria added a comment. And, forgot to say, THIS IS SUPER USEFUL, thanks! TASK DETAIL https://phabricator.wikimedia.org/T249654 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Isaac, Nuria Cc: Akuckartz, calbon, Addshore, Lydia_Pintscher, Nuria

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-07-31 Thread Nuria
Nuria added a comment. > No transclusion: in line with current estimates, 41% of articles had no templates that support transclusion whatsoever This is "overall articles for all projects", correct? > Overlapping with the other categories, 54% of articles additional

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-07-31 Thread Nuria
Nuria added a subscriber: calbon. Nuria added a comment. cc @calbon so he is aware of this research TASK DETAIL https://phabricator.wikimedia.org/T249654 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Isaac, Nuria Cc: calbon, Addshore

[Wikidata-bugs] [Maniphest] [Commented On] T154601: Grafana: "wikidata-datamodel-terms" doesn't update anymore

2020-07-01 Thread Nuria
Nuria added a comment. Then I suggest something like: https://wmcs-edits.wmflabs.org/#wmcs-edits this dashboard is public and powered by data that is extracted from hive. TASK DETAIL https://phabricator.wikimedia.org/T154601 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T154601: Grafana: "wikidata-datamodel-terms" doesn't update anymore

2020-06-30 Thread Nuria
Nuria added a comment. @Addshore you can query wikidata_entity from the SQL Lab tab in Superset. Superset is not public, however, so to access it you need public authentication https://superset.wikimedia.org TASK DETAIL https://phabricator.wikimedia.org/T154601 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster

2020-06-03 Thread Nuria
Nuria added a comment. Given that retention is not on puppet is this a setting that is communicated to a new node when it joins the cluster by the leader of the partition or similar? TASK DETAIL https://phabricator.wikimedia.org/T253753 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Unblock] T244590: EPIC: Rework the WDQS updater as an event driven application

2020-06-03 Thread Nuria
Nuria closed subtask T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: revi, Mholloway

[Wikidata-bugs] [Maniphest] [Closed] T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster

2020-06-03 Thread Nuria
Nuria closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T253753 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Ottomata, Nuria Cc: Nuria, JAllemandou, Ottomata, dcausse, Aklapper, CBogen, 4748kitoko, dar

[Wikidata-bugs] [Maniphest] [Declined] T59379: Wikistats for Wikidata lists several bots as normal users

2020-05-21 Thread Nuria
Nuria closed this task as "Declined". Restricted Application removed a subscriber: Liuxinyu970226. TASK DETAIL https://phabricator.wikimedia.org/T59379 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Addshore, Aklapper, QChri

[Wikidata-bugs] [Maniphest] [Unblock] T169798: Create UDFs for analyzing SPARQL queries

2020-04-08 Thread Nuria
Nuria closed subtask T164020: Use spark to split webrequest on tags as Declined. TASK DETAIL https://phabricator.wikimedia.org/T169798 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: PokestarFan, gerritbot, AndrewSu, Nuria, Aklapper

[Wikidata-bugs] [Maniphest] [Closed] T236895: ArticlePlaceholder dashboard stopped tracking page views

2020-03-16 Thread Nuria
Nuria closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T236895 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Ladsgroup, Nuria Cc: mforns, Milimetric, Ladsgroup, Nuria, JAllemandou, elukey, Addshore,

[Wikidata-bugs] [Maniphest] [Commented On] T247058: Deployment strategy and hardware requirement for new Flink based WDQS updater

2020-03-09 Thread Nuria
Nuria added a comment. I think it will be very helpful to have a design document for this service so we are all on the same page about what the flink install would do (as there are other projects currently evaluating flink as well). Can we get a google doc that goes over the design proposed

[Wikidata-bugs] [Maniphest] [Closed] T209655: Copy Wikidata dumps to HDFS + parquet

2020-02-27 Thread Nuria
Nuria closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T209655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, Nuria Cc: Isaac, Groceryheist, MGerlach, WMDE-leszek, abian, leila, Ottom

[Wikidata-bugs] [Maniphest] [Unblock] T209655: Copy Wikidata dumps to HDFS + parquet

2020-02-27 Thread Nuria
Nuria closed subtask T243832: Fix hdfs-rsync`prune-empty-dirs` feature as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T209655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, Nuria Cc: Isaac, Groceryheist, MGerlach, WMDE

[Wikidata-bugs] [Maniphest] [Commented On] T174981: Add pageviews total counts to WDQS

2020-02-26 Thread Nuria
Nuria added a comment. I think before talking about bytes you need a use case, what is the use case here? As we mentioned earlier the GLAM folks care about human pageviews (real eye balls) on media files and pages, both cases are (and will be better) satisfied by existing analytics APIs

[Wikidata-bugs] [Maniphest] [Closed] T239565: Create reportupdater reports that execute SDC requests

2020-01-13 Thread Nuria
Nuria closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T239565 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Milimetric, Nuria Cc: Abit, Ramsey-WMF, kzimmerman, Addshore, matthiasmullie, gsingers, M

[Wikidata-bugs] [Maniphest] [Unblock] T238878: Data about how many file pages on Commons contain at least one structured data element

2020-01-13 Thread Nuria
Nuria closed subtask T239565: Create reportupdater reports that execute SDC requests as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Milimetric, Cparle, nettrom_WMF

[Wikidata-bugs] [Maniphest] [Commented On] T174981: Add pageviews total counts to WDQS

2020-01-05 Thread Nuria
Nuria added a comment. Please see: https://stats.wikimedia.org/v2/#/wikidata.org/reading/total-page-views/normal|bar|2-year|agent~user*spider|monthly TASK DETAIL https://phabricator.wikimedia.org/T174981 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T174981: Add pageviews total counts to WDQS

2020-01-05 Thread Nuria
Nuria added a comment. @christophbraun I think it would help to start a ticket describing your use case in detail. Keep in mind that pageviews (defined as content consumed by humans) do not really "apply" to wikidata items. The bulk of the activity on the site around http requests

[Wikidata-bugs] [Maniphest] [Commented On] T174981: Add pageviews total counts to WDQS

2020-01-05 Thread Nuria
Nuria added a comment. Updating WDQS (a relational query engine) with metadata about pageviews (per definition a timeseries) seems not the best idea from a data modeling standpoint. The GLAM use case is much better served by an API that returns pageviews across time, I would put

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-12-19 Thread Nuria
Nuria added a comment. > I'd like a clear explanation as the multiple threads have become complicated to resolve objectively. Indeed. @Milimetric we just want to have data for commons, for wikidata there are plenty of metrics available that also measure usage from this table, see: ht

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-12-18 Thread Nuria
Nuria added a comment. @Addshore @matthiasmullie Can we document on https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage what the "M" in wbc_entity_usage stands for? TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wik

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-11 Thread Nuria
Nuria added a comment. > We would still like productionized reports for (3). If that is still possible, I would love to discuss it more :) Please coordinate with #product-analytics <https://phabricator.wikimedia.org/tag/product-analytics/> on those. I found yet anoth

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-10 Thread Nuria
Nuria added a comment. @Abit: Sorry it was not clear. Below is the request you sent to analytics a couple of weeks ago via e-mail; as I mentioned then, we would rather work on requests via phab tickets than via e-mail. " From: Amanda Bittaker Date: Tue, Nov 19, 2019 at 1:30 AM Su

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-10 Thread Nuria
Nuria added a comment. > So, it seems we have 2 completely separate definitions of "structured data": Well, the intent of these numbers is not to measure "structured data usage" nor to define that concept. The intent is to measure the impact of the structured data

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-12-10 Thread Nuria
Nuria added a comment. I see 7.9 million wikidata items on that table and 1936 mediainfo items. TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Milimetric, Cparle, nettrom_WMF

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-12-10 Thread Nuria
Nuria added a comment. > but by and large, it's currently wikidata entities being pulled in via Lua It is exclusively wikidata items at this time correct? TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/pa

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-12-09 Thread Nuria
Nuria added a comment. Select for @mpopov to look at select count(distinct eu_page_id) from mediawiki_page as P JOIN mediawiki_wbc_entity_usage as W ON (W.eu_page_id = P.page_id and W.wiki_db=P.wiki_db and W.snapshot=P.snapshot) where W.wiki_db="commonswiki" and W.snapsho

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-09 Thread Nuria
Nuria added a comment. Please see my comment on https://phabricator.wikimedia.org/T238878#5726624 Seems like the 7.9 million items are from contributions of wikidata alone. TASK DETAIL https://phabricator.wikimedia.org/T239565 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-12-09 Thread Nuria
Nuria added a comment. After looking at this for a bit with @Ladsgroup and @mpopov (cc @Abit ) A commons page can have data from wikidata and the wikibase instance on commons (called WikibaseMediaInfo, MediaInfo for short). wbc_entity_usage will not have any data from structured

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-09 Thread Nuria
Nuria added a comment. Adding here the public pdf of the structured data on commons grant proposal: https://upload.wikimedia.org/wikipedia/foundation/f/f0/Public_Copy_-_Structured_Data_on_Commons_Proposal.pdf TASK DETAIL https://phabricator.wikimedia.org/T239565 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-09 Thread Nuria
Nuria added a comment. @Abit: The queries that report the 7.8 million include, per @matthiasmullie's comment, both Wikidata Items and Mediainfo items. We can help calculate the percentage of each but from numbers thus far it seems that of those 7.8M items more than half are Wikidata items

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-06 Thread Nuria
Nuria added a comment. @abit: Numbers about SDC will be reported in the platform evolution slides. TASK DETAIL https://phabricator.wikimedia.org/T239565 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Milimetric, Nuria Cc: Abit, Ramsey-WMF

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-05 Thread Nuria
Nuria added a comment. It looks like we are going to have to report this number on the tuning session so taking back my comment above, let's proceed. TASK DETAIL https://phabricator.wikimedia.org/T239565 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-05 Thread Nuria
Nuria added a comment. Let's pause this work, as it turns out there is a parallel effort happening; @Abit to create a ticket for ongoing work TASK DETAIL https://phabricator.wikimedia.org/T239565 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T239565: Create reportupdater reports that execute SDC requests

2019-12-04 Thread Nuria
Nuria added a comment. Some alternatives: superset can source data from places other than druid and we have a couple of dashboards on top of some tables in staging. This might not be the best option as reportupdater produces tsvs rather than inserting data into staging again. Druid is good

[Wikidata-bugs] [Maniphest] [Created] T239565: Create reportupdater reports that execute SDC requests

2019-12-01 Thread Nuria
Nuria created this task. Nuria added projects: Analytics, Wikidata, SDC General, Product-Analytics, Analytics-Kanban. TASK DESCRIPTION These queries: https://phabricator.wikimedia.org/T238878#5692516 should be productionized as reportupdater reports either from data from the replicas or from

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-28 Thread Nuria
Nuria added a comment. Can someone explain why the content table is as large as the revision table? I thought these tables were not used much other than on commons, but the content table is big in dewiki, for example TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Updated] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-27 Thread Nuria
Nuria added a subtask: T239127: Import slots/slots_roles and wikibase.wbc_entity_usage through scoop. TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Cparle, nettrom_WMF, Ladsgroup

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-27 Thread Nuria
Nuria added a comment. We will be scooping the slot tables to the cluster so these queries can run in parallel. TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Cparle, nettrom_WMF

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-22 Thread Nuria
Nuria added a comment. So, per my comment above, I think the number of items is actually smaller than the one @Addshore has computed but more wise folks can correct me if I am wrong. TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-22 Thread Nuria
Nuria added a comment. @Addshore : disclaimer: I know next to nothing about this but how are you taking into account that the revision is the last one for the page? That is, a page might have had a structured data item in a prior revision and from its most current revision that structured

[Wikidata-bugs] [Maniphest] [Edited] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-21 Thread Nuria
Nuria updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Addshore, kzimmerman, mpopov, Ramsey-WMF, Abit, Nuria, 4748kitoko, darthmon_wmde, DannyS712, Nandana

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-21 Thread Nuria
Nuria added a subscriber: Addshore. Nuria added a comment. I think @Addshore had some information to add here. TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Addshore, kzimmerman

[Wikidata-bugs] [Maniphest] [Updated] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-21 Thread Nuria
Nuria added a comment. Per T220525 <https://phabricator.wikimedia.org/T220525> it looks like none of the xml dump files that provide content for analytics contain any data about structured data in commons files. Proposed changes (that I am not sure got implemented) to change the

[Wikidata-bugs] [Maniphest] [Edited] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-21 Thread Nuria
Nuria updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: kzimmerman, mpopov, Ramsey-WMF, Abit, Nuria, 4748kitoko, darthmon_wmde, DannyS712, Nandana, JKSTNK

[Wikidata-bugs] [Maniphest] [Created] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-21 Thread Nuria
Nuria created this task. Nuria added projects: Analytics, SDC General, Product-Analytics. Restricted Application added a project: Wikidata. TASK DESCRIPTION TASK DETAIL https://phabricator.wikimedia.org/T238878 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Commented On] T238878: Data about how many file pages on Commons contain at least one structured data element

2019-11-21 Thread Nuria
Nuria added a comment. The work done by @mpopov The wbc_entity_usage table is supposed to hold info on Wikidata usage for the pages For example, here's a random file I added some structured data to a few days ago: https://commons.wikimedia.org/wiki/File:P%C3%B3voa_de_Varzim_-i---i

[Wikidata-bugs] [Maniphest] [Updated] T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps

2019-11-21 Thread Nuria
Nuria added a comment. Restricted Application added a project: Structured-Data-Backlog. I see this ticket is resolved but the dumps on commons have version="0.10". Since from this ticket I gather that the dumps that contain those slots are version=11, are those being produ

[Wikidata-bugs] [Maniphest] [Commented On] T236895: ArticlePlaceholder dashboard stopped tracking page views

2019-10-30 Thread Nuria
Nuria added a comment. yes, you can use https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/GetHostPropertiesUDF.java to get the "project/family" TASK DETAIL https://phabricator.wikimedia.org/T236

[Wikidata-bugs] [Maniphest] [Commented On] T236895: ArticlePlaceholder dashboard stopped tracking page views

2019-10-30 Thread Nuria
Nuria added a comment. So this query needs to remove the is_pageview=true line: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/WikidataArticlePlaceholderMetrics.scala#L90 TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T236895: ArticlePlaceholder dashboard stopped tracking page views

2019-10-30 Thread Nuria
Nuria added a comment. Ya, +1 to joseph, Special:blah urls (other than Special:Search) should not have been counted as pageviews and since a fix in July they no longer are. TASK DETAIL https://phabricator.wikimedia.org/T236895 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Edited] T215413: Image Classification Working Group

2019-10-21 Thread Nuria
Nuria updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T215413 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Miriam, Nuria Cc: Mholloway, Ottomata, Jheald, Cirdan, MoritzMuehlenhoff, CDanis, akosiaris, SandraF_WMF

[Wikidata-bugs] [Maniphest] [Unblock] T215413: Image Classification Working Group

2019-10-10 Thread Nuria
Nuria closed subtask T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T215413 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Unblock] T208567: Count Wikidata page views per page type

2019-07-23 Thread Nuria
Nuria closed subtask T227905: Public Data Review Needed as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T208567 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic, Nuria Cc: GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE

[Wikidata-bugs] [Maniphest] [Retitled] T221921: Provision search endpoint for SDC. Requirements from Product Team.

2019-04-29 Thread Nuria
Nuria renamed this task from "Provision sparql endpoint for SDC. Requirements from Product Team." to "Provision search endpoint for SDC. Requirements from Product Team.". TASK DETAIL https://phabricator.wikimedia.org/T221921 EMAIL PREFERENCES https://phabricator.wi

[Wikidata-bugs] [Maniphest] [Created] T221921: Provision sparql endpoint for SDC. Requirements from Product Team.

2019-04-25 Thread Nuria
Nuria created this task. Nuria added projects: Wikidata, Commons, SDC General, Wikidata-Query-Service. TASK DESCRIPTION TASK DETAIL https://phabricator.wikimedia.org/T221921 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Smalyshev

[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFs

2019-04-24 Thread Nuria
Nuria added a comment. @abian : this is still not happening on a recurrent schedule yet. TASK DETAIL https://phabricator.wikimedia.org/T209655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: abian, leila, Ottomata, Nuria

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T220823: Use ElasticSearch for bulk Wikidata entity term lookup

2019-04-24 Thread Nuria
Nuria added subscribers: Fjalapeno, Nuria. Nuria added a comment. pinging @Fjalapeno from your comments the other day I understand Wikidata is going to use cassandra for these use cases at the end? cc @Addshore TASK DETAIL https://phabricator.wikimedia.org/T220823 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] [Unblock] T145712: Statement counts from pageprops do not match actual ones ( wikibase:statements and wikibase:sitelinks )

2019-04-24 Thread Nuria
Nuria closed subtask T161731: Create reliable change stream for specific wiki as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T145712 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Lucas_Werkmeister_WMDE, Liuxinyu970226

[Wikidata-bugs] [Maniphest] [Closed] T161731: Create reliable change stream for specific wiki

2019-04-24 Thread Nuria
Nuria closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T161731 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Ottomata, Nuria Cc: gerritbot, JAllemandou, Pchelolo, Ladsgroup, Nuria, Anomie, Aklapper,

[Wikidata-bugs] [Maniphest] [Commented On] T217324: Have a more fine-grained history for property values on item pages

2019-03-01 Thread Nuria
Nuria added a comment. You would need a reconstruction that is property-aware, the current one knows only about pages and revisions. So, with different parameters for what the reconstruction is doing yes, possible. TASK DETAIL https://phabricator.wikimedia.org/T217324 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] [Commented On] T216701: Wikidata Query Service should have a proper high level error handler

2019-02-22 Thread Nuria
Nuria added a comment. @Smalyshev ah, i see what you mean now but I am still of the opinion that the user should report the query that failed. On our end we can run it and retrieve the stack trace. Our 500 page could include helpful link to phabricator to report query that failed. Maybe I

[Wikidata-bugs] [Maniphest] [Commented On] T216701: Wikidata Query Service should have a proper high level error handler

2019-02-21 Thread Nuria
Nuria added a comment. @Smalyshev if we configure the error logger to print requests and stack traces (however deep) we can have alarming on them which would give us a measure of errors (maybe we already have this). Relying on users to report stack traces does not seem like it would give

[Wikidata-bugs] [Maniphest] [Commented On] T215967: Add keyword for filtering based on captions in specific language

2019-02-14 Thread Nuria
Nuria added a comment. @Ramsey-WMF Could we possibly get a bit more structured use cases? Are those documented somewhere besides this ticket so we can see how this use case fits into the big picture? Is there any UI that goes with this case? TASK DETAIL https://phabricator.wikimedia.org/T215967 EMAIL

[Wikidata-bugs] [Maniphest] [Triaged] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-11 Thread Nuria
Nuria moved this task from Incoming to Smart Tools for Better Data on the Analytics board. Nuria triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T215616 WORKBOARD https://phabricator.wikimedia.org/project/board/11/ EMAIL PREFERENCES https://phabricator.wik

[Wikidata-bugs] [Maniphest] [Commented On] T189744: Add hints parameter to wbsearchentities

2019-02-05 Thread Nuria
Nuria added a comment. Not actively working on this now. TASK DETAIL https://phabricator.wikimedia.org/T189744 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Smalyshev, Nuria Cc: Nuria, Jonas, EBernhardson, gerritbot, Lydia_Pintscher, daniel, Aklapper, Smalyshev

[Wikidata-bugs] [Maniphest] [Commented On] T214706: How to surface link changes as a stream?

2019-02-05 Thread Nuria
Nuria added a comment. @Samwalton9 we still need to see if urls are url encoded or not and hook publishing to one of the mediawiki events (I think @bmansurov is doing this with @Pchelolo .help?) Once events are flowing and looking OK they can be set to be published to the outside world. TASK

[Wikidata-bugs] [Maniphest] [Raised Priority] T214706: How to surface link changes as a stream?

2019-01-25 Thread Nuria
Nuria moved this task from Incoming to Radar on the Analytics board. Nuria raised the priority of this task from "Normal" to "Needs Triage". TASK DETAIL https://phabricator.wikimedia.org/T214706 WORKBOARD https://phabricator.wikimedia.org/project/board/11/ EM

[Wikidata-bugs] [Maniphest] [Commented On] T214706: How to surface link changes as a stream?

2019-01-25 Thread Nuria
Nuria added a comment. @bmansurov I think you also need to consider a couple more things: a list of links can be very lengthy, do we have a limit for how much this field should occupy? Are links url encoded? (we probably want them to be so). TASK DETAIL https://phabricator.wikimedia.org/T214706 EMAIL

[Wikidata-bugs] [Maniphest] [Commented On] T214706: How to surface link changes as a stream?

2019-01-25 Thread Nuria
Nuria added a comment. @bmansurov ah, I think I understand what you meant now, sorry: if mediawiki cannot generate the diff you are interested in at the time the page is edited you need to consume an event that happens later in the chain, ya, makes sense. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T214706: How to surface link changes as a stream?

2019-01-25 Thread Nuria
Nuria added a comment. Clarifying: ChangeProp consumes EventBus data just like EventStreams consumes EventBus data. So you cannot "use" changeprop; rather, you will be sending events to EventBus (soon to be called EventGate) and consuming them from elsewhere and in turn exposing them to

[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFs

2018-12-06 Thread Nuria
Nuria added a comment. Having missed most of our goals this quarter due to our mw woes, I think this might need to be moved to next quarter (q4?) TASK DETAIL https://phabricator.wikimedia.org/T209655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Ottomata

[Wikidata-bugs] [Maniphest] [Updated] T193728: Address concerns about perceived legal uncertainty of Wikidata

2018-11-25 Thread Nuria
Nuria removed a project: Analytics-Legal. TASK DETAIL https://phabricator.wikimedia.org/T193728 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: ChristianKl, Alsee, Aklapper, Huji, ArthurPSmith, SimonPoole, Scott_WorldUnivAndSch, Micru, lisong, Lofhi

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T209031: Not able to scoop comment table in labs for mediawiki reconstruction process

2018-11-12 Thread Nuria
Nuria added subscribers: tstarling, bd808. Nuria added a comment. Pinging @bd808 and @Fjalapeno and @tstarling per above comment. TASK DETAIL https://phabricator.wikimedia.org/T209031 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: bd808, tstarling

[Wikidata-bugs] [Maniphest] [Commented On] T199517: Investigate June Unique devices increase of 170% for wikidata

2018-10-01 Thread Nuria
Nuria added a comment. Added an annotation for this event to the Wikidata unique devices data on wikistats: http://localhost:5000/dist/#/wikidata.org/reading/unique-devices/normal|line|All|~total TASK DETAIL https://phabricator.wikimedia.org/T199517 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T204415: Query stats dashboard not updating

2018-09-24 Thread Nuria
Nuria added a comment. Assigned to @mpopov. Again, our apologies that the data sources are hardcoded like this. As I mentioned in our meeting, a better path forward would be to use the tags for wdqs to identify the requests: https://github.com/wikimedia/analytics-refinery-source/blob/master

[Wikidata-bugs] [Maniphest] [Reassigned] T204415: Query stats dashboard not updating

2018-09-24 Thread Nuria
Nuria reassigned this task from Nuria to mpopov. TASK DETAIL https://phabricator.wikimedia.org/T204415 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mpopov, Nuria Cc: Ottomata, elukey, Nuria, mpopov, chelsyx, Aklapper, Addshore, Smalyshev, Lydia_Pintscher

[Wikidata-bugs] [Maniphest] [Commented On] T204415: Query stats dashboard not updating

2018-09-24 Thread Nuria
Nuria added a comment. Misc is no longer in service; all requests have been migrated to 'text'. TASK DETAIL https://phabricator.wikimedia.org/T204415 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Nuria, mpopov, chelsyx, Aklapper, Addshore, Smalyshev

[Wikidata-bugs] [Maniphest] [Closed] T191022: Add Wikidata website extract oozie job

2018-08-22 Thread Nuria
Nuria closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T191022 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Jonas, Nuria Cc: Smalyshev, Nuria, gerritbot, JAllemandou, Jonas, Aklapper, Gaboe420, Versusxo, Majestic

[Wikidata-bugs] [Maniphest] [Updated] T199517: Investigate June Unique devices increase of 170% for wikidata

2018-07-16 Thread Nuria
Nuria added a parent task: T138207: [Open question] Improve bot identification at scale. TASK DETAIL https://phabricator.wikimedia.org/T199517 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore, Nuria Cc: Nuria, Aklapper, Lydia_Pintscher, JAllemandou

[Wikidata-bugs] [Maniphest] [Reopened] T199517: Investigate June Unique devices increase of 170% for wikidata

2018-07-16 Thread Nuria
Nuria reopened this task as "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T199517 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore, Nuria Cc: Nuria, Aklapper, Lydia_Pintscher, JAllemandou, Addshore, Lahi, Gq86, GoranSMilovanovi

[Wikidata-bugs] [Maniphest] [Commented On] T199517: Investigate June Unique devices increase of 170% for wikidata

2018-07-16 Thread Nuria
Nuria added a comment. Yes, please. I listed the issue on the dataset page: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices#Changes_and_Known_Problems_with_Dataset We do not yet have annotations in wikistats (we will at the end of the quarter), but when we do, this is a good one

[Wikidata-bugs] [Maniphest] [Commented On] T199517: Investigate June Unique devices increase of 170% for wikidata

2018-07-13 Thread Nuria
Nuria added a comment. The bot did not accept cookies and its user agent was changing slightly: in 1000 records from when this event is happening, 995 are part of the event, and of those about 200 are unique user agents. Still, the IP is the same and the volume of requests so high that I am wondering how

[Wikidata-bugs] [Maniphest] [Commented On] T199517: Investigate June Unique devices increase of 170% for wikidata

2018-07-13 Thread Nuria
Nuria added a comment. F23734550: Screen Shot 2018-07-13 at 12.43.07 PM.png It coincides with a spike of pageviews from Thailand that looks like a bot accessing the desktop site. I will investigate a bit as to whether this bot was accepting cookies. TASK DETAIL https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T161731: Create reliable change stream for specific wiki

2018-06-25 Thread Nuria
Nuria added a comment. Ping @Smalyshev: now that you have a reliable stream on the new Kafka cluster (which supports time-based consumption), are there any other blockers on your end? TASK DETAIL https://phabricator.wikimedia.org/T161731 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Unblock] T161731: Create reliable change stream for specific wiki

2018-06-25 Thread Nuria
Nuria closed subtask T187296: Increase kafka event retention to 31 as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T161731 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Ottomata, Nuria Cc: gerritbot, JAllemandou, Pchelolo, Ladsgroup, Nur

[Wikidata-bugs] [Maniphest] [Closed] T187296: Increase kafka event retention to 31

2018-06-25 Thread Nuria
Nuria closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T187296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Ottomata, Nuria Cc: mforns, elukey, Ottomata, Aklapper, Nuria, Ladsgroup, Pchelolo, JAllemandou, Smalyshev,

[Wikidata-bugs] [Maniphest] [Commented On] T191022: Add Wikidata website extract oozie job

2018-03-29 Thread Nuria
Nuria added a comment. @Jonas: you want all requests to www.wikidata.org to be included, correct? Do you care about requests to the Wikidata Query Service or anything else about the request at hand? TASK DETAIL https://phabricator.wikimedia.org/T191022 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-01-05 Thread Nuria
Nuria added a subscriber: JAllemandou. Nuria added a comment. I think the notes look good. @mforns, the main point that I missed is that we probably also want to remove geolocation from dataset #1; I see from your summary that you did. The remaining item is sanitization of SPARQL queries, and on that I think we

[Wikidata-bugs] [Maniphest] [Commented On] T174519: [epic] SDoC: Determine baseline for metrics

2017-12-21 Thread Nuria
Nuria added a comment. Nice! Thank you for documenting. TASK DETAIL https://phabricator.wikimedia.org/T174519 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: Nuria, Liuxinyu970226, Capt_Swing, Ramsey-WMF, SandraF_WMF, Abit, chelsyx, mpopov, debt

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-19 Thread Nuria
Nuria added a comment. @Smalyshev We like to default to public if possible; the more eyes on the data, the more useful it can be. TASK DETAIL https://phabricator.wikimedia.org/T143819 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Nuria Cc: mforns, PokestarFan

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-18 Thread Nuria
Nuria added a subscriber: mforns. Nuria added a comment. @Smalyshev: Take a look at the information we keep in pageview hourly. For long-term retention we need to remove PII, and we store neither detailed timestamps nor sessionIds, precisely because we want to avoid session reconstruction. So probably if we round

[Wikidata-bugs] [Maniphest] [Commented On] T161731: Create reliable change stream for specific wiki

2017-12-13 Thread Nuria
Nuria added a comment. @Smalyshev Please, 45 minutes with me and @Ottomata would do? TASK DETAIL https://phabricator.wikimedia.org/T161731 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Ottomata, Nuria Cc: gerritbot, JAllemandou, Pchelolo, Ladsgroup, Nuria

[Wikidata-bugs] [Maniphest] [Commented On] T161731: Create reliable change stream for specific wiki

2017-12-13 Thread Nuria
Nuria added a comment. @Smalyshev OK, we aim to have the cluster handling all prod traffic by the end of next quarter. Until then it will be mirroring data, which I think should be sufficient for you to get started on the wdqs consumer? Correct me if I am wrong. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T161731: Create reliable change stream for specific wiki

2017-12-08 Thread Nuria
Nuria added a comment. Nice. Can @Smalyshev check whether consuming from these topics as set would work for his purposes? TASK DETAIL https://phabricator.wikimedia.org/T161731 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Ottomata, Nuria Cc: gerritbot

[Wikidata-bugs] [Maniphest] [Commented On] T161731: Create reliable change stream for specific wiki

2017-12-07 Thread Nuria
Nuria added a comment. I got the same doing: /home/otto/kafkacat -Q -b kafka-jumbo1003.eqiad.wmnet -t eqiad.mediawiki.revision-create:0:1512687299 -Xdebug=all TASK DETAIL https://phabricator.wikimedia.org/T161731 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T161731: Create reliable change stream for specific wiki

2017-12-04 Thread Nuria
Nuria added a comment. @Ottomata Could @Smalyshev do a test of consuming from the new cluster, with the understanding that it is not yet productionized, to make sure it fits the use cases? TASK DETAIL https://phabricator.wikimedia.org/T161731 EMAIL PREFERENCES https://phabricator.wikimedia.org
