[Wikidata-bugs] [Maniphest] [Edited] T237925: Primary sources tool left without maintainers

2020-04-30 Thread tfmorris
tfmorris updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T237925

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: tfmorris
Cc: tfmorris, Matlin, Lucas_Werkmeister_WMDE, Michael, So9q, Hjfocs, 
ChristianKl, Tpt, Pintoch, Lea_Lacroix_WMDE, Aklapper, Jingbiao95, 
darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Kiailandi, QZanden, 
dachary, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Ricordisamoa, Tacsipacsi, Sjoerddebruin, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T188715: Compute the Freebase curation ratio per property

2020-04-30 Thread tfmorris
tfmorris added a comment.


  It would seem like the 2018-03-13 spreadsheet should be adequate to call this 
task complete. I would recommend including some qualitative understanding of 
the source of the Freebase data in addition to just pure curation ratio when 
making judgements about how to use which data. Things like MusicBrainz IDs and 
ISFDB IDs went through a heavily QA'd reconciliation process and are going to 
be high quality. Films, and to a lesser extent TV shows, were an area of focus 
for the Freebase team, so will generally be both high quality and relatively 
complete.
  
  Also, many of the quality issues with the initial data set didn't have 
anything to do with the Freebase data itself, but rather with the junky 
"evidence" URLs that Google produced after the fact to satisfy the Wikidata 
call for evidence. These tend to be of much, much lower quality than the data 
itself.
  
  Of course, after so many years, much of the value of the data has been 
squandered, but I bet there are still some areas where it could be used to 
significantly improve Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T188715

To: Tpt, tfmorris
Cc: tfmorris, Aklapper, Hjfocs, Jingbiao95, darthmon_wmde, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Kiailandi, QZanden, dachary, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Ricordisamoa, Tacsipacsi, 
Sjoerddebruin, Tpt, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T148150: Primary Source tool shouldn't suggest claims without sources

2020-04-30 Thread tfmorris
tfmorris added a comment.


  The so-called "Freebase" dataset is actually a mix of data from Freebase and 
a bunch of URLs that were pulled from Google web crawls by an intern as 
potential "evidence." They don't have anything to do with the provenance of the 
data that was in Freebase, which was recorded for every item of data that was 
written there. Of course it would be silly to suggest a blacklisted site, but I 
don't believe the intern was given a blacklist ahead of time. Because the 
blacklist was developed after the fact, it hasn't been used to filter what's 
presented to users.
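  
  As a rough sketch of what such filtering could look like, the Python below 
drops suggestions whose evidence URL sits on a blacklisted domain. The record 
layout (`evidence_url`) and the example domains are hypothetical; this is not 
how the Primary Sources tool stores or checks its data.

```python
from urllib.parse import urlsplit


def is_blacklisted(url, blacklist):
    """True if the URL's host is a blacklisted domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)


def filter_suggestions(suggestions, blacklist):
    """Keep only suggestions whose (hypothetical) evidence_url is not blacklisted."""
    return [s for s in suggestions if not is_blacklisted(s["evidence_url"], blacklist)]


if __name__ == "__main__":
    blacklist = {"spam.example"}  # hypothetical blacklist entry
    suggestions = [
        {"claim": "A", "evidence_url": "https://news.example/story"},
        {"claim": "B", "evidence_url": "http://www.spam.example/page"},
    ]
    for kept in filter_suggestions(suggestions, blacklist):
        print(kept["claim"], kept["evidence_url"])
```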

TASK DETAIL
  https://phabricator.wikimedia.org/T148150

To: tfmorris
Cc: tfmorris, Hjfocs, Glorian_Yapinus, Aklapper, Jingbiao95, darthmon_wmde, 
Nandana, Lahi, Gq86, GoranSMilovanovic, Kiailandi, QZanden, dachary, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Ricordisamoa, Tacsipacsi, Sjoerddebruin, Tpt, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T197588: Agree on a "manifest" format to expose the configuration of Wikibase instances

2020-06-30 Thread tfmorris
tfmorris added a comment.


  If the manifest has to be constructed by hand, it seems like YAML would be a 
better format than JSON. They are equivalent from a structural and 
informational point of view, but YAML is **much** easier to edit without 
creating invalid documents.
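  
  As a small illustration of the kind of hand-editing slip meant here, the 
Python sketch below checks a hand-written JSON document with a trailing comma, 
which strict JSON parsers reject. The manifest keys and URL are hypothetical 
and not taken from any agreed format; the YAML text is shown only for 
comparison and is not parsed.

```python
import json

# Hypothetical hand-written manifest; note the trailing comma after the last entry.
HANDWRITTEN_JSON = """
{
  "name": "Example Wikibase",
  "api": "https://wikibase.example/w/api.php",
}
"""

# The same hypothetical content in YAML: no braces, quotes, or commas to get wrong.
EQUIVALENT_YAML = """
name: Example Wikibase
api: https://wikibase.example/w/api.php
"""


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print("hand-written JSON valid?", is_valid_json(HANDWRITTEN_JSON))
```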

TASK DETAIL
  https://phabricator.wikimedia.org/T197588

To: tfmorris
Cc: tfmorris, Samantha_Alipio_WMDE, Afkbrb, Abbe98, Theklan, Nikerabbit, 
Salgo60, Aklapper, Gstupp, Lucas_Werkmeister_WMDE, Pintoch, darthmon_wmde, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-07-04 Thread tfmorris
tfmorris added a comment.


  In T240442#5851541 <https://phabricator.wikimedia.org/T240442#5851541>, 
@Addshore wrote:
  
  > In T240442#5834866 <https://phabricator.wikimedia.org/T240442#5834866>, 
@Ladsgroup wrote:
  >
  >> Very broad idea, feel free to discard, I think using industry-wide 
standards for throttling like `token bucket`, `leaky bucket`, `fixed-window 
counter` or `sliding-window counter` might help here.
  >
  > One of the primary questions we need to answer is do we want to keep doing 
this client side self throttling, or switch to something more server side.
  
  I would have thought that it'd be obvious that this can't be done client 
side. Clients can cheat. They don't know what other clients are doing, and they 
don't know what other factors are affecting the servers.
  
  As @Ladsgroup hints, this is a basic distributed systems engineering problem 
with known answers. In addition to rate limiting at ingress, it may be helpful 
to add backpressure signals between the various internal servers as well as add 
jitter to the Retry-After signals sent to clients.
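  
  A minimal sketch of the ingress side, assuming a single in-process `token 
bucket` and a Retry-After value computed in seconds. The class name, 
parameters, and the jitter range are illustrative assumptions, not anything 
Wikidata runs today.

```python
import random
import time


class TokenBucket:
    """Server-side token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def _refill(self, now):
        # Add tokens for the time elapsed since the last update, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self, now=None, rng=random):
        """Return (allowed, retry_after_seconds) for one request."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        # Time until one full token is available again.
        wait = (1.0 - self.tokens) / self.rate
        # Jitter: spread the hint over [wait, 2 * wait] so throttled clients
        # do not all come back at the same instant.
        return False, wait + rng.uniform(0.0, wait)


if __name__ == "__main__":
    bucket = TokenBucket(rate=2.0, capacity=2)
    for i in range(4):
        allowed, retry_after = bucket.try_acquire()
        print(i, allowed, round(retry_after, 3))
```

  The clock and random source are injectable only so the behaviour can be 
checked deterministically; a real deployment would also need shared state 
across servers, which this sketch does not attempt.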

TASK DETAIL
  https://phabricator.wikimedia.org/T240442

To: tfmorris
Cc: tfmorris, valhallasw, Strainu, Xqt, Dvorapa, Ladsgroup, ArthurPSmith, 
Addshore, Aklapper, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331


[Wikidata-bugs] [Maniphest] T329093: sitelink encoding issues in the Wikibase REST API

2023-02-08 Thread tfmorris
tfmorris added a comment.


  I vote for full URLs. Also, HTTPS URLs should probably be used throughout in 
preference to HTTP URLs to save naive clients from the extra latency of a 
redirect.

TASK DETAIL
  https://phabricator.wikimedia.org/T329093

To: tfmorris
Cc: tfmorris, connorshea, Aklapper, Lydia_Pintscher, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T329093: sitelink encoding issues in the Wikibase REST API

2023-03-14 Thread tfmorris
tfmorris added a comment.


  How does one discover what the resolution was? (Apologies if this should be 
obvious, but I'm used to bug trackers which link the commits back to the issue.)

TASK DETAIL
  https://phabricator.wikimedia.org/T329093

To: Lydia_Pintscher, tfmorris
Cc: tfmorris, connorshea, Aklapper, Lydia_Pintscher, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T337021: [Analytics] Find out size of term subgraph

2023-07-20 Thread tfmorris
tfmorris added a comment.


  Is triple count the only important parameter? It seems likely that the 
descriptions could be larger, on average, than labels.
  
  It seems odd that there are more descriptions (19% of total) than labels 
(5%), although that agrees with what the previous study found 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis#Description>.
 The strong spike at 58-61 descriptions per item tells me that some bot 
probably machine generated templated descriptions for a large number of 
languages. The fact that there are more Dutch descriptions than any other 
language 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis#Language_distribution_of_descriptions>
 also says "machine generated" to me.
  
  Storing machine generated templated descriptions in the graph seems wasteful. 
I've observed anecdotally when working with person entities that a large number 
of them have pro-forma descriptions of   ( 
- ). These obviously don't need to be stored in the graph because 
they're just reiterating / duplicating existing information. If Wikidata 
search/autocomplete were made smarter, these could be generated on the fly.

TASK DETAIL
  https://phabricator.wikimedia.org/T337021

To: AndrewTavis_WMDE, tfmorris
Cc: tfmorris, Manuel, Aklapper, Lydia_Pintscher, Astuthiodit_1, AWesterinen, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T337021: [Analytics] Find out size of term subgraph

2023-07-20 Thread tfmorris
tfmorris added a comment.


  I have a theory as to where a big chunk of the machine generated descriptions 
are from. They are the phrase "Wikimedia category" in hundreds of languages as 
a textual transcription of the triple `instanceOf Q4167836`. For example, 
Catégorie:Naissance à Seri Menanti <https://www.wikidata.org/wiki/Q86602036> 
has a single label in French and the P31 instanceOf claim, which together 
occupy 802 bytes. Then two bots (Mr.Ibrahembot 
<https://www.wikidata.org/wiki/User:Mr.Ibrahembot> and Emijrpbot 
<https://www.wikidata.org/wiki/User:Emijrpbot>) came along and added another 
11.5K (!) of static text (not even anything templated) in 129 languages, none 
of which have labels for the category.
  
  There are 5.1M category items, 1.4M disambiguation page items, and more than 
7M internal items of this type in total. The bots haven't fully populated all 
the descriptions yet, but this could amount to over 0.6B triples and 58 GB of 
wasted storage just for category items at the 130 language level. Imagine the 
waste as more languages are included and more items are added.
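  
  For reference, the estimate can be reproduced with back-of-the-envelope 
arithmetic. The sketch assumes one description triple per item per language and 
roughly the 11.5K bytes of description text seen on the example item, read as 
decimal units; both are simplifications.

```python
CATEGORY_ITEMS = 5_100_000   # 5.1M category items
LANGUAGES = 130              # the "130 language level"
BYTES_PER_ITEM = 11_500      # ~11.5K of description text per item (assumed typical)

# One description triple per item per language.
triples = CATEGORY_ITEMS * LANGUAGES
# Description text only, ignoring index and storage overhead.
wasted_bytes = CATEGORY_ITEMS * BYTES_PER_ITEM

print(f"{triples / 1e9:.2f}B triples")
print(f"{wasted_bytes / 1e9:.2f} GB")
```

  That works out to about 0.66B triples and about 58.65 GB for category items 
alone.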
  
  This is a huge waste of resources caused by humans attempting to work around 
a single product deficiency. It's only going to get more expensive over time.
  
  p.s. These bots apparently aren't limited to internal Wikipedia items. Here's 
a user <https://quickstatements.toolforge.org/#/batches/YoaR> who's adding 
Asturian boilerplate descriptions not only to Wikipedia categories 
<https://quickstatements.toolforge.org/#/batch/208440>, but also to U.S. patents 
<https://quickstatements.toolforge.org/#/batch/209034>. This flood of useless 
data isn't going to be sustainable.

TASK DETAIL
  https://phabricator.wikimedia.org/T337021

To: AndrewTavis_WMDE, tfmorris
Cc: tfmorris, Manuel, Aklapper, Lydia_Pintscher, Astuthiodit_1, AWesterinen, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T337021: [Analytics] Find out size of term subgraph

2023-07-21 Thread tfmorris
tfmorris added a comment.


  @Manuel when you write:
  
  > A new feature that would solve this problem is already planned, but it does 
not exist yet (see T303677 <https://phabricator.wikimedia.org/T303677>).
  
  Thanks for the pointer! What does "planned" mean in this context? How do I 
find the schedule and/or priority of the task? My naive reading of the ticket 
gives the impression that it's been stalled without action for over a year.

TASK DETAIL
  https://phabricator.wikimedia.org/T337021

To: AndrewTavis_WMDE, tfmorris
Cc: tfmorris, Manuel, Aklapper, Lydia_Pintscher, Astuthiodit_1, AWesterinen, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2023-07-21 Thread tfmorris
tfmorris added a comment.


  I'm surprised that this hasn't received any attention in 15 months. As an 
update to @Nikki's numbers <https://phabricator.wikimedia.org/T303677#7789434> 
there are now on the order of 2.5 **BILLION** of these bot generated 
descriptions. The top 5 alone represent over 2 billion triples. That's a huge 
waste of resources!
  
  | Q#        | Entity Type                   | Descriptions (Billions) |
  | --------- | ----------------------------- | ----------------------- |
  | Q13442814 | scholarly article             | 1.32                    |
  | Q4167836  | Wikimedia category            | 0.60                    |
  | Q4167410  | Wikimedia disambiguation page | 0.11                    |
  | Q11266439 | Wikimedia template            | 0.09                    |
  | Q101352   | family name                   | 0.06                    |
  
  In addition to the usability and resource issues, there's also a substantial 
language equity issue associated with the lack of this functionality. The 
language with the largest number of descriptions is Dutch simply because 
there's a Dutch-speaking bot operator who has vigorously added many, many 
machine generated descriptions 
<https://www.wikidata.org/wiki/User:Edoderoobot/Set-nl-description>. On the 
flip side, languages without the privilege of bot operators supporting them go 
wanting and have no way to disambiguate the terms that autocomplete / search 
offers them. Of course, if someone were to start adding machine generated 
descriptions for all those hundreds of languages, the situation would be 
completely untenable from a Blazegraph point of view.
  
  As an alternative to a textual description, I'll offer the suggestion to 
consider building an autocomplete widget 
<https://developers.google.com/freebase/v1/search-widget> which looks more like 
this: F37145761: Screen Shot 2023-07-21 at 2.22.16 PM.png 
<https://phabricator.wikimedia.org/F37145761> That's how Freebase Suggest 
<https://developers.google.com/freebase/v1/search-widget> did it back in 2008. 
Heck, you could even steal the code 
<https://github.com/googlearchive/freebase-suggest>. One non-obvious aspect of 
their implementation was that they used metaschema annotations of types as 
being "Notable" or interesting enough to show the user. Similarly the 
properties which were displayed varied by entity type and were controlled by 
metaschema annotations, so you might have birth date and place for a person, but 
containing/parent entity for something like a town or species. Of course, even 
just a simple list of the P31s would be better than the current situation.
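  
  As a rough sketch of that simplest option, the Python below builds a 
display-time description from an item's P31 values instead of storing one per 
language. The in-memory `item` and `labels` structures are hypothetical 
stand-ins for whatever the search backend already has, and the fallback to 
English is an assumption.

```python
def fallback_description(item, labels, lang, fallback_lang="en"):
    """Join the labels of an item's P31 values, computed on the fly.

    `item` maps "P31" to a list of QIDs; `labels` maps QID -> {language: label}.
    Falls back to `fallback_lang`, then to the bare QID, when a label is missing.
    """
    parts = []
    for qid in item.get("P31", []):
        by_lang = labels.get(qid, {})
        parts.append(by_lang.get(lang) or by_lang.get(fallback_lang) or qid)
    return ", ".join(parts)


if __name__ == "__main__":
    labels = {
        "Q4167836": {"en": "Wikimedia category"},
        "Q13442814": {"en": "scholarly article"},
    }
    item = {"id": "Q86602036", "P31": ["Q4167836"]}
    print(fallback_description(item, labels, "en"))
```

  Only the label of each class needs translating once per language, rather 
than one stored description per item per language.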

TASK DETAIL
  https://phabricator.wikimedia.org/T303677

To: tfmorris
Cc: tfmorris, AndrewTavis_WMDE, Fuzheado, valerio.bozzolan, Lectrician1, 
waldyrious, Michael, DVrandecic, Bugreporter, Manuel, Nikki, Epidosis, 
Mahir256, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331


[Wikidata-bugs] [Maniphest] T262550: Toolforge returns HTTP 502 error

2020-09-10 Thread tfmorris
tfmorris added a comment.


  https://isa.toolforge.org/ and https://wikishootme.toolforge.org/ were also 
down at about the same time (11:03 Eastern US).

TASK DETAIL
  https://phabricator.wikimedia.org/T262550

To: tfmorris
Cc: tfmorris, aborrero, RhinosF1, Bugreporter, Pintoch, Aklapper, 
Nintendofan885, Akuckartz, darthmon_wmde, Nandana, skpuneethumar, Zylc, Bstorm, 
1978Gage2001, Lahi, Gq86, GoranSMilovanovic, DSquirrelGM, Chicocvenancio, 
QZanden, Tbscho, Freddy2001, LawExplorer, JJMC89, _jensen, rosalieper, 
Scott_WUaS, Luke081515, Wikidata-bugs, Jitrixis, aude, Gryllida, jayvdb, scfc, 
coren, Mbch331, Krenair


[Wikidata-bugs] [Maniphest] [Updated] T115911: TypeError: wikibase.dataTypeStore is undefined (PrimarySources)

2016-02-24 Thread tfmorris
tfmorris added a comment.


  I think these are likely two different bugs. Has anyone looked at either of 
them in the last 4 months?
  
  Here are some other examples of aude's bug:
  
https://www.wikidata.org/wiki/Q5260247?debug=1
https://www.wikidata.org/wiki/Q4636?debug=1
  
  It's a data-pattern-sensitive bug triggered by URLs containing patterns which 
look like P999 or S999. I've created a bug report at the project site: 
https://github.com/google/primarysources/issues/75

TASK DETAIL
  https://phabricator.wikimedia.org/T115911

To: tfmorris
Cc: tfmorris, thiemowmde, hoo, daniel, Lydia_Pintscher, aude, JanZerebecki, 
Aklapper, Izno, Wikidata-bugs, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T126510: [Story] Allow adding additional languages in the terms box

2016-04-18 Thread tfmorris
tfmorris added a comment.


  It seems bizarre that the utility of this is debated. The solution suggested 
by Bene sounds simple, straightforward, and useful.

TASK DETAIL
  https://phabricator.wikimedia.org/T126510

To: tfmorris
Cc: tfmorris, ChristianKl, Bene, Nikki, Izno, Lydia_Pintscher, Mbch331, 
Aklapper, matej_suchanek, StudiesWorld, D3r1ck01, Wikidata-bugs, aude


