[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-10-18 Thread So9q
So9q added a comment.


  In T281854#7062631, @Fnielsen wrote:
  
  > "percentage, number of scientific papers that are connected to
non-scientific paper items in WD"
  > Quite a lot of scholarly papers are connected to a journal item, to one or
more topic items, to a language item, and some to a notable author who is in
Wikipedia (so we need an item in Wikidata). Currently, according to the
statistics on Scholia, there are 14.211.431 topic links. Works may have multiple
links, so perhaps fewer than 10.000.000 works have one or more topics. We should
aim for most works to have a topic, so I suspect this would grow.
  
  Since I wrote ItemSubjector, the number of links to topics via "main subject"
(P921) has been increasing by about 0.5 million per week. Because of a timeout,
we don't know how many articles are currently missing at least one "main
subject", but according to the data in QLever, it was 27M out of 37M a few
months ago, when it was last updated.
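  A back-of-the-envelope projection makes the rate above concrete. The Python
sketch below is only that: the constants are the approximate figures from this
comment, and it assumes (optimistically) that every new P921 link lands on an
article that had no main subject yet and that no new articles are added, so the
result is a lower bound rather than a forecast.

```python
# Approximate figures quoted in the comment above (not fresh measurements).
MISSING_MAIN_SUBJECT = 27_000_000  # articles without "main subject" (P921), per QLever
LINKS_PER_WEEK = 500_000           # P921 links added per week by ItemSubjector


def weeks_to_cover(missing: int, links_per_week: int) -> float:
    """Best-case number of weeks until every article has a main subject.

    Assumes each new link tags a previously untagged article and that the
    corpus does not grow; both assumptions make this a lower bound.
    """
    if links_per_week <= 0:
        raise ValueError("links_per_week must be positive")
    return missing / links_per_week


weeks = weeks_to_cover(MISSING_MAIN_SUBJECT, LINKS_PER_WEEK)
print(f"at least {weeks:.0f} weeks (about {weeks / 52:.1f} years)")
```

  Because many new links go to articles that already have a topic, the real
time to full coverage would be longer than this figure.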

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AKhatun_WMF, So9q
Cc: Gehel, Csisc, So9q, AKhatun_WMF, Esc3300, SCIdude, Sj, Harej, Andrawaag, 
Lydia_Pintscher, Mohammed_Sadat_WMDE, nichtich, EgonWillighagen, Fnielsen, 
Darwinius, Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, 
LWyatt, Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF removed a parent task: T282790: [EPIC] Get estimates for dropping 
data from Wikidata in case of Blazegraph catastrophic failure.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a parent task: T293628: Get baseline 
measurements/expectations for splitting various subgraphs from Wikidata.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-09-27 Thread Gehel
Gehel closed this task as "Resolved".
Gehel added a comment.


  I'm closing this as the statistics have been collected and published. The
larger discussion should probably continue on this talk page:
https://www.wikidata.org/wiki/Wikidata:Query_Service_scaling_update_Aug_2021


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  Here is the analysis done on scholarly articles in Wikidata and WDQS queries 
related to them: 
https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF updated the task description.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-08 Thread Esc3300
Esc3300 added a comment.


  We now have author names as detailed strings, so queries to P50 (author)
won't necessarily need to be considered.
  
  The overall situation is comparable to Commons, where "depicts" statements 
link to Wikidata items.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-07 Thread EgonWillighagen
EgonWillighagen added a comment.


  1,939,738 authors -> https://w.wiki/3o2i
  
  Trying to get all unique properties of these times out.
  
  Sampling 50k authors for properties with an author as subject
(https://w.wiki/3o3C) gives these results:
  
  - 96% are linked to a profession (P106)
  - 94% are linked to a country of citizenship (P27)
  - 90% are linked to a place of birth (P19)
  - 36% are linked to an employer (P108)
  - 17% are linked to a notable work (P800)
  - 9% are linked to their doctoral advisor (P184)
  - 8% are linked to the political party they are a member of (P102)
  
  These specific properties can be used to calculate the overall statistics.
The inverse properties (where the author is the object) seem a bit trickier,
and I'm running into timeouts there. I hope this helps.
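  To turn the sample percentages into population-level estimates, one can scale
each share by the author count and attach a sampling margin. The Python sketch
below assumes the 50k authors were a simple random sample of the 1,939,738
authors counted above, which the comment does not state. Since the shares were
reported as whole percentages, the rounding error (up to 0.5 percentage points)
is larger than the sampling margin for every property listed.

```python
import math

# Figures quoted in the comment above: total authors (https://w.wiki/3o2i) and
# the share of a 50k-author sample linked via each property (rounded values).
TOTAL_AUTHORS = 1_939_738
SAMPLE_SIZE = 50_000
SAMPLE_SHARES = {
    "P106": 0.96, "P27": 0.94, "P19": 0.90, "P108": 0.36,
    "P800": 0.17, "P184": 0.09, "P102": 0.08,
}


def extrapolate(share: float, total: int, n: int, z: float = 1.96):
    """Scale a sample proportion to the population.

    Returns (estimated count, +/- count) using a normal-approximation 95%
    margin of error; this assumes a simple random sample of size n.
    """
    margin = z * math.sqrt(share * (1 - share) / n)
    return round(share * total), round(margin * total)


for pid, share in SAMPLE_SHARES.items():
    estimate, plus_minus = extrapolate(share, TOTAL_AUTHORS, SAMPLE_SIZE)
    print(f"{pid}: ~{estimate:,} authors (+/- {plus_minus:,})")
```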


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread Daniel_Mietchen
Daniel_Mietchen added a comment.


  In T281854#7063597, @Lydia_Pintscher wrote:
  
  > Thanks everyone. For context: this is just one of many options we are 
currently investigating to create an overview of our options. We think it is 
important to have a larger discussion about how to move forward with the Query 
Service but we need to know more about each of the options we have. We are 
currently trying to determine for each option what it actually means in terms 
of how much breathing room it buys us, how many people would be affected, etc. 
That's one of the tasks for this. We hopefully have the larger overview for 
discussion soon.
  
  Is there a public version of that overview of the different options? There is
WikiProject Limits of Wikidata for such purposes, and it would certainly welcome
some more detailed information about the various known or suspected limits and
how they interact with each other and with potential solutions.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF updated the task description.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T281854#7266495, @EgonWillighagen wrote:
  
  > @AKhatun_WMF, when you write "authors connected to other subgraphs", do you
mean subgraphs within Wikidata (so, excluding external identifiers), or also
graphs from other resources that are part of, for example, the Linked Open Data
Cloud?
  
  I mean within Wikidata.
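  One way to make "within Wikidata" operational is to count only item-valued
statements and ignore external identifiers and plain strings. The Python sketch
below does this on a toy entity shaped like the Wikibase JSON format (claims,
statements, mainsnak); the IDs and values are made up, and it looks only at main
snaks, not at qualifiers or references, which can also link to items.

```python
def item_snak(qid):
    """Toy helper: a statement whose main snak points at another item."""
    return {"mainsnak": {"snaktype": "value", "datatype": "wikibase-item",
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"entity-type": "item", "id": qid}}}}


def string_snak(datatype, text):
    """Toy helper: a statement holding a literal (external ID, name string)."""
    return {"mainsnak": {"snaktype": "value", "datatype": datatype,
                         "datavalue": {"type": "string", "value": text}}}


# Toy article entity; all IDs and values are illustrative only.
toy_article = {
    "id": "Q_ARTICLE",
    "claims": {
        "P50": [item_snak("Q_AUTHOR")],                           # author
        "P921": [item_snak("Q_TOPIC")],                           # main subject
        "P356": [string_snak("external-id", "10.1234/example")],  # DOI
        "P2093": [string_snak("string", "J. Doe")],               # author name string
    },
}


def internal_item_links(entity):
    """Return (property, target ID) pairs for item-valued main snaks only."""
    links = []
    for pid, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement["mainsnak"]
            if snak.get("snaktype") == "value" and snak.get("datatype") == "wikibase-item":
                links.append((pid, snak["datavalue"]["value"]["id"]))
    return links


print(internal_item_links(toy_article))
# [('P50', 'Q_AUTHOR'), ('P921', 'Q_TOPIC')]
```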


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread EgonWillighagen
EgonWillighagen added a comment.


  @AKhatun_WMF, when you write "authors connected to other subgraphs", do you
mean subgraphs within Wikidata (so, excluding external identifiers), or also
graphs from other resources that are part of, for example, the Linked Open Data
Cloud?


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF updated the task description.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread Harej
Harej added a comment.


  Wikicite.org uses an extremely broad definition of publication that includes 
far more than scholarly sources. There are some thousands of classes that are 
counted as subclasses of “publication”.
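  The difference between the two counting conventions can be written down as
SPARQL. The Python sketch below only builds the queries and a request URL for
the Wikidata Query Service; it assumes Q13442814 is "scholarly article" and
Q732577 is "publication" (worth double-checking), and on the live service the
transitive subclass queries may well time out.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The wd: and wdt: prefixes are predefined on WDQS.
# Narrow convention: items that are directly "instance of" scholarly article.
NARROW_COUNT = """
SELECT (COUNT(?work) AS ?n) WHERE {
  ?work wdt:P31 wd:Q13442814 .
}
"""

# Broad convention: instances of publication or any transitive subclass of it.
BROAD_COUNT = """
SELECT (COUNT(DISTINCT ?work) AS ?n) WHERE {
  ?work wdt:P31/wdt:P279* wd:Q732577 .
}
"""

# Number of classes that the broad convention treats as "publication".
PUBLICATION_CLASSES = """
SELECT (COUNT(DISTINCT ?cls) AS ?n) WHERE {
  ?cls wdt:P279* wd:Q732577 .
}
"""


def request_url(query: str) -> str:
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

  Fetching `request_url(NARROW_COUNT)` with any HTTP client (sending a
descriptive User-Agent, as Wikimedia's policy asks) should return the count.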


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread Fnielsen
Fnielsen added a comment.


  Wikicite.org (Jakob Voß), http://wikicite.org/statistics.html, states 39 994
937 = 43% for 2021-06-28. The Scholia statistics are only for the "scholarly
article" item. I think Voß counts instances of both scholarly and non-scholarly
publications.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread Fnielsen
Fnielsen added a comment.


  >> "percentage, number of Wikidata entities that are scholarly article": 
  >> 37.246.721 Scholarly articles, so 37/97 ~ 40% are scholarly articles.
  >
  > Could I get an idea of what the 97 was and where the number was listed 
maybe?
  
  Hmmm... Maybe I meant 94. The Danish front page of Wikidata states
94.564.779 data elements.
  
  37321680 / 94564779 = 0.39466787100512335 ~ 39%
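  The ratio is easy to reproduce; a trivial Python check using the two counts
quoted above (both are snapshots that change daily):

```python
SCHOLARLY_ARTICLES = 37_321_680  # scholarly-article count used above
TOTAL_ITEMS = 94_564_779         # items listed on the Danish Wikidata front page

share = SCHOLARLY_ARTICLES / TOTAL_ITEMS
print(f"{share:.1%}")  # 39.5%
```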


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread CBogen
CBogen assigned this task to AKhatun_WMF.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread CBogen
CBogen added a project: Discovery-Search (Current work).


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread CBogen
CBogen renamed this task from "[EPIC] Get baseline measurements/expectations 
for splitting scholarly articles from Wikidata" to "Get baseline 
measurements/expectations for splitting scholarly articles from Wikidata".
CBogen removed a project: Epic.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-24 Thread AKhatun_WMF
AKhatun_WMF added a comment.


  In T281854#7062631, @Fnielsen wrote:
  
  > Some of the statistics that are wanted are listed on Scholia, currently on
the front page: https://scholia.toolforge.org/ (UPDATE: now here:
https://scholia.toolforge.org/statistics)
  >
  > "percentage, number of Wikidata entities that are scholarly article":
  > 37.246.721 Scholarly articles, so 37/97 ~ 40% are scholarly articles.
  
  Could I get an idea of what the 97 was and where the number was listed maybe?


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-22 Thread Esc3300
Esc3300 added a comment.


  There is a recent request to make items for scholarly articles more
stand-alone, i.e.
  
  - Wikidata:Property proposal/Author last names
  - Wikidata:Property proposal/Author first names
  
  These would ensure that items could be used without resolving author items,
which would simplify storing them in a separate Wikibase.
  
  I still have to go through T282139 in detail, but it seems to have mostly
become an analysis of (the somewhat static corpus of) scholarly articles in
Wikidata rather than of Wikidata as a whole, given the numbers involved.
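  The "stand-alone" idea amounts to reducing the number of links that cross the
boundary between a scholarly-article subgraph and the rest of Wikidata. A
minimal Python sketch of how such crossings could be counted, on made-up toy
links (a real measurement would run over a dump or the query service):

```python
# Toy item-to-item links (subject, property, object); all IDs are made up.
LINKS = [
    ("A1", "P50", "AUTHOR1"),     # author
    ("A1", "P1433", "JOURNAL1"),  # published in
    ("A1", "P921", "TOPIC1"),     # main subject
    ("A2", "P2860", "A1"),        # cites work (stays inside the subgraph)
    ("A2", "P921", "TOPIC1"),
    ("A3", "P2860", "A2"),
    ("PERSON1", "P800", "A1"),    # notable work: a link *into* the subgraph
]
SCHOLARLY = {"A1", "A2", "A3"}


def crossing_stats(links, subgraph):
    """Count links that would cross the boundary if `subgraph` were split off."""
    outgoing = [(s, p, o) for s, p, o in links if s in subgraph and o not in subgraph]
    incoming = [(s, p, o) for s, p, o in links if s not in subgraph and o in subgraph]
    touched = {s for s, _, _ in outgoing} | {o for _, _, o in incoming}
    return {
        "outgoing_crossings": len(outgoing),
        "incoming_crossings": len(incoming),
        "items_with_crossing": len(touched),
        "share_items_with_crossing": len(touched) / len(subgraph),
    }


print(crossing_stats(LINKS, SCHOLARLY))
```

  In this toy example, replacing the P50 author link with name-string
properties would remove the A1 -> AUTHOR1 crossing.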


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-12 Thread MPhamWMF
MPhamWMF added a comment.


  @Sj This is primarily being evaluated as a last-resort mitigation in the case
of catastrophic failure, specifically having to do with max size limitations of 
Blazegraph. The primary aim is to determine the best way of keeping WD/QS 
minimally functional in the event of this undesired scenario -- basically we're 
measuring out a parachute we hope to not have to use, and if we did, would 
intend it to be a temporary state while we resolve the underlying larger 
issues. If we discover along the way a better way of splitting the graph that 
improves both the technical performance of the machines and how users use it, 
we will consider incorporating these learnings into a more permanent scaling 
strategy.
  
  With regard to a forum for discussion, we are in the process of preparing
more official communications that provide an overview of the situation, and a
more dedicated venue for discussion than Phabricator tickets. We appreciate
everyone's patience as we work on finalizing things.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-07 Thread EgonWillighagen
EgonWillighagen added a comment.


  In T281854#7185253, @Multichill wrote:
  
  > No it's not, please have a look at the task description. This is about 
getting metrics.
  
  Can you elaborate on "this plan" in that description? What do you know that
others do not?


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-07 Thread Sj
Sj added a comment.


  @Multichill the opening says "so that I can decide whether to move ahead with
this plan and how to communicate it." It would help if that linked to a
separate task whose implementation depended on the outcome of this one. In the
absence of that, this seems like the best (and only?) place in Phab to discuss
the impacts of the split.
  
  @MPhamWMF Is this being evaluated as a one-off split, or is it a more general
evaluation of the performance considerations of switching from a monolithic WD
graph to a set of graph shards with some maximum size (what's the rough range
beyond which you imagine things stop scaling)? Any thoughts on the performance
implications of the latter may also interest many of the large Wikibase users,
who regularly want to query a combination of at least one specialist base and
WD itself, mediated by some query interface.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Multichill
Multichill added a subscriber: Harej.
Multichill added a comment.


  In T281854#7184875 , 
@Harej wrote:
  
  > In T281854#7184854 , 
@Multichill wrote:
  >
  >> This is not the place to discuss whether these items should be moved out or not.
  >
  > This is a confusing statement seeing as the task is explicitly about 
"splitting scholarly articles from Wikidata". If this is just from a backend 
perspective the task should be clarified as such.
  
  No, it's not; please have a look at the task description. This is about 
getting metrics.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Fnielsen
Fnielsen added a comment.


  "percentage, number of WDQS queries per month that involve scholarly articles 
(including authors and publications)"
  
  It is unclear to us Scholia people how much load we are putting on WDQS. We 
tend to run multiple SPARQL queries on each page, and that might not 
be a problem or it might be a very bad thing. I recall a statistic on WDQS 
queries showing that it was mostly Magnus Manske's tools that put load on WDQS, but I 
might remember it wrongly.
  
  In Scholia, we now add a "# tool: scholia" comment at the top of most of our queries. I 
am not aware of other tools doing that. Perhaps it would be a good idea for them to do so, so 
that Wikimedia Foundation people could more easily compute statistics per 
tool. (Perhaps there should not be a space between "#" and "tool".)
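  As a minimal sketch of the per-tool statistics suggested above, the Python below counts SPARQL query strings by a "# tool: ..." comment line. It assumes the queries are already available as plain strings (where they would come from, e.g. some query log, is left open), and the helper name count_queries_by_tool is hypothetical. The pattern accepts both "# tool: scholia" and "#tool:scholia", so the question of the space after "#" does not matter for counting.

```python
import re
from collections import Counter

# Matches a tag line such as "# tool: scholia" or "#tool:scholia".
# [ \t]* is used instead of \s* so a match never spans a line break.
TOOL_TAG = re.compile(r"^[ \t]*#[ \t]*tool:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)


def count_queries_by_tool(queries):
    """Count SPARQL query strings per tool tag.

    Queries without a tag are counted under "unknown"; tags are
    lower-cased so "Scholia" and "scholia" are merged.
    """
    counts = Counter()
    for query in queries:
        match = TOOL_TAG.search(query)
        counts[match.group(1).lower() if match else "unknown"] += 1
    return counts


sample_queries = [
    "# tool: scholia\nSELECT ?work WHERE { ?work wdt:P921 ?topic } LIMIT 10",
    "#tool:scholia\nASK { wd:Q5 ?p ?o }",
    "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1",
]
print(count_queries_by_tool(sample_queries))
```

  A regular expression over the raw query text is used because the tag is only a comment, so it is invisible to the SPARQL engine itself.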


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Fnielsen
Fnielsen added a comment.


  Going back to the quantifiable: "percentage, number of scientific papers that 
are connected to non-scientific paper items in WD (not including authors and 
publications)"
  
  We would hope that every scientific paper has a topic annotation with one or 
more Wikidata items: usually non-scientific-paper items or, in rare 
instances, scientific paper items. Currently we "only" have around 15 million 
of these links ("Links from works to their main subjects" at 
https://scholia.toolforge.org/statistics).
  
  All scientific papers could also have the language set.
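  For the percentage metric, a count query along the following lines could be used: scholarly articles are items with instance of (P31) scholarly article (Q13442814), and the topic link is main subject (P921). This is a sketch only; it relies on the wd:/wdt: prefixes that WDQS predefines, the helper name count_query is hypothetical, and whole-graph counts like this may well time out on the public endpoint, so they might need to run against a dump or another engine.

```python
# Q13442814 = scholarly article, P31 = instance of,
# P921 = main subject, P407 = language of work or name.
# wdt: only matches "truthy" (best-rank) statements.
SCHOLARLY_ARTICLE = "wd:Q13442814"


def count_query(property_id=None):
    """Return a SPARQL query counting scholarly articles.

    With property_id (e.g. "P921"), only articles that have at least one
    statement for that property are counted; FILTER EXISTS makes a work
    with several topics count once.
    """
    lines = [
        "SELECT (COUNT(DISTINCT ?work) AS ?count) WHERE {",
        f"  ?work wdt:P31 {SCHOLARLY_ARTICLE} .",
    ]
    if property_id is not None:
        lines.append(f"  FILTER EXISTS {{ ?work wdt:{property_id} ?value . }}")
    lines.append("}")
    return "\n".join(lines)


print(count_query("P921"))
```

  Dividing the filtered count by the unfiltered one gives the percentage; count_query("P407") would do the same for the language question.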


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Harej
Harej added a comment.


  In T281854#7184854 , 
@Multichill wrote:
  
  > This is not the place to discuss whether these items should be moved out or not.
  
  This is a confusing statement seeing as the task is explicitly about 
"splitting scholarly articles from Wikidata". If this is just from a backend 
perspective the task should be clarified as such.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Multichill
Multichill added a comment.


  Hi folks, please stick to the Phabricator etiquette as described at 
https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette . This is 
not the place to discuss whether these items should be moved out or not. @MPhamWMF, 
please don't see these comments as any indicator of the community view.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-20 Thread Fnielsen
Fnielsen added a comment.


  @Andrawaag "it is becoming more difficult to see other topics (sometimes 
unrelated to scholarly articles)" Do you have concrete examples of this? It may 
sometimes be difficult to tell what is a topic and what is a scientific 
article, but once a few scientific articles about the topic have been annotated 
with the "main subject" property, the topic usually shows up at the top.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-19 Thread EgonWillighagen
EgonWillighagen added a comment.


  Regarding the question of the "growth of scientific literature", there is a 
good bit of literature on this, and it is sometimes conflated with the topic of 
"growth of science". I started collecting some knowledge about this: 
https://scholia.toolforge.org/topic/Q107292942


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-19 Thread Andrawaag
Andrawaag added a comment.


  I would not call it evicting scholarly articles. Scholarly articles are 
currently a major driving force for Wikidata; however, their size is problematic 
because it is becoming more difficult to see other topics (sometimes unrelated 
to scholarly articles). I have been thinking about and working towards a federated 
landscape of linked Wikibases and other semantic web resources for a while now. 
Building such a federated landscape is already easy peasy. We have wbstack, 
Wikibase Docker, but also platforms like GraphDB, Virtuoso, Stardog (to 
mention just a few). It would take a simple hackathon and some motivated users to 
build a nice prototype.
  
  But setting up such a federated landscape is the easy part. What is more 
difficult is mapping between the different endpoints (Wikidata, 
Wikibases, other RDF stores). 
  Given its size, the subgraph of scholarly articles simply deserves its own 
metal to excel beyond the current limitations. The main question then becomes 
how to align this new subgraph with the other parts of Wikidata, to which it 
intrinsically links (as @Daniel_Mietchen says).
  
  So I am actually in favour of separating the subgraph of scholarly articles 
from Wikidata (the incubator) into a node in Wikidata (the linked knowledge 
graph) and the global semantic web.
  
  I indeed said: Moving away from Wikidata to Wikidata :) We need a new term 
for the knowledge graph in which the current Wikidata is an index, or a sort of DNS, 
for other (semantic web) nodes.
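  To make the mapping question concrete: if scholarly articles lived in a separate endpoint, a query joining them with the rest of Wikidata could use a SPARQL 1.1 SERVICE clause. The sketch below only builds such a query string. The endpoint URL https://scholarly.example.org/sparql is a placeholder, the helper name works_about_class_query is hypothetical, and it assumes the split-off graph keeps Wikidata's wd:/wdt: vocabulary and that the main endpoint would be allowed to federate with it.

```python
# Placeholder URL for a hypothetical split-off scholarly-article endpoint.
SCHOLARLY_ENDPOINT = "https://scholarly.example.org/sparql"

PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
)


def works_about_class_query(class_qid, endpoint=SCHOLARLY_ENDPOINT, limit=100):
    """Build a query for the main Wikidata endpoint.

    Topics (instances of class_qid) are matched in the local graph; works
    whose main subject (P921) is one of those topics are fetched from the
    remote endpoint through a SPARQL 1.1 SERVICE clause.
    """
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {class_qid!r}")
    return (
        PREFIXES
        + "SELECT ?topic ?work WHERE {\n"
        + f"  ?topic wdt:P31 wd:{class_qid} .\n"
        + f"  SERVICE <{endpoint}> {{\n"
        + "    ?work wdt:P921 ?topic .\n"
        + "  }\n"
        + "}\n"
        + f"LIMIT {int(limit)}"
    )


print(works_about_class_query("Q5", limit=10))
```

  Whether an engine sends the locally bound ?topic values to the remote endpoint (a bound join) or pulls every P921 triple across is implementation-dependent, and that choice largely decides whether such cross-graph queries stay fast.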


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-18 Thread EgonWillighagen
EgonWillighagen added a comment.


  I am with @Harej here. Focusing on the largest data set is not the right 
approach. As I have indicated in similar discussions elsewhere, there will be a 
next large subset, and it too will be large. From the perspective of chemistry, 
60M items is nothing. The number of species ever observed runs into the millions. There 
are many things that easily go into the millions. At this moment, we have a 
small subset of chemicals in Wikidata (~1.2 million); because of the growing 
pains this is artificially low (real chemical databases have >102 M records of 
experimentally studied chemicals). I regularly run into missing content (even 
just looking at the English Wikipedia), and am very selective in what I add at 
this moment.
  
  As soon as you remove one big blob, all that will happen is that the void 
will very quickly be filled by another big blob. Now, if a single database is 
not possible, then the overall design must change: everything should 
become a separate namespace, and the federation must work extremely well. 
The reason Wikidata is so awesome is that I can move from one topic to the 
underlying data sources because everything is integrated. Please take that into 
consideration.
  
  In fact, if the aim is just to split out a blob and see what happens, then 
please focus on something more volatile than knowledge about reality, and 
remove things that change every year. For example, remove all 
humans, all of them, and organizations. There will be a new human tomorrow. 
When it comes to facts, who cares who did or studied it? Just focus on what 
happened or what was discovered.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-18 Thread Harej
Harej added a comment.


  I'm interested in others' opinions as well because I am far from the only 
perspective in the room.
  
  First: at what levels would this graph division take place? Would this be 
something largely behind the scenes, not visible to the Wikidata community 
unless you're working directly with the graph query API? Or would this be a 
highly visible change, on the level of splitting Wikidata into distinct new 
Wikimedia projects? That, I think, could affect the extent to which getting the 
details right "matters".


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-18 Thread MPhamWMF
MPhamWMF added a comment.


  Thanks for your thoughts here @Harej; it's really helpful to have these 
insights from someone closer to the (Wiki)data content itself.
  
  You are correct that this specific ticket is identifying the largest subset 
of data to split off from the Wikidata graph. The primary rationale behind this 
is to explore mitigation strategies for a worst case scenario of catastrophic 
failure of Blazegraph, and to understand what options we have available to 
preserve limited functionality of WD(QS) rather than have no functionality in 
this scenario. In that regard, identifying the largest subset of data seems 
reasonable as it would directly address the potential problem of hitting 
Blazegraph's max size constraint.
  
  To your other point, though, it is only one way of splitting the graph. I am 
definitely interested in exploring other, more reasonable ways we could divide 
the graph and the potential benefits they may have. For your "(not) media" 
suggestion, did you have a clear heuristic in mind for identifying this 
distinction? If so, it'd be great to start a new ticket to investigate the 
possible benefits of splitting the graph along the lines you suggest, so that 
if we do need or want to split the graph, we would know that the solution 
would work (better) for everyone!


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-18 Thread Harej
Harej added a comment.


  This seems like an arbitrary way to cut up Wikidata. It very much smacks of 
"let's take the largest subset of our dataset and evict it," without 
considering why the dataset should be cut up this way.
  
  What are the boundaries of these new projects? Is Wikidata a graph for 
everything except scholarly articles? What about books, or other forms of 
citable media (i.e., any and all media)? What about scholarly articles that are 
relevant to the Wikidata graph in ways other than WikiCite's massive citation 
graph?
  
  I am very interested in the subgraph conversation and how we can envision 
Wikidata as part of a massive linked data ecosystem without it being overly 
burdened. I think evicting arbitrary subsets of the data is just not a good 
strategy.
  
  If I were to suggest a change, perhaps we could divide the graph along 
"media" and "not media". (We can subsequently decide if we want to split "not 
media" further.) This I think would draw lines that are coherent and not 
arbitrary. The scholarly articles would be a part of the media graph project. 
And there would be free cross-referencing between the sites. Do you think this 
would achieve your goals?
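  One way to evaluate a heuristic like the "media" / "not media" split is to partition items by class and count the statements that would cross the boundary, since each such link would need federation after a split. The sketch below works on a toy in-memory graph. The class set MEDIA_CLASSES (scholarly article Q13442814, book Q571) is an illustrative assumption, not a proposed definition; a real heuristic would also need to follow subclasses and read from a dump.

```python
from collections import Counter

# Illustrative assumption only: which P31 classes count as "media".
MEDIA_CLASSES = {"Q13442814", "Q571"}  # scholarly article, book


def partition(instance_of):
    """Split item ids into (media, not_media) sets.

    instance_of maps each item id to the set of its P31 classes; an item is
    "media" if any of its classes is in MEDIA_CLASSES. Subclasses are not
    followed here.
    """
    media = {item for item, classes in instance_of.items() if classes & MEDIA_CLASSES}
    return media, set(instance_of) - media


def cross_links(triples, media):
    """Count (subject, property, object) statements crossing the split, per property.

    Objects that are not in `media` are treated as "not media".
    """
    counts = Counter()
    for subject, prop, obj in triples:
        if (subject in media) != (obj in media):
            counts[prop] += 1
    return counts


items = {"W1": {"Q13442814"}, "B1": {"Q571"}, "H1": {"Q5"}}
edges = [("W1", "P50", "H1"), ("W1", "P2860", "B1"), ("H1", "P800", "B1")]
media, not_media = partition(items)
print(cross_links(edges, media))
```

  In this toy graph, the author link (P50) and the notable-work link (P800) cross the boundary, while the citation between two media items (P2860) stays inside the media graph.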


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-10 Thread MPhamWMF
MPhamWMF added a parent task: T282790: Get estimates for dropping data from 
Wikidata in case of Blazegraph catastrophic failure.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-10 Thread MPhamWMF
MPhamWMF triaged this task as "High" priority.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-07 Thread waldyrious
waldyrious updated the task description.


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  Thanks everyone. For context: this is just one of many options we are 
currently investigating in order to create an overview. We think it is 
important to have a larger discussion about how to move forward with the Query 
Service, but we need to know more about each of the options we have. We are 
currently trying to determine, for each option, what it actually means in terms 
of how much breathing room it buys us, how many people would be affected, etc. 
This task is one part of that. We hope to have the larger overview ready for 
discussion soon.
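  For the "how much breathing room" question, a back-of-the-envelope calculation could look like the sketch below. All inputs are placeholders: the backend's capacity, the current triple count, the size of the removed subset, and the growth rates are exactly what this task is meant to measure, so none are filled in here. Linear growth is a simplifying assumption, and the function names are hypothetical.

```python
def months_until_full(current, capacity, growth_per_month):
    """Months until `current` triples reach `capacity`, assuming linear growth."""
    if growth_per_month <= 0:
        return float("inf")
    return max(capacity - current, 0) / growth_per_month


def breathing_room(current, capacity, growth_per_month, removed, removed_growth_per_month):
    """Extra months before hitting `capacity` if a subset is removed.

    `removed` is the subset's current size and `removed_growth_per_month` its
    share of the monthly growth. Returns inf if the remaining graph would no
    longer grow.
    """
    if growth_per_month <= 0:
        raise ValueError("overall growth must be positive")
    before = months_until_full(current, capacity, growth_per_month)
    after = months_until_full(
        current - removed, capacity, growth_per_month - removed_growth_per_month
    )
    return after - before
```

  The point of separating the subset's size from its growth rate is that removing a large but slowly growing subset buys a one-off margin, while removing a fast-growing one also slows how quickly that margin is used up.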


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread Fnielsen
Fnielsen added a subscriber: nichtich.
Fnielsen added a comment.


  "rate of growth of scholarly articles"
  wikicite.org updates these statistics: http://wikicite.org/statistics.html . I 
suppose it is Jakob Voß (@nichtich) who updates these numbers? The graph 
shows a bit of a plateau recently for publications, while there is a recent 
increase in citations. I would think that James Hare is adding the citations? As 
far as I remember, the citations have been mentioned as an issue of concern with 
respect to Wikidata data size. They probably account for a good deal of the 
triples.
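  For the "rate of growth" metric, periodic snapshots such as those on the WikiCite statistics page could be reduced to an average monthly growth figure. This is a generic sketch. The input format (a list of (date, count) pairs), the mean-Gregorian-month constant, and the function name monthly_growth are assumptions, and the demo numbers are made up rather than taken from WikiCite.

```python
from datetime import date

AVERAGE_MONTH_DAYS = 365.2425 / 12  # mean Gregorian month length in days


def monthly_growth(snapshots):
    """Return (items added per month, compound monthly growth rate).

    snapshots is a list of (date, count) pairs; only the earliest and
    latest are used.
    """
    ordered = sorted(snapshots)
    (d0, c0), (d1, c1) = ordered[0], ordered[-1]
    months = (d1 - d0).days / AVERAGE_MONTH_DAYS
    if months <= 0 or c0 <= 0:
        raise ValueError("need two distinct dates and a positive starting count")
    per_month = (c1 - c0) / months
    rate = (c1 / c0) ** (1 / months) - 1
    return per_month, rate


# Made-up demo numbers, not real WikiCite data.
per_month, rate = monthly_growth([(date(2020, 1, 1), 100), (date(2021, 1, 1), 200)])
print(f"{per_month:.1f} items per month, {rate:.2%} compound monthly growth")
```

  Reporting both figures helps separate a plateau in absolute additions from a slowing relative rate, which can look alike on a single chart.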


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread Fnielsen
Fnielsen added a comment.


  Some of the statistics that are wanted are listed on Scholia, currently on the 
frontpage: https://scholia.toolforge.org/
  
  "percentage, number of Wikidata entities that are scholarly article": 
  37.246.721 scholarly articles, so about 37 of 97 million items, ≈ 38% (roughly 40%), are scholarly articles.
  
  "percentage, number of WDQS queries per month that involve scholarly article 
(including authors and publication)"
  For Scholia, we have recently turned a number of queries into more templated 
queries and now automatically add "# tool: scholia" as a comment to the 
queries, so it should be possible for Wikimedia employees to count the number 
of Scholia queries (perhaps that was already possible via the referer field?). I 
have had the impression that Scholia issues few queries compared to 
Magnus Manske's tools.
  
  "percentage, number of scientific papers that are connected to non-scientific 
paper items in WD"
  Quite a lot of scholarly papers are connected to a journal item, to one or 
more topic items, to a language item, and some to a notable author who is in 
Wikipedia (so we need an item in Wikidata). Currently, according to the statistics 
on Scholia, there are 14.211.431 topic links. Works may have multiple links, so 
perhaps fewer than 10.000.000 works have one or more topics; we should aim for 
most works to have a topic, so I expect this number to grow.
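  The difference between the number of topic links and the number of works having at least one topic can be illustrated with a small sketch: a work with several main-subject links contributes several links but counts once as a work. The input format, (work, topic) pairs such as rows exported from a query or a dump, and the function name topic_coverage are assumptions.

```python
def topic_coverage(links, total_works):
    """Summarise (work, topic) main-subject pairs.

    Returns (number of links, number of distinct works with at least one
    topic, percentage of total_works covered).
    """
    links = list(links)
    works_with_topic = {work for work, _topic in links}
    percent = 100.0 * len(works_with_topic) / total_works if total_works else 0.0
    return len(links), len(works_with_topic), percent


# Toy data: W1 has two topics, W2 has one, and two other works have none.
pairs = [("W1", "T1"), ("W1", "T2"), ("W2", "T1")]
print(topic_coverage(pairs, total_works=4))
```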

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Fnielsen
Cc: EgonWillighagen, Fnielsen, Darwinius, Daniel_Mietchen, Lokal_Profil, 
GoEThe, Alicia_Fagerving_WMSE, PKM, LWyatt, Multichill, Aklapper, MPhamWMF, 
Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread Daniel_Mietchen
Daniel_Mietchen added a comment.


  In T281854#7061460, @MPhamWMF wrote:
  
  > For larger context, this is not to say we're committed to this split yet,
but we are exploring strategies for scaling Wikidata (and mitigating
catastrophic failure) that are directly related to the max size that Blazegraph
is able to handle.
  
  What is that max size?
  
  Also, there are lots of other relevant parameters that are often
interdependent. We have tried to start documenting them here; help is most
welcome.

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

To: Daniel_Mietchen
Cc: Daniel_Mietchen, Lokal_Profil, GoEThe, Alicia_Fagerving_WMSE, PKM, LWyatt, 
Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread MPhamWMF
MPhamWMF added a comment.


  @LWyatt, "splitting scholarly articles out" here refers to separating the
subgraph of scholarly articles (possibly copying over directly relevant items
like authors) from the larger Wikidata graph so that the two would be
independent graphs. The article graph would exist independently of WD, and
queries that connect articles to non-articles would require functional
federation. This would definitely affect some known workflows and use cases
(e.g. Scholia), but part of this ticket is also to assess what percentage of
queries might be affected by this change.
  
  For larger context, this is not to say we're committed to this split yet, but
we are exploring strategies for scaling Wikidata (and mitigating catastrophic
failure) that are directly related to the max size that Blazegraph is able to
handle.
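  To make the idea concrete, here is a toy sketch of such a split: each statement goes to the graph of its subject, and any item-to-item link whose endpoints end up in different graphs is one that a query could only follow through federation. The triple format, the item names, and the function `split_graph` are hypothetical illustrations, not how WDQS actually stores data, and copying over authors is not modelled.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def split_graph(
    triples: Iterable[Triple], article_items: Set[str], all_items: Set[str]
) -> Tuple[List[Triple], List[Triple], List[Triple]]:
    """Partition triples by subject and collect links that cross the split."""
    article_graph: List[Triple] = []
    main_graph: List[Triple] = []
    cross_links: List[Triple] = []
    for s, p, o in triples:
        s_is_article = s in article_items
        (article_graph if s_is_article else main_graph).append((s, p, o))
        # Literal values (titles, dates) are not items, so they never cross.
        if o in all_items and (o in article_items) != s_is_article:
            cross_links.append((s, p, o))
    return article_graph, main_graph, cross_links


articles = {"paper_1", "paper_2"}
items = articles | {"journal_X", "topic_Y", "author_Z", "publisher_W"}
triples = [
    ("paper_1", "published in", "journal_X"),
    ("paper_1", "main subject", "topic_Y"),
    ("paper_1", "cites work", "paper_2"),
    ("paper_1", "title", "A made-up title"),
    ("paper_2", "author", "author_Z"),
    ("journal_X", "publisher", "publisher_W"),
]
article_graph, main_graph, cross_links = split_graph(triples, articles, items)
print(len(article_graph), len(main_graph), len(cross_links))  # prints 5 1 3
```

  If authors were copied into the article graph, as suggested above, the `author` link would no longer cross the split.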

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

To: MPhamWMF
Cc: Alicia_Fagerving_WMSE, PKM, LWyatt, Multichill, Aklapper, MPhamWMF, 
Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread LWyatt
LWyatt added a comment.


  Can I get clarification about what is meant, practically, by "splitting
scholarly articles out"?
  Does this mean something in the backend, about how that content is
stored/accessed by the query system (but otherwise invisible to the general
reader of Wikidata)? Or does it mean removing these items from WD completely?

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

To: LWyatt
Cc: PKM, LWyatt, Multichill, Aklapper, MPhamWMF, Invadibot, maantietaja, 
CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-04 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

To: Maintenance_bot
Cc: Aklapper, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-04 Thread MPhamWMF
MPhamWMF created this task.
MPhamWMF added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a product manager for Wikidata and WDQS, I want to know what quantifiable 
benefits to service reliability and quality I might expect to gain (or lose) by 
splitting scholarly articles out from the Wikidata graph, so that I can decide 
whether to move ahead with this plan and how to communicate it.
  
  In order to move ahead with splitting out scholarly articles from WD, 
communicate this decision, and set expectations around the benefits of 
implementing this change, we should get some baseline measurements of the 
current state of scholarly articles in Wikidata and WDQS, and estimates about 
the effects of splitting them off.
  
  AC:
  Get the numbers for the following metrics:
  
  [ ] percentage, number of Wikidata entities that are scholarly articles
  [ ] percentage, number of WDQS queries per month that involve scholarly
articles (including authors and publications)
  [ ] percentage, number of the above queries that only involve scholarly
articles (including authors and publications)
  [ ] percentage, number of scientific papers that are connected to 
non-scientific paper items in WD (not including authors and publications)
  [ ] given the current rate of growth of Wikidata, approximately how much time 
it would take for Wikidata to grow back to its current size if we removed 
scholarly articles
  [ ] rate of growth of scholarly articles
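  The last two items reduce to simple arithmetic once the counts are measured. A sketch assuming roughly linear growth; the function names are hypothetical, and the growth figures in the example are placeholders, not measured rates:

```python
def monthly_growth(count_start: int, count_end: int, months: float) -> float:
    """Average net growth per month between two counts (linear assumption)."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (count_end - count_start) / months


def months_to_regrow(removed_items: int, growth_per_month: float) -> float:
    """Months for the remaining graph to regain `removed_items` items.

    ASSUMPTION: `growth_per_month` is measured on the graph that remains
    after the split, i.e. excluding scholarly articles.
    """
    if growth_per_month <= 0:
        raise ValueError("growth must be positive to ever regrow")
    return removed_items / growth_per_month


# Placeholder inputs: the removed count echoes the Scholia figure quoted in
# this thread; the two item counts used for the growth rate are made up.
removed = 37_246_721
growth = monthly_growth(count_start=50_000_000, count_end=56_000_000, months=6)
print(growth, round(months_to_regrow(removed, growth), 1))  # prints 1000000.0 37.2
```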

TASK DETAIL
  https://phabricator.wikimedia.org/T281854

To: MPhamWMF
Cc: Aklapper, MPhamWMF, CBogen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles