[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-27 Thread Lydia_Pintscher
Lydia_Pintscher closed this task as "Resolved".
Lydia_Pintscher moved this task from Test (Verification) to Done on the 
Wikidata Query Builder board.
Lydia_Pintscher added a comment.


  

TASK DETAIL
  https://phabricator.wikimedia.org/T272140

WORKBOARD
  https://phabricator.wikimedia.org/project/board/4990/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup, Lydia_Pintscher
Cc: Ladsgroup, Lydia_Pintscher, Lucas_Werkmeister_WMDE, Aklapper, amy_rc, 
Hazizibinmahdi, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-25 Thread Ladsgroup
Ladsgroup moved this task from Doing to Peer Review on the Wikidata Query 
Builder board.
Ladsgroup added a comment.


  https://github.com/wmde/query-builder/pull/166



[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-24 Thread Ladsgroup
Ladsgroup claimed this task.
Ladsgroup moved this task from Ready to pick up to Doing on the Wikidata Query 
Builder board.
Restricted Application added a project: User-Ladsgroup.



[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-20 Thread Ladsgroup
Ladsgroup added a comment.


  Filed T272490: Improve labelling performance in query builder 




[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-20 Thread amy_rc
amy_rc added a comment.


  Query tried during the discussion call:
  
    SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
      {
        SELECT DISTINCT ?item ?instance WHERE {
          ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
          ?item (p:P31/ps:P31) ?instance.
          MINUS { ?item (p:P31/ps:P31/(wdt:P279)*) wd:Q5. }
        }
        LIMIT 5
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
    }



[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-20 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  In T272140#6756245, @Ladsgroup wrote:
  
  > One:
  > https://w.wiki/unp
  >
  >   SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
  >     SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  >     ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
  >     ?item (p:P31/ps:P31) ?class.
  >     MINUS { ?item (p:P31/ps:P31/(wdt:P279)*) wd:Q5. }
  >   }
  >   LIMIT 5
  >
  > Basically take every one who has P31, and remove anything that has the Q5
  > in the P279 ladder
  >
  > Pros:
  >
  > - Correct
  >
  > Con:
  >
  > - It times out
  
  You can rearrange that query to optimize it (compare T166139):
  
    SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
      {
        SELECT DISTINCT ?item ?instance WHERE {
          ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
          ?item (p:P31/ps:P31) ?instance.
          MINUS { ?item (p:P31/ps:P31/(wdt:P279)*) wd:Q5. }
        }
        LIMIT 5
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
    }
  
  Then it doesn’t time out anymore (though it’s still expensive).
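
  One plausible reading of why the rearrangement helps is that the inner
  SELECT applies LIMIT 5 before the label service runs, so labels are only
  fetched for the rows that survive. The Python sketch below illustrates that
  ordering effect only; the labeller, the row names, and the assumption that
  the engine would otherwise label every candidate row are all hypothetical.

```python
# Hypothetical stand-in for the label service: counts how often it is invoked.
def make_labeller():
    calls = {"count": 0}

    def label(entity_id):
        calls["count"] += 1
        return f"label of {entity_id}"

    return label, calls


def label_then_limit(rows, limit):
    """Label every candidate row, then keep the first `limit` rows."""
    label, calls = make_labeller()
    labelled = [(item, label(item)) for item in rows]
    return labelled[:limit], calls["count"]


def limit_then_label(rows, limit):
    """Keep the first `limit` rows (like the inner SELECT ... LIMIT), then label them."""
    label, calls = make_labeller()
    labelled = [(item, label(item)) for item in rows[:limit]]
    return labelled, calls["count"]


candidate_rows = [f"item_{n}" for n in range(1000)]  # made-up result set
slow_rows, slow_calls = label_then_limit(candidate_rows, 5)
fast_rows, fast_calls = limit_then_label(candidate_rows, 5)
```

  Both orderings return the same five labelled rows in this toy model; only
  the number of label lookups differs.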



[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-18 Thread Ladsgroup
Ladsgroup added subscribers: Lucas_Werkmeister_WMDE, Lydia_Pintscher, Ladsgroup.
Ladsgroup added a comment.


  So I looked at this. It's a bigger problem in general, and it's due to the
  way we handle "not matching". It's slightly complex, so bear with me.
  
  The query that Query Builder produces is this (with some modifications):
  https://w.wiki/unm
  
    SELECT ?item ?itemLabel ?instance ?instanceLabel WHERE {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
      ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
      ?item (p:P31/ps:P31/(wdt:P279*)) ?instance.
      FILTER(?instance != wd:Q5)
    }
    LIMIT 5
  
  What is happening? For "not matching", the query walks up the `P279` ladder:
  the `P279` values of Q5, their `P279` values, and so on. Each item therefore
  becomes several rows (WDQS is a graph database of triples, not items).
  So the result would be:
  
  | wd:Q7251 | Alan Turing | wd:Q103940464 | continuant |
  | wd:Q7251 | Alan Turing | wd:Q99527517  | collection entity |
  | wd:Q7251 | Alan Turing | wd:Q53617489  | independent continuant |
  | wd:Q7251 | Alan Turing | wd:Q28813620  | set |
  | wd:Q7251 | Alan Turing | wd:Q27043950  | anatomical entity |
  | wd:Q7251 | Alan Turing | wd:Q16887380  | group |
  | wd:Q7251 | Alan Turing | wd:Q26720107  | subject of a right |
  | wd:Q7251 | Alan Turing | wd:Q35120     | entity |
  | wd:Q7251 | Alan Turing | wd:Q23958946  | individual entity |
  | wd:Q7251 | Alan Turing | wd:Q159344    | heterotroph |
  | wd:Q7251 | Alan Turing | wd:Q7239      | organism |
  | wd:Q7251 | Alan Turing | wd:Q24229398  | agent |
  | wd:Q7251 | Alan Turing | wd:Q18336849  | item with given name property |
  | wd:Q7251 | Alan Turing | wd:Q830077    | subject |
  | wd:Q7251 | Alan Turing | wd:Q795052    | individual |
  | wd:Q7251 | Alan Turing | wd:Q45983014  | organisms by adaptation |
  | wd:Q7251 | Alan Turing | wd:Q72638     | consumer |
  | wd:Q7251 | Alan Turing | wd:Q3778211   | legal person |
  | wd:Q7251 | Alan Turing | wd:Q215627    | person |
  | wd:Q7251 | Alan Turing | wd:Q164509    | omnivore |
  | wd:Q7251 | Alan Turing | wd:Q154954    | natural person |
  | wd:Q7251 | Alan Turing | wd:Q5         | human |
  
  And the filter only removes the last row (and leaves the rest), making the
  result both incorrect and full of duplicates.
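
  To make the row explosion concrete, here is a minimal Python sketch over a
  toy graph. The Q-ids are taken from the table above, but the specific P279
  edges between them are assumptions made up for illustration, not the real
  Wikidata hierarchy.

```python
# Toy graph; the edges are assumed for illustration, not real Wikidata data.
INSTANCE_OF = {"Q7251": ["Q5"]}  # P31: Alan Turing -> human
SUBCLASS_OF = {                  # P279 ladder (hypothetical edges)
    "Q5": ["Q154954", "Q164509"],
    "Q154954": ["Q215627"],
    "Q164509": [],
    "Q215627": [],
}


def classes_reachable(item):
    """Mimic p:P31/ps:P31/(wdt:P279*): direct classes plus every ancestor."""
    seen = []
    stack = list(INSTANCE_OF.get(item, []))
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.append(cls)
            stack.extend(SUBCLASS_OF.get(cls, []))
    return seen


# One result row per (item, class) binding, as a triple store would return.
rows = [("Q7251", cls) for cls in classes_reachable("Q7251")]

# FILTER(?instance != wd:Q5) drops only the row whose class is exactly Q5.
filtered = [row for row in rows if row[1] != "Q5"]
```

  In this toy model the item still appears in every remaining row, once per
  ancestor class, which matches the behaviour described above.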
  
  I talked to @Lucas_Werkmeister_WMDE and we came up with several solutions,
  but each has pros and cons.
  
  One:
  https://w.wiki/unp
  
    SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
      ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
      ?item (p:P31/ps:P31) ?class.
      MINUS { ?item (p:P31/ps:P31/(wdt:P279)*) wd:Q5. }
    }
    LIMIT 5
  
  Basically take every one who has P31, and remove anything that has the Q5 in
  the P279 ladder
  
  Pros:
  
  - Correct
  
  Con:
  
  - It times out
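
  The MINUS semantics of option One can be sketched in Python as follows:
  keep the direct P31 rows, but drop an item entirely if any of its classes
  reaches the excluded class through the P279 ladder. The item and class
  names (`item_a`, `class_x`, ...) are hypothetical placeholders, and the
  traversal is a simplified model rather than how WDQS evaluates the query.

```python
def reaches(cls, target, subclass_of):
    """True if `target` is `cls` itself or one of its P279 ancestors."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subclass_of.get(current, []))
    return False


def minus_with_ladder(instance_of, subclass_of, excluded):
    """Direct (item, class) rows, minus items that match `excluded` via the ladder."""
    rows = []
    for item, classes in instance_of.items():
        if any(reaches(cls, excluded, subclass_of) for cls in classes):
            continue  # the whole item is removed, not just one row
        rows.extend((item, cls) for cls in classes)
    return rows


# Hypothetical data: class_x is a subclass of Q5, class_y is unrelated.
instance_of = {"item_a": ["Q5"], "item_b": ["class_x"], "item_c": ["class_y"]}
subclass_of = {"class_x": ["Q5"], "class_y": []}

kept = minus_with_ladder(instance_of, subclass_of, "Q5")
```

  Here only `item_c` survives, with a single row for its direct class. The
  model walks the ladder once per class of every candidate item, which hints
  at why this variant may be expensive on the real graph.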
  
  Two: https://w.wiki/unu
  The other way to handle it is to discard the P279 ladder for the "not
  matching" part.
  
    SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
      ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
      MINUS { ?item p:P31/ps:P31 wd:Q5. }
    }
    LIMIT 5
  
  Pros:
  
  - It's fast
  
  Cons:
  
  - It's limited: if I want to filter out galaxies from my result, it wouldn't
    exclude spiral galaxies, etc.
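
  The limitation of option Two can be sketched the same way. The exclusion
  below checks direct P31 values only, so an item typed with a subclass slips
  through. The item names and the class strings (`galaxy`, `spiral_galaxy`,
  `star`) are hypothetical placeholders rather than real Wikidata ids.

```python
# Hypothetical direct P31 values; no P279 ladder is consulted at all.
INSTANCE_OF = {
    "item_a": ["galaxy"],
    "item_b": ["spiral_galaxy"],  # a subclass of galaxy in the real world
    "item_c": ["star"],
}


def minus_direct_only(instance_of, excluded):
    """Drop items whose direct classes include `excluded`; ignore subclasses."""
    return [item for item, classes in instance_of.items() if excluded not in classes]


kept = minus_direct_only(INSTANCE_OF, "galaxy")
```

  In this sketch `item_b` stays in the result even though spiral galaxies are
  galaxies, which is the trade-off described in the con above.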
  
  I don't know which way to go. I think @Lydia_Pintscher should decide here.



[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-18 Thread Lydia_Pintscher
Lydia_Pintscher updated the task description.



[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-15 Thread amy_rc
amy_rc updated the task description.



[Wikidata-bugs] [Maniphest] T272140: Duplicates in the result set

2021-01-15 Thread amy_rc
amy_rc removed a subscriber: Aklapper.
amy_rc added projects: Wikidata Query Builder, Wikidata.
amy_rc updated the task description.
