Ladsgroup added subscribers: Lucas_Werkmeister_WMDE, Lydia_Pintscher, Ladsgroup.
Ladsgroup added a comment.

So I looked at this. It's a bigger problem in general, and it's due to the way we handle "not matching". It's slightly complex, so bear with me.

The query that the query builder produces is this (with some modifications): https://w.wiki/unm

  SELECT ?item ?itemLabel ?instance ?instanceLabel WHERE {
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
    ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
    ?item (p:P31/ps:P31/(wdt:P279*)) ?instance.
    FILTER(?instance != wd:Q5)
  }
  LIMIT 5

What is happening? For "not matching", the `P31` path also goes up the `P279` ladder, so ?instance matches Q5, the `P279` of Q5, its `P279`, and so on. Each item therefore becomes several rows (WDQS is a graph database of triples, not items). So the result, before the filter is applied, would be:

| wd:Q7251 | Alan Turing | wd:Q103940464 | continuant |
| wd:Q7251 | Alan Turing | wd:Q99527517 | collection entity |
| wd:Q7251 | Alan Turing | wd:Q53617489 | independent continuant |
| wd:Q7251 | Alan Turing | wd:Q28813620 | set |
| wd:Q7251 | Alan Turing | wd:Q27043950 | anatomical entity |
| wd:Q7251 | Alan Turing | wd:Q16887380 | group |
| wd:Q7251 | Alan Turing | wd:Q26720107 | subject of a right |
| wd:Q7251 | Alan Turing | wd:Q35120 | entity |
| wd:Q7251 | Alan Turing | wd:Q23958946 | individual entity |
| wd:Q7251 | Alan Turing | wd:Q159344 | heterotroph |
| wd:Q7251 | Alan Turing | wd:Q7239 | organism |
| wd:Q7251 | Alan Turing | wd:Q24229398 | agent |
| wd:Q7251 | Alan Turing | wd:Q18336849 | item with given name property |
| wd:Q7251 | Alan Turing | wd:Q830077 | subject |
| wd:Q7251 | Alan Turing | wd:Q795052 | individual |
| wd:Q7251 | Alan Turing | wd:Q45983014 | organisms by adaptation |
| wd:Q7251 | Alan Turing | wd:Q72638 | consumer |
| wd:Q7251 | Alan Turing | wd:Q3778211 | legal person |
| wd:Q7251 | Alan Turing | wd:Q215627 | person |
| wd:Q7251 | Alan Turing | wd:Q164509 | omnivore |
| wd:Q7251 | Alan Turing | wd:Q154954 | natural person |
| wd:Q7251 | Alan Turing | wd:Q5 | human |

The filter only removes the last row (and leaves the rest), making the query both incorrect and full of duplicates.

I talked to @Lucas_Werkmeister_WMDE and we came up with several solutions, but each has pros and cons.

One: https://w.wiki/unp

  SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
    ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
    ?item (p:P31/ps:P31) ?class.
    MINUS { ?item (p:P31/ps:P31/(wdt:P279)*) wd:Q5. }
  }
  LIMIT 5

Basically, take every item that has a `P31` statement, and remove anything that has Q5 anywhere in its `P279` ladder.

Pros:
- Correct

Cons:
- It times out

Two: https://w.wiki/unu

The other way to handle it is to drop the `P279` ladder entirely for the "not matching" part.

  SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
    ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
    MINUS { ?item p:P31/ps:P31 wd:Q5. }
  }
  LIMIT 5

Pros:
- It's fast

Cons:
- It's limited. If I want to filter out galaxies from my result, it wouldn't exclude spiral galaxies, etc.

I don't know which way to go. I think @Lydia_Pintscher should decide here.

TASK DETAIL
  https://phabricator.wikimedia.org/T272140
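
For anyone who wants to see the difference between the three behaviours without hitting WDQS, here is a rough toy model in Python. It is only a sketch: the tiny in-memory graph, the plain-string names (`turing`, `andromeda`, `human`, ...) and all function names are made up for illustration and are not real Wikidata data or query builder code. It mimics the row-level `FILTER`, the `MINUS` with the `P279` ladder (option One), and the `MINUS` on direct `P31` only (option Two).

```python
# Hypothetical toy graph; names are illustrative, not real Wikidata IDs.
INSTANCE_OF = {  # stands in for P31
    "turing": {"human"},
    "andromeda": {"spiral galaxy"},
}
SUBCLASS_OF = {  # stands in for P279
    "human": {"person"},
    "person": {"entity"},
    "spiral galaxy": {"galaxy"},
    "galaxy": {"entity"},
}


def ladder(cls):
    """Return cls plus every class reachable via subclass-of (like wdt:P279*)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def rows(items):
    """One (item, class) row per class on the ladder, like p:P31/ps:P31/wdt:P279*."""
    return sorted(
        (item, cls)
        for item in items
        for direct in INSTANCE_OF.get(item, ())
        for cls in ladder(direct)
    )


def filter_not_equal(items, excluded):
    """Row-level FILTER(?instance != excluded): drops single rows, not items."""
    return [(item, cls) for item, cls in rows(items) if cls != excluded]


def minus_with_ladder(items, excluded):
    """Option One: drop an item if excluded is anywhere on its ladder."""
    return sorted(
        item
        for item in items
        if all(excluded not in ladder(d) for d in INSTANCE_OF.get(item, ()))
    )


def minus_direct_only(items, excluded):
    """Option Two: drop an item only if excluded is a direct instance-of value."""
    return sorted(item for item in items if excluded not in INSTANCE_OF.get(item, ()))


if __name__ == "__main__":
    items = ["turing", "andromeda"]
    print(filter_not_equal(items, "human"))
    print(minus_with_ladder(items, "human"))
    print(minus_direct_only(items, "galaxy"))
```

In this toy model the row-level filter still returns `turing` (as `person` and `entity` rows), the ladder-aware minus removes him entirely, and the direct-only minus fails to exclude `andromeda` when asked to exclude `galaxy`. It says nothing about performance, which is the actual problem with option One on WDQS.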

_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs