Lucas_Werkmeister_WMDE added subscribers: Ladsgroup, hoo.
Lucas_Werkmeister_WMDE added a comment.

Okay, I discussed this with @Ladsgroup and this would be the current draft:

One table from entity ID to cached constraint check results, perhaps called wbqc_results:

result_entity_idresult_json
Q42{"v":{"Q42":12345,"Q43":123456,…},"r":{…}}

I’m not sure if we use the entity ID as the primary key or add another column (either the numeric entity ID or perhaps an auto-incrementing counter). This also assumes that we’re storing the (in)validation info (the map of page IDs to revision IDs, see comment) inside the JSON blob – I suppose it could also be stored separately, but I’m not sure if that has any advantage.

Another table from entity ID to constraint IDs used in the constraint report. (I don’t have a good name for this one yet.)

result_entity_idconstraint_id
Q422
Q423
Q424
Q432

(Depending on the first “I’m not sure” above, result_entity_id may also be another column of wbqc_results.) constraint_id here is a numeric ID assigned to constraints, not the constraint_guid we currently have – we need to update the wbqc_constraints table to add this ID as a primary key, so that we don’t have to store and join against the >40-character string constraint GUID.

When checking constraints, we first get the cached results from wbqc_results (via result_entity_id). We parse the JSON and get the map of page IDs to revision IDs, and check the page table to see if the revision ID is still the page_latest for that page. If yes, we use the cached results, otherwise we do the whole constraint checking process and finally store the results back in wbqc_results.

When looking for all violations of a constraint (which could be part of T180582: List of all constraint violations?), we join wbqc_results, the unnamed second table from above, and wbqc_constraints, and get the wbqc_results.result_json where the wbqc_constraints.constraint_guid is the given constraint GUID.

When looking for all violations of a property (which is T180582: List of all constraint violations? as originally stated), we do the same thing but look for all constraints where wbqc_constraints.pid is the given property ID.

For reference, this is the current wbqc_constraints we have:

constraint_guidpidconstraint_type_qidconstraint_parameters
P10$1529A843-91A8-4FEA-B43F-9A3258446A3810Q21502404{"P1793":{"0":{"snaktype":…
P10$28240BCD-348B-45BD-9654-3B5B8E101FD610Q21510852{"P2307":{"0":{"snaktype":…
P1000$3727DD2D-2ED9-4D53-98EB-5618039508E31000Q21510865{"P2308":{"0":{"snaktype":…

constraint_guid is a PRIMARY KEY according to the SQL schema (though not one we can actually use, according to Amir), and there’s an index on the pid.

@hoo: can you perhaps check if this makes sense?


TASK DETAIL
https://phabricator.wikimedia.org/T179849

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: hoo, Ladsgroup, Jonas, Aklapper, Lucas_Werkmeister_WMDE, Lahi, GoranSMilovanovic, QZanden, Agabi10, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to