[Wikidata-bugs] [Maniphest] [Updated] T179849: Cache all constraint check results per-entity

2017-11-23 Thread Ladsgroup
Ladsgroup removed a project: Wikidata-Former-Sprint-Board.
TASK DETAIL
https://phabricator.wikimedia.org/T179849

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup
Cc: hoo, Ladsgroup, Jonas, Aklapper, Lucas_Werkmeister_WMDE, Lahi, Gq86, GoranSMilovanovic, QZanden, Agabi10, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


2017-11-21 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.
Note: we’ve decided to go ahead with the ObjectCache for now, and store what would be wbqc_result above in the cache instead: see T181060: Cache constraint check results per-entity in ObjectCache (L). We can migrate that to the database, as described above, in the future. In the meantime, review of the above schema is most welcome!

There is also some prior art: an older version of the WikibaseQuality extension contained this table, as found in T102992: [Task] Review WikidataQuality DB schema:

CREATE TABLE IF NOT EXISTS /*_*/wbq_violations (
	entity_id                 VARBINARY(15)     NOT NULL,
	pid                       VARBINARY(15)     NOT NULL,
	claim_guid                VARBINARY(63)     NOT NULL,
	constraint_id             VARBINARY(63)     NOT NULL,
	constraint_type_entity_id VARBINARY(15)     NOT NULL,
	additional_info           TEXT              DEFAULT NULL,
	updated_at                VARBINARY(31)     NOT NULL,
	revision_id               INT(10) UNSIGNED  NOT NULL,
	status                    VARBINARY(31)     NOT NULL,
	PRIMARY KEY (claim_guid, constraint_id)
) /*$wgDBTableOptions*/;

CREATE INDEX /*i*/claim_guid ON /*_*/wbq_violations (claim_guid);
CREATE INDEX /*i*/constraint_id ON /*_*/wbq_violations (constraint_id);

Here, each constraint violation gets a database row of its own. In the above form, this table is tied to the notion that constraints are only checked on the main snak of a statement (via the claim_guid column), which is no longer true since T168532: Check constraints on qualifiers and references. However, we could probably construct some alternative “address” that would still, in combination with the constraint ID, be a unique key for such a table. But I’m not sure if that’s an advantage.
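To make the idea of an alternative “address” concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the SnakAddress type, its fields, the violation_key helper and the sample GUID and hashes are hypothetical and do not come from WikibaseQuality. It only shows that a key built from a snak's location plus the constraint ID can tell a main snak apart from a qualifier on the same statement.

```python
from typing import NamedTuple, Optional


class SnakAddress(NamedTuple):
    """Hypothetical location of a checked snak within an entity."""
    statement_guid: str                   # statement that contains the snak
    snak_kind: str                        # "main", "qualifier" or "reference"
    snak_hash: Optional[str] = None       # tells qualifier/reference snaks apart
    reference_hash: Optional[str] = None  # set only for snaks inside a reference


def violation_key(address: SnakAddress, constraint_id: str) -> tuple:
    """Combine the address with the constraint ID into one candidate unique key."""
    return (*address, constraint_id)


guid = "Q64$example-guid"  # made-up statement GUID
main_key = violation_key(SnakAddress(guid, "main"), "P31$constraint-a")
qualifier_key = violation_key(
    SnakAddress(guid, "qualifier", snak_hash="abc123"), "P31$constraint-a"
)
print(main_key != qualifier_key)
```

Whether such a composite key is preferable to one blob per entity is exactly the open question raised above; the sketch does not settle it.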


2017-11-16 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.
Here are the result statuses of 100 randomly chosen items (P6333):

"bad-parameters" 15
"warning" 66
"not-main-snak" 1601
"todo" 26
"compliance" 5096

It appears the average item has plenty of checkable snaks, but very few violations (which is great!). So I’m very much leaning towards the optimization mentioned in the previous comment: to only store those results that the gadget will display (here: 81 out of 6804).
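The arithmetic behind that figure can be reproduced with a short sketch. The counts are the ones listed above, and the set of displayed statuses is the one named in the 2017-11-15 comment (warning, violation, bad-parameters); the function name displayed_share is only illustrative.

```python
from collections import Counter

# Statuses the gadget displays, as named in the 2017-11-15 comment.
DISPLAYED_STATUSES = {"warning", "violation", "bad-parameters"}

# Status counts reported above for 100 randomly chosen items.
sample_counts = Counter({
    "bad-parameters": 15,
    "warning": 66,
    "not-main-snak": 1601,
    "todo": 26,
    "compliance": 5096,
})


def displayed_share(counts):
    """Return (displayed, total) result counts for a status tally."""
    total = sum(counts.values())
    displayed = sum(
        n for status, n in counts.items() if status in DISPLAYED_STATUSES
    )
    return displayed, total


displayed, total = displayed_share(sample_counts)
print(f"{displayed} of {total} results ({displayed / total:.1%}) would be stored")
# prints "81 of 6804 results (1.2%) would be stored"
```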

And in that case, we should seriously consider using the same storage mechanism for both T180582: List of all constraint violations? and this task, since they’ll contain the same information. (That is: for now, cache constraint check results; later, add a special page that lists cached constraint check results of a property across all entities that use it; and perhaps later still, add a job that periodically checks constraints on random items, so that the constraint results of the special page don’t depend on human editors visiting items with the checkConstraints gadget enabled.) I’m just not sure how best to do that.

The task is: we have a JSON blob (or, if you will, a PHP array – the constraint check results, in any event), which we want to store for some time (with the ability to explicitly remove it, either on page purge or because we detect it’s no longer valid). Currently, we will access it by entity ID (one blob per entity), but in the future, we will also want to find it by a list of property IDs, and possibly constraint IDs (statement IDs), that it involves. That list can be determined from the blob, but not with a simple string search.
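As a rough illustration of determining that list from the blob, the sketch below walks a nested structure and collects the property IDs and constraint IDs it involves. The blob layout here (the "claims", "mainsnak", "results", "property" and "constraint" keys, and all IDs) is a simplified assumption for the example, not the actual format of the constraint check results.

```python
def iter_results(node):
    """Yield every dict in a nested blob that looks like a check result."""
    if isinstance(node, dict):
        if "status" in node and "constraint" in node:
            yield node
        for value in node.values():
            yield from iter_results(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_results(item)


def index_keys(blob):
    """Collect the property IDs and constraint IDs that a blob involves."""
    property_ids, constraint_ids = set(), set()
    for result in iter_results(blob):
        property_ids.add(result["property"])
        constraint_ids.add(result["constraint"]["id"])
    return property_ids, constraint_ids


# Hypothetical blob for one entity; all IDs below are made up.
blob = {
    "Q64": {
        "claims": {
            "P31": [{"id": "Q64$stmt-1", "mainsnak": {"results": [
                {"status": "warning", "property": "P31",
                 "constraint": {"id": "P31$constraint-a"}},
            ]}}],
            "P17": [{"id": "Q64$stmt-2", "mainsnak": {"results": [
                {"status": "violation", "property": "P17",
                 "constraint": {"id": "P17$constraint-b"}},
            ]}}],
        }
    }
}

property_ids, constraint_ids = index_keys(blob)
print(sorted(property_ids), sorted(constraint_ids))
```

Whatever storage is chosen, these extracted IDs would have to be kept somewhere searchable (for example as extra columns or index rows) alongside the blob, since they cannot be found by string search.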


2017-11-15 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.
Do we really want to cache all constraint check results… or only those that we will actually show in the gadget: warning, violation and bad-parameters?

For reference: Berlin (Q64) currently has the following distribution of result statuses (P6320):

1344 "compliance"
 196 "not-main-snak"
  80 "todo"
  73 "warning"
  18 "deprecated"
   1 "violation"

Of those 1712 results, the gadget only cares about 74. If we remove the rest, we massively reduce the size of the JSON we transmit and cache, which I suddenly find very tempting.

Of course, for this caching to still be correct, the API request has to explicitly indicate that it doesn’t care about the other results, and API requests without such an indication would not benefit from caching. Is that acceptable?
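The correctness condition can be sketched as follows. This is a minimal in-memory model, not the extension's code: the ResultCache class, the statuses argument and the check_entity callable are hypothetical names chosen for the example. A request is answered from the cache only if it names a subset of the statuses the cache stores; any other request runs the checks uncached.

```python
CACHED_STATUSES = frozenset({"warning", "violation", "bad-parameters"})


class ResultCache:
    """Per-entity cache that only ever holds results with a displayed status."""

    def __init__(self, check_entity):
        # check_entity is a hypothetical callable: entity ID -> list of result dicts.
        self._check_entity = check_entity
        self._store = {}

    def get_results(self, entity_id, statuses=None):
        wanted = None if statuses is None else set(statuses)
        if wanted is None or not wanted <= CACHED_STATUSES:
            # The caller may need results the cache never stores, so answering
            # from the cache would be incorrect: run the checks uncached.
            results = self._check_entity(entity_id)
            return [r for r in results if wanted is None or r["status"] in wanted]
        if entity_id not in self._store:
            results = self._check_entity(entity_id)
            self._store[entity_id] = [
                r for r in results if r["status"] in CACHED_STATUSES
            ]
        return [r for r in self._store[entity_id] if r["status"] in wanted]


calls = []


def fake_check(entity_id):
    """Stand-in for the real constraint checks; records each invocation."""
    calls.append(entity_id)
    return [{"status": "compliance"}, {"status": "warning"}]


cache = ResultCache(fake_check)
first = cache.get_results("Q64", statuses=["warning", "violation"])  # fills the cache
second = cache.get_results("Q64", statuses=["warning"])  # served from the cache
everything = cache.get_results("Q64")  # no indication given: checks run again
print(len(calls))  # prints 2
```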


2017-11-08 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.
Oh, and another thing – I suppose we also need to invalidate the cached results somehow when a constraint statement is edited. I’m not sure what the most efficient way to do that is.


1. Explicitly purge cached results when a constraint statement is edited. This is almost certainly too expensive.
2. Store revision IDs of all properties with constraints in the cached result and reject the result if one of the properties has been edited since then. I’m not sure if this is efficient… constraint checks on Q42 seem to use 122 properties, can we get that many revision IDs in a single database SELECT?
3. Keep a global “constraint revision ID” which is only incremented when constraints change, store the current “constraint revision ID” in the cached result and reject it if it’s no longer up to date. However, due to T163465, we currently don’t know in UpdateConstraintsTableJob whether the constraints have actually changed or not, so we would increment that ID with every change to any property, which is actually fairly frequent according to current RecentChanges (about once every five minutes, on average). But I suppose we could also make UpdateConstraintsTableJob smarter – instead of unconditionally purging and re-importing the constraints, get the old ones, import the new ones, compare them, and only store an update if there’s a difference.
4. As a combination of the last two ideas, keep such a “constraint revision ID” per constraint (identified by its statement ID), instead of a single global one.
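The global “constraint revision ID” idea, including the compare-before-storing refinement, might look roughly like this. It is a simplified in-memory sketch: ConstraintStore and RevisionedCache are hypothetical names, and nothing here reproduces the actual UpdateConstraintsTableJob.

```python
class ConstraintStore:
    """Holds imported constraints and a global constraint revision ID."""

    def __init__(self):
        self.constraints = {}  # property ID -> list of constraint definitions
        self.revision = 0      # global "constraint revision ID"

    def update_property(self, property_id, new_constraints):
        """Import constraints; bump the revision only if they actually differ."""
        if self.constraints.get(property_id, []) == new_constraints:
            return False
        self.constraints[property_id] = list(new_constraints)
        self.revision += 1
        return True


class RevisionedCache:
    """Cache whose entries remember the constraint revision they were built at."""

    def __init__(self):
        self._entries = {}  # entity ID -> (constraint revision, results)

    def put(self, entity_id, results, revision):
        self._entries[entity_id] = (revision, results)

    def get(self, entity_id, current_revision):
        entry = self._entries.get(entity_id)
        if entry is None or entry[0] != current_revision:
            return None  # missing, or computed against outdated constraints
        return entry[1]


store = ConstraintStore()
cache = RevisionedCache()
store.update_property("P31", [{"type": "example constraint"}])
cache.put("Q42", ["cached results"], store.revision)

# A property edit that leaves the constraints unchanged does not invalidate.
store.update_property("P31", [{"type": "example constraint"}])
hit = cache.get("Q42", store.revision)

# A real constraint change bumps the revision, so the entry is rejected.
store.update_property("P31", [])
miss = cache.get("Q42", store.revision)
print(hit, miss)
```

The per-constraint variant would replace the single counter with one counter per constraint statement ID and store, in each cached entry, the revisions of the constraints that entry depends on.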


2017-11-07 Thread Franziska_Heine
Franziska_Heine added a project: Wikidata-Test-Sprint.


2017-11-07 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a project: Wikidata-Sprint.