Lucas_Werkmeister_WMDE created this task.
Lucas_Werkmeister_WMDE added projects: Wikidata, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION

As a user of the Wikidata Query Service, I want to be able to write reliable queries related to particular statements, which I know by their Wikibase statement ID.

Problem:
Currently, the RDF format documentation does not document the format of statement URIs after the wds: prefix:

There is no guaranteed format or meaning to the statement id.

In practice, however, it’s not difficult to see how the statement URI is derived from the statement ID: they are identical, except that the $ in the Wikibase representation is replaced by a - for RDF. We use this, for example, in #wikibase-quality-constraints (SparqlHelper.php).

This is not what the RDF export does, though: it actually uses preg_replace( '/[^\w-]/', '-', $statement->getGuid() ), which means that characters other than $ may also be replaced by hyphens if they ever occur in the future. In that case, tools that only inferred the “$ becomes -” rule from looking at the data might break.
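The difference between the two rules can be sketched as follows. This is a rough Python illustration, not the Wikibase code: the function names and the sample IDs are made up for the example, and re.ASCII is used on the assumption that PHP's \w without the /u modifier is effectively ASCII-only in the default locale.

```python
import re


def naive_statement_local_name(guid: str) -> str:
    """Rule inferred from looking at the data: only '$' becomes '-'."""
    return guid.replace("$", "-")


def export_statement_local_name(guid: str) -> str:
    """Approximation of preg_replace( '/[^\\w-]/', '-', $guid ).

    re.ASCII restricts \\w to [a-zA-Z0-9_]; this is an assumption about
    how PHP behaves here, not something the task confirms.
    """
    return re.sub(r"[^\w-]", "-", guid, flags=re.ASCII)


# Illustrative ID in the usual "entity ID, $, UUID" shape.
typical = "Q42$F078E5B3-F9A8-480E-B7AC-D97778CBBEF9"
# Hypothetical future ID containing a character other than '$'.
hypothetical = "Q42$F078E5B3.F9A8"

# Both rules agree on the typical shape ...
print(naive_statement_local_name(typical))
print(export_statement_local_name(typical))
# ... but diverge as soon as another non-word character appears.
print(naive_statement_local_name(hypothetical))
print(export_statement_local_name(hypothetical))
```

With IDs in today's shape both functions return the same string, which is why the simpler rule works in practice; only the second one matches what the export would do for the hypothetical ID.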

Example:
As mentioned above, WBQC is one case that would benefit from documenting this relationship and making it part of the Stable Interface Policy. @ArthurPSmith also requested it on project chat.

Screenshots/mockups:
In RDF Dump Format#Full statements, change the current

There is no guaranteed format or meaning to the statement id.

to, e.g.,

The statement ID is the Wikibase statement ID, with all characters other than PCRE “word” characters and hyphens replaced by hyphens. In PHP, this is expressed as preg_replace( '/[^\w-]/', '-', $statementID ).

Acceptance criteria:

  • the format of statement IDs in RDF is documented and part of the stable interface

Open questions:

  • I would actually prefer to slightly tweak the regex before we fix it in stone – \w is not really well-defined (the PHP documentation doesn’t clearly specify what a “word” character is, and apparently it can be locale-dependent?), so I’d make it more explicit with something like [^a-zA-Z0-9_\-] (only ASCII letters and digits, underscore, and hyphen).
  • Should we announce this once it’s done?
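To illustrate why the first open question matters, here is a small Python sketch of how the meaning of \w can shift with the regex mode while an explicit character class cannot. Python's Unicode-aware \w stands in for "some other definition of word character"; it is not a claim about how PHP behaves, and the function names and sample ID are hypothetical.

```python
import re

# Explicit class proposed in the task: ASCII letters, digits, underscore, hyphen.
EXPLICIT = re.compile(r"[^a-zA-Z0-9_\-]")
# \w in Python's default (Unicode) mode also matches non-ASCII letters.
WORD_UNICODE = re.compile(r"[^\w-]")


def rdf_name_explicit(guid: str) -> str:
    """Replace everything outside the explicit ASCII class with '-'."""
    return EXPLICIT.sub("-", guid)


def rdf_name_word(guid: str) -> str:
    """Replace everything that is neither a (Unicode) word character nor '-'."""
    return WORD_UNICODE.sub("-", guid)


# Hypothetical ID containing a non-ASCII letter (U+00E9).
sample = "Q1$caf\u00e9"

print(rdf_name_explicit(sample))  # the non-ASCII letter is replaced
print(rdf_name_word(sample))      # the non-ASCII letter is kept
```

The two functions disagree on the last character of the sample, which is the kind of ambiguity an explicit class would rule out before the format becomes part of the stable interface.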

TASK DETAIL
https://phabricator.wikimedia.org/T214680

To: Lucas_Werkmeister_WMDE
Cc: Aklapper, Lucas_Werkmeister_WMDE, ArthurPSmith, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331