Hi Daniel,

I feel that this tries to evade the real issue by making formal rules about what kind of "breaking" you have to care about. It would be better to define "breaking change" based on its consequences: if important services will stop working, then you should make sure you announce it in time so this will not happen. This requires you to talk to people on this list. I think the whole proposal below is mainly trying to give you some justification to avoid communication with your stakeholders. This is not the way to go.

This said, it is always nice to have some guidelines as to what is likely to change and what isn't. It is probably enough to give some warnings about this ("there might be additional keys in this map in the future" or "there might be additional datatype URIs in the future"). However, this is no recipe to avoid breaking changes. In particular, the guideline to ignore snaks of properties that have no understandable declaration is just codifying a controlled way of failing, not avoiding failure:

* Browsing interfaces (e.g., Reasonator, Miga Class & Property Browser) are expected to show all data to users. If they don't, this is breaking them.

* Query services are expected to use all data. If you do an aggregate query to count all properties on Wikidata, then the number returned will not merely be incomplete but simply wrong if the service ignores half of the data.

* Editing tools (including bot frameworks) are most heavily affected, since they might create duplicate statements if they fail to see some of the data following your guideline.

This does not mean that your guideline is unreasonable -- in fact, I think this is what most tools are doing anyway. But as the examples show, it is not enough to prevent major service disruptions that would affect many people. The guideline that tools should sometimes raise an alert or issue a warning does not work in many cases either, since we have a complex ecosystem with many inter-dependent services (for example, how should a SPARQL Web service communicate problems that occurred when importing the data? All of them, or somehow only the ones that might have affected the query result?).

Our tools rely on being able to use all data, and the easiest way to ensure that they will keep working is to announce technical changes to the JSON format well in advance on this list. For changes that affect a particular subset of widely used tools, it would also be possible to seek feedback from the main contributors of these tools at design/development time. I am sure everybody here is trying their best to keep up with whatever changes you implement, but it is not always possible for all of us to sacrifice part of our weekend on short notice to make a new release before next Wednesday.

Cheers,

Markus


On 05.02.2016 13:10, Daniel Kinzler wrote:
Hi all!

In the context of introducing the new "math" and "external-id" data types, the
question came up whether this introduction constitutes a breaking change to the
data model. The answer to this depends on whether you take the "English" or the
"German" approach to interpreting the format: According to
<https://en.wikipedia.org/wiki/Everything_which_is_not_forbidden_is_allowed>, in
England, "everything which is not forbidden is allowed", while, in Germany, the
opposite applies, so "everything which is not allowed is forbidden".

In my mind, the advantage of formats like JSON, XML and RDF is that they provide
good discovery by eyeballing, and that they use a mix-and-match approach. In
this context, I favour the English approach: anything not explicitly forbidden
in the JSON or RDF is allowed.

So I think clients should be written in a forward-compatible way: they should
handle unknown constructs or values gracefully.


In this vein, I would like to propose a few guiding principles for the design of
client libraries that consume Wikibase RDF and particularly JSON output:

* When encountering an unknown structure, such as an unexpected key in a JSON
encoded object, the consumer SHOULD skip that structure. Depending on context
and use case, a warning MAY be issued to alert the user that some part of the
data was not processed.

* When encountering a malformed structure, such as missing a required key in a
JSON encoded object, the consumer MAY skip that structure, but then a warning
MUST be issued to alert the user that some part of the data was not processed.
If the structure is not skipped, the consumer MUST fail with a fatal error.

* Clients MUST make a clear distinction between data types and value types: A Snak's
data type determines the interpretation of the value, while the type of the
Snak's data value specifies the structure of the value representation.

* Clients SHOULD be able to process a Snak about a Property of unknown data
type, as long as the value type is known. In such a case, the client SHOULD fall
back to the behaviour defined for the value type. If this is not possible, the
Snak MUST be skipped and a warning SHOULD be issued to alert the user that some
part of the data could not be interpreted.

* When encountering an unknown type of data value (value type), the client MUST
either ignore the respective Snak, or fail with a fatal error. A warning SHOULD
be issued to alert the user that some part of the data could not be processed.
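To make the intended behaviour concrete, here is a minimal sketch of a consumer that follows these guidelines. The handler tables, function names, and the simplified snak layout are illustrative assumptions, not the actual Wikibase JSON schema or any real library API:

```python
import warnings

# Hypothetical handlers keyed by *value type* (the structure of the datavalue).
VALUE_HANDLERS = {
    "string": lambda v: v,
    "wikibase-entityid": lambda v: v["id"],
}

# Hypothetical handlers keyed by *data type* (the interpretation of the value).
DATA_TYPE_HANDLERS = {
    "string": lambda v: v,
    "url": lambda v: v,  # same structure as a string, different interpretation
}

def consume_snak(snak):
    """Return a parsed value, or None if the snak had to be skipped."""
    try:
        data_type = snak["datatype"]
        datavalue = snak["datavalue"]
        value_type = datavalue["type"]
        raw = datavalue["value"]
    except KeyError as missing:
        # Malformed structure: the consumer MAY skip it,
        # but then a warning MUST be issued.
        warnings.warn(f"skipping malformed snak, missing key {missing}")
        return None

    if data_type in DATA_TYPE_HANDLERS:
        return DATA_TYPE_HANDLERS[data_type](raw)

    # Unknown data type: fall back to the behaviour defined for the value type.
    if value_type in VALUE_HANDLERS:
        return VALUE_HANDLERS[value_type](raw)

    # Unknown value type as well: skip the snak, SHOULD issue a warning.
    warnings.warn(f"skipping snak with unknown value type {value_type!r}")
    return None
```

Under this sketch, a snak with the new "external-id" data type but the familiar "string" value type would still be processed via the value-type fallback, while a snak with an entirely unknown value type would be skipped with a warning.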


Do you think these guidelines are reasonable? It seems to me that adopting them
should save everyone some trouble.



_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
