Hi Daniel,

I feel that this tries to evade the real issue by making formal rules about what kind of "breaking" you have to care about. It would be better to define "breaking change" based on its consequences: if important services will stop working, then you should make sure you announce it in time so this will not happen. This requires you to talk to people on this list. I think the whole proposal below is mainly trying to give you some justification to avoid communication with your stakeholders. This is not the way to go.

This said, it is always nice to have some guidelines as to what is likely to change and what isn't. It is probably enough to give some warnings about this ("there might be additional keys in this map in the future" or "there might be additional datatype URIs in the future"). However, this is no recipe to avoid breaking changes. In particular, the guideline to ignore snaks of properties that have no understandable declaration is just codifying a controlled way of failing, not avoiding failure:

* Browsing interfaces (e.g., Reasonator, Miga Class & Property Browser) are expected to show all data to users. If they don't, this is breaking them.

* Query services are expected to use all data. If you do an aggregate query to count all properties on Wikidata, then the number returned will not merely be incomplete but simply wrong if the service ignores half of the data.

* Editing tools (including bot frameworks) are most heavily affected, since they might create duplicate statements if they fail to see some of the data following your guideline.

This does not mean that your guideline is unreasonable -- in fact, I think this is what most tools are doing anyway. But as the examples show, it is not enough to prevent major service disruptions that would affect many people. The guideline that tools should sometimes raise an alert or issue a warning does not work in many cases either, since we have a complex ecosystem with many inter-dependent services (for example, how should a SPARQL Web service communicate problems that occurred when importing the data? All of them, or somehow only the ones that might have affected the query result?).

Our tools rely on being able to use all data, and the easiest way to ensure that they will keep working is to announce technical changes to the JSON format well in advance on this list. For changes that affect a particular subset of widely used tools, it would also be possible to seek feedback from the main contributors of these tools at design/development time. I am sure everybody here is trying their best to keep up with whatever changes you implement, but it is not always possible for all of us to sacrifice part of our weekend on short notice to make a new release before next Wednesday.

Cheers,

Markus


On 05.02.2016 13:10, Daniel Kinzler wrote:
Hi all!

In the context of introducing the new "math" and "external-id" data types, the
question came up whether this introduction constitutes a breaking change to the
data model. The answer to this depends on whether you take the "English" or the
"German" approach to interpreting the format: According to
<https://en.wikipedia.org/wiki/Everything_which_is_not_forbidden_is_allowed>, in
England, "everything which is not forbidden is allowed", while, in Germany, the
opposite applies, so "everything which is not allowed is forbidden".

In my mind, the advantage of formats like JSON, XML and RDF is that they provide
good discovery by eyeballing, and that they use a mix-and-match approach. In
this context, I favour the English approach: anything not explicitly forbidden
in the JSON or RDF is allowed.

So I think clients should be written in a forward-compatible way: they should
handle unknown constructs or values gracefully.


In this vein, I would like to propose a few guiding principles for the design of
client libraries that consume Wikibase RDF and particularly JSON output:

* When encountering an unknown structure, such as an unexpected key in a JSON
encoded object, the consumer SHOULD skip that structure. Depending on context
and use case, a warning MAY be issued to alert the user that some part of the
data was not processed.

* When encountering a malformed structure, such as missing a required key in a
JSON encoded object, the consumer MAY skip that structure, but then a warning
MUST be issued to alert the user that some part of the data was not processed.
If the structure is not skipped, the consumer MUST fail with a fatal error.

* Clients MUST make a clear distinction between data types and value types: A Snak's
data type determines the interpretation of the value, while the type of the
Snak's data value specifies the structure of the value representation.

* Clients SHOULD be able to process a Snak about a Property of unknown data
type, as long as the value type is known. In such a case, the client SHOULD fall
back to the behaviour defined for the value type. If this is not possible, the
Snak MUST be skipped and a warning SHOULD be issued to alert the user that some
part of the data could not be interpreted.

* When encountering an unknown type of data value (value type), the client MUST
either ignore the respective Snak, or fail with a fatal error. A warning SHOULD
be issued to alert the user that some part of the data could not be processed.
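To make the intended behaviour concrete, here is a minimal sketch of a consumer that follows these guidelines. The handler tables, function names, and the simplified snak layout are illustrative assumptions, not the actual Wikibase JSON schema or any real library API:

```python
import warnings

# Hypothetical handlers keyed by *value type* (the structure of the datavalue).
VALUE_HANDLERS = {
    "string": lambda v: v,
    "wikibase-entityid": lambda v: v["id"],
}

# Hypothetical handlers keyed by *data type* (the interpretation of the value).
DATA_TYPE_HANDLERS = {
    "string": lambda v: v,
    "url": lambda v: v,  # same structure as a string, different interpretation
}

def consume_snak(snak):
    """Return a parsed value, or None if the snak had to be skipped."""
    try:
        data_type = snak["datatype"]
        datavalue = snak["datavalue"]
        value_type = datavalue["type"]
        raw = datavalue["value"]
    except KeyError as missing:
        # Malformed structure: the consumer MAY skip it,
        # but then a warning MUST be issued.
        warnings.warn(f"skipping malformed snak, missing key {missing}")
        return None

    if data_type in DATA_TYPE_HANDLERS:
        return DATA_TYPE_HANDLERS[data_type](raw)

    # Unknown data type: fall back to the behaviour defined for the value type.
    if value_type in VALUE_HANDLERS:
        return VALUE_HANDLERS[value_type](raw)

    # Unknown value type as well: skip the snak, SHOULD issue a warning.
    warnings.warn(f"skipping snak with unknown value type {value_type!r}")
    return None
```

Under this sketch, a snak with the new "external-id" data type but the familiar "string" value type would still be processed via the value-type fallback, while a snak with an entirely unknown value type would be skipped with a warning.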


Do you think these guidelines are reasonable? It seems to me that adopting them
should save everyone some trouble.



_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
