Daniel, thanks, inline:

> The structure looks sane and future-proof to me, but since it's
> all-in-one-blob, it'll be hard to scale it to more than a few ten
> thousand lines or so. I like this model, but if you want to go beyond
> that (DO we want to go beyond that?!) you will need a different
> approach, which may be incompatible.
>

We do *eventually* want to go beyond that towards large data. We had this
discussion with Brion, see here:
*  https://phabricator.wikimedia.org/T120452#2224764

I do not think my approach is a blocker for larger datasets, because you
can add a simple SQL-like interface capable of reading data both from
these pages and from large backend databases. The 2MB page limit will
prevent page data from growing too large. Also, larger datasets are a
different target, one that we should approach when we are ready.

> One thing that should be specified very rigorously from the start are
> the supported data types, along with their exact syntax and semantics.
> Your example has string, number, boolean, and localized. So:
>
> * what's the length limit for string?
>
Good question. Do you have a limit for Wikidata labels and other string
values?

> * what's the range and precision of number? Is it the same as for JSON?
>
For now, same as JSON.

> * does boolean only accept JSON primitives, or also strings?
>
Only the JSON primitives true and false; no strings.
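
To make the two answers above concrete, here is a minimal Python sketch of
the checks they imply. It is only an illustration, not the extension's
actual validation code: the function names are hypothetical, and rejecting
NaN/Infinity is my reading of "same as JSON", since the JSON grammar has no
such literals.

```python
import json
import math


def is_valid_boolean(value):
    # Only the JSON primitives true/false are accepted; the strings
    # "true" and "false" are rejected.
    return isinstance(value, bool)


def is_valid_number(value):
    # In Python, bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    # JSON has no NaN or Infinity literals, so reject non-finite floats
    # (assumption: "same as JSON" means strict JSON).
    return isinstance(value, float) and math.isfinite(value)


cells = json.loads('[true, "true", 1, 1.5]')
booleans_ok = [is_valid_boolean(c) for c in cells]
numbers_ok = [is_valid_number(c) for c in cells]
```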

> * what language codes are valid for localized? Is language fallback
> applied for display?
>
Same rules as for wiki language codes (but without validation against the
actual list). Automatic fallback is already implemented, using the Language
class. If everything else fails and there is no English value, it takes an
arbitrary value, the first one available (unlike Language, which stops at
English and fails otherwise).
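
The fallback order described above can be sketched roughly as follows. This
is a Python illustration, not the actual implementation: pick_localized and
the fallbacks table are hypothetical stand-ins for what the Language class
provides, and "first one available" is taken to mean the first entry in
storage order.

```python
def pick_localized(values, lang, fallbacks):
    """Pick a display string from a {language code: text} mapping.

    `fallbacks` is a hypothetical {code: [fallback codes]} table standing
    in for the fallback chain that MediaWiki's Language class supplies.
    """
    # Requested language first, then its fallback chain, then English.
    for code in [lang, *fallbacks.get(lang, []), "en"]:
        if code in values:
            return values[code]
    # Last resort: no English either, so take the first available value
    # instead of failing.
    return next(iter(values.values()), None)


# Illustrative data only; the chain below is an assumption for the example.
labels = {"ru": "Privet", "de": "Hallo"}
chains = {"uk": ["ru"]}
via_chain = pick_localized(labels, "uk", chains)
last_resort = pick_localized(labels, "fr", chains)
```

Here via_chain is found through the fallback chain, while last_resort falls
through to the first stored value because neither French nor English exists.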


> You write in your proposal "Hard to define types like Wikidata ID,
> datetime, and URL could be stored as a string until we can reuse
> Wikidata's type system". Well, what's keeping you from using it now?
> DataValue and friends are standalone composer modules, you can find
> them on github.

I was told by the Wikidata team at the Jerusalem hackathon that the
JavaScript code is too entangled, and that I won't be able to reuse it for
non-Wikidata stuff. I will be very happy to adapt it if possible. Still, I
do not think this is a requirement for the first release.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l