Re: [Wikidata-tech] json to ttl

2017-07-03 Thread Thiemo Mättig
The RDF dumps are marked as BETA because we are still in a phase where
we need to apply breaking changes in quick iterations. We can no longer
do this once they are out of beta.

Even if it is technically possible to import a .json dump and turn it
into an RDF triples dump, this is not what we do. We use the
dumpRdf.php maintenance script to create the RDF dump, which uses
the code I pointed you to.

Maybe instead of asking whether a specific solution exists, you can start
by explaining your problem, what you have, and what you would like
to achieve? I believe this opens more possibilities for people to
answer and help you.

Best
Thiemo

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] json to ttl

2017-07-03 Thread Thiemo Mättig
What are you referring to when you say "beta"?

The code you are most probably looking for is in
https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes/Rdf/

Best
Thiemo



Re: [Wikidata-tech] Removing sitelinks when they aren't being used

2017-02-25 Thread Thiemo Mättig
> I don't use sitelinks […] How can I stop these from being shown?

You can add the following line to your LocalSettings.php:

$wgWBRepoSettings['siteLinkGroups'] = [];

Best
Thiemo



Re: [Wikidata-tech] Searching through API and ignoring diacritics

2017-01-06 Thread Thiemo Mättig
Hey Miguel!

There are currently two search engines implemented. The one you use
when you start typing in the search box in the upper right corner is
currently based on a MySQL prefix search. We are actually working on
changing this, but this will take time.

When you are using Special:Search, you are using another search
algorithm that, I believe, supports what you want. You can try this on
wikidata.org: Typing "Comite" in the upper right will not find
"Comité", but Special:Search will.

You may need to install
https://www.mediawiki.org/wiki/Extension:CirrusSearch to have this
feature in your installation.
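
To make the difference concrete, here is a minimal Python sketch of why a
plain prefix comparison misses "Comité" while a diacritic-folding one finds
it. This is only an illustration of the general technique; it is not how the
MySQL prefix search or CirrusSearch is implemented, and the function names
are made up for this example.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose characters (NFD), drop combining marks, and case-fold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def plain_prefix_match(query: str, label: str) -> bool:
    """Case-insensitive prefix match that keeps diacritics as they are."""
    return label.casefold().startswith(query.casefold())


def folded_prefix_match(query: str, label: str) -> bool:
    """Prefix match after folding diacritics on both sides."""
    return fold_diacritics(label).startswith(fold_diacritics(query))


if __name__ == "__main__":
    print(plain_prefix_match("Comite", "Comité"))   # False
    print(folded_prefix_match("Comite", "Comité"))  # True
```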

Best
Thiemo



Re: [Wikidata-tech] Two questions about Lexeme Modeling

2016-11-25 Thread Thiemo Mättig
Hi all!

I tweaked my part of the decision matrix a little bit:

https://docs.google.com/spreadsheets/d/1PtGkt6E8EadCoNvZLClwUNhCxC-cjTy5TY8seFVGZMY/edit?ts=5834219d#gid=868938568

The arguments in my matrix are basically a collection of "the worst
things that can happen". I like this approach. ;-)

The arguments I consider most important (they should have a high
number in the last column) are:

1. Changing Term to TermList later is almost impossible. This alone
could be set to a "-100" and make all the other arguments obsolete.

2. I'm very much concerned about any UI consuming Lemmas becoming very
complicated, from both the users' and the devs' perspective. When a Lexeme
allows any number of Lemmas, should this include zero Lemmas? Which
language codes will be allowed? Do we want to enforce at least one
Lemma? Do we need to validate the used language codes, or are
post-edit checks enough? Do we even have standardized language codes
for all variants? Is it possible to have multiple Lemmas with the same
language code? Which Lemma is the primary one then? How to deprecate
one?

The list goes on.

All this sounds like we are going to reimplement the majority of the
statements UI, just without Ranks, Qualifiers and References.

Third-party devs will also have to deal with all these problems (also
see Denny's comments).

I suggest using a TermList anyway, but starting with a very hard
limitation: It *must* contain exactly one element, and the language
code *must* be the exact same as the language code of the Lexeme. We
can lift all these limitations later when needed, step by step.
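
A minimal Python sketch of the proposed restriction, assuming the lemmas are
represented as a mapping from language code to text. The function name and
the representation are hypothetical and only illustrate the two rules
(exactly one element, same language code as the Lexeme); this is not
Wikibase code.

```python
from typing import Mapping


def validate_lemmas(lexeme_language: str, lemmas: Mapping[str, str]) -> None:
    """Raise ValueError unless there is exactly one lemma in the Lexeme's language."""
    if len(lemmas) != 1:
        raise ValueError(f"expected exactly one lemma, got {len(lemmas)}")
    (language_code,) = lemmas  # unpack the single key
    if language_code != lexeme_language:
        raise ValueError(
            f"lemma language {language_code!r} does not match "
            f"lexeme language {lexeme_language!r}"
        )


if __name__ == "__main__":
    validate_lemmas("en", {"en": "colour"})  # passes silently
    try:
        validate_lemmas("en", {"en": "colour", "en-us": "color"})
    except ValueError as error:
        print(error)
```

Lifting a limitation later would then mean relaxing one of the two checks
without changing the data structure.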

Best
Thiemo



Re: [Wikidata-tech] Why term for lemma?

2016-11-11 Thread Thiemo Mättig
Tpt asked:

> why having both the Term and the MonolingualText data structures? Is it just 
> for historical reasons (labels have been introduced before statements and so 
> before all the DataValue system) or is there an architectural reason behind?

That's not the only reason.

First, all data values (including monolingual text) must implement the
same DataValue interface.

Term does not have to implement anything (it does implement Comparable for convenience).

All DataValues share the same abstract DataValueObject base class. The
only reason for this is code sharing. No code should type hint against
DataValueObject (I just checked and hurray, we are clean).

MonolingualTextValue could indeed share code with Term. But it's not
possible to do "class MonolingualTextValue extends DataValueObject,
Term" in PHP. We would need to drop the code sharing with
DataValueObject and do "class MonolingualTextValue extends Term
implements DataValue" instead, which means we would have to copy all
the code from DataValueObject over to MonolingualTextValue. This is
entirely possible, but what would be the actual advantage of such a
change? Which code would benefit from being able to pass
MonolingualTextValues to code that accepts Terms?

Best
Thiemo

-- 
Thiemo Mättig
Software Developer

Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
http://wikimedia.de

Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it happen!
http://spenden.wikimedia.de/

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.



Re: [Wikidata-tech] Adding reference re-adds claim

2016-04-06 Thread Thiemo Mättig
Hi,

the "definition" at Magnus'
http://tools.wmflabs.org/wikidata-todo/quick_statements.php is outdated.
Magnus, can you please remove the leading zeros from the year?

Padding years to 11 digits is not done any more. For a while there was no
padding at all in the backend, while some documentation talked about 16
digits and the frontend still padded to 11 digits. All at the same time.
:-( We fixed this about a year ago and decided to always pad years to 4
digits because this minimizes storage space while being the most convenient
strategy for users.
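
For tools that still emit the old padding, a minimal Python sketch of the
normalization could look like this. It assumes timestamps of the form
sign, year, "-MM-DDThh:mm:ssZ"; the function name is hypothetical and this is
not the code used in Wikibase.

```python
import re

# sign, any number of year digits, then the rest starting at the first "-"
_TIME_PATTERN = re.compile(r"^([+-])(\d+)(-.*)$")


def normalize_year_padding(timestamp: str) -> str:
    """Re-pad the year to at least 4 digits, dropping extra leading zeros."""
    match = _TIME_PATTERN.match(timestamp)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {timestamp!r}")
    sign, year_digits, rest = match.groups()
    return f"{sign}{int(year_digits):04d}{rest}"


if __name__ == "__main__":
    print(normalize_year_padding("+00000002016-04-06T00:00:00Z"))
    print(normalize_year_padding("+44-03-15T00:00:00Z"))
```

Years with more than 4 digits are left at their natural length; only the
padding changes.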

Best
Thiemo


Re: [Wikidata-tech] wbc_entity_usage

2016-03-29 Thread Thiemo Mättig
Hi Magnus,

These "entity usage aspects" are described here:
https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/client/includes/Usage/EntityUsage.php

I'm not sure what you mean by "item id of the page". Which page?
eu_page_id is the page id where information from a Wikidata entity is used.
eu_entity_id is that Wikidata entity id.
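
To illustrate how the two columns relate, here is a small Python sketch using
an in-memory SQLite table. The schema is simplified, the rows are invented
sample data, and the single-letter aspect codes are only illustrative; the
authoritative list of aspects is in EntityUsage.php linked above.

```python
import sqlite3
from typing import List


def build_demo_table() -> sqlite3.Connection:
    """Create a simplified stand-in for wbc_entity_usage with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wbc_entity_usage ("
        " eu_entity_id TEXT NOT NULL,"
        " eu_aspect TEXT NOT NULL,"
        " eu_page_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO wbc_entity_usage VALUES (?, ?, ?)",
        [
            ("Q64", "L", 1001),   # page 1001 uses something from Q64
            ("Q64", "S", 1001),   # same page, same entity, another aspect
            ("Q183", "L", 1001),
            ("Q42", "X", 2002),
        ],
    )
    return conn


def entities_used_on_page(conn: sqlite3.Connection, page_id: int) -> List[str]:
    """Return the distinct entity ids whose data is used on the given page."""
    rows = conn.execute(
        "SELECT DISTINCT eu_entity_id FROM wbc_entity_usage"
        " WHERE eu_page_id = ? ORDER BY eu_entity_id",
        (page_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = build_demo_table()
    print(entities_used_on_page(connection, 1001))
```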

Best
Thiemo


Re: [Wikidata-tech] Last call for objections against DataModel changes.

2015-12-07 Thread Thiemo Mättig
Thad Guidry wrote:

> EXPERIMENT MORE.

We had multiple actual implementations by multiple authors over the past
months, including:
* https://github.com/wmde/WikibaseDataModel/pull/508
* https://github.com/wmde/WikibaseDataModelSerialization/pull/162
* https://github.com/wmde/WikibaseDataModelSerialization/pull/163

Best

-- 
Thiemo Mättig


Re: [Wikidata-tech] Globe coordinates precision question (technical)

2015-01-13 Thread Thiemo Mättig
Hi Markus,

> (1a) Wikibase will continue to support arbitrary precision values for
> coordinates, and the UI will be extended so people can actually enter them.
> (1b) Wikibase will restrict the set of supported precision values for
> coordinates to those already supported in the UI. Other values are
> considered an error that will have to be fixed in the future.

In my opinion, possibly neither, with a tendency towards (1a). Currently
the API accepts any number (which makes sense in my opinion, how should the
API provide a set of allowed precisions and why and how should it reject
certain numbers?). The UI supports an auto-detection and a selection of
predefined precisions, which is much easier to use. There may be an option
to enter the precision as a number, if requested, but I don't think this is
necessary at this point.

I recently introduced limits of 0.0001° (8 decimal places) and
00°00'00.01 to the precision auto-detection to work around IEEE rounding
issues (which happen both internally and externally). Both limits are equivalent
to approximately 1 mm which should be enough for anybody(tm).

There are no real hard limits when using the API. What is entered is
stored, which is how it should be in my opinion.

There is a hard limit of 1 in the formatters. Precisions bigger than 1 are
ignored and default to 1.

Rounding errors and IEEE issues in the precision do not matter. The
formatters calculate the number of significant decimal places from the
precision (which is basically a type of rounding to either a fraction of a
degree, minute or second smaller than the precision, depending on the
output format). When parsing this formatted string the internal IEEE
representation may change, but this possible loss is a one-time thing,
does not accumulate and is irrelevant for the displayed string and equality
checks (if they are done right).
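
As a rough illustration of the two rules described above (precisions bigger
than 1 default to 1, and the number of decimal places is derived from the
precision), here is a minimal Python sketch for decimal degrees only. The
function names are hypothetical; the real formatters also handle minutes and
seconds and are not reproduced here.

```python
import math


def significant_decimals(precision: float) -> int:
    """Number of decimal places needed to display a value at this precision."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    precision = min(precision, 1.0)  # precisions bigger than 1 default to 1
    # The small epsilon absorbs floating point noise in log10 for values
    # such as 0.001 that are meant to be exact powers of ten.
    return max(0, math.ceil(-math.log10(precision) - 1e-9))


def format_decimal_degrees(value: float, precision: float) -> str:
    """Format a coordinate component with the decimals implied by precision."""
    return f"{value:.{significant_decimals(precision)}f}°"


if __name__ == "__main__":
    print(format_decimal_degrees(52.516667, 0.01))
    print(format_decimal_degrees(52.516667, 10))
```

Because the output is rounded to the precision, small IEEE differences in
the stored value or in the precision itself do not show up in the string.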

> (2a) Null values for precision are an error that should be fixed in the
> data. Wikibase will reject such data in the future.
> (2b) Null values for precision have a meaning. It is as follows (please
> explain): ...

We currently have null values in the database. I tend to think of them as
not yet entered. I'm not sure if we should reject them at any point; I
would prefer to apply the auto-detection instead (so the answer is, again,
neither).

> this was added only last November.

There was always a fallback to 1/3600° if no precision was given, but that
code was incomplete. If a coordinate with no precision made it to the
database you could not see, edit and fix it. This is possible now. Instead
of applying the auto-detection in the formatter (which would be possible
but may be confusing and inconsistent) the output defaults to the most
common DD°MM'SS (a.k.a. 1/3600°).
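
A minimal Python sketch of that fallback, assuming a missing precision is
represented as None and the value is a single coordinate component in decimal
degrees. The names are hypothetical, and hemisphere letters and other output
formats are deliberately left out.

```python
from typing import Optional

DEFAULT_PRECISION = 1 / 3600  # one arcsecond, used when no precision is stored


def effective_precision(precision: Optional[float]) -> float:
    """Fall back to 1/3600° when the stored precision is missing."""
    return DEFAULT_PRECISION if precision is None else precision


def format_dms(value: float) -> str:
    """Format decimal degrees as DD°MM'SS", rounded to whole arcseconds."""
    sign = "-" if value < 0 else ""
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{sign}{degrees}°{minutes:02d}'{seconds:02d}\""


if __name__ == "__main__":
    print(effective_precision(None) == DEFAULT_PRECISION)
    print(format_dms(52.516667))
```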

There are quite a lot of edge cases. I already fixed a lot of them (and
added tests to make sure they never break) and will happily add and fix
more. Just tell me if you find one.

Best

-- 
Thiemo Mättig