Re: [Wikidata-tech] json to ttl
The RDF dumps are marked as BETA because we are still in a phase where we need to apply breaking changes in quick iterations. We could not do this any more once they are no longer beta. Even if it is technically possible to import a .json dump and turn it into an RDF triples dump, this is not what we do. We use the dumpRdf.php maintenance script to create the RDF dump, and that script uses the code I pointed you to. Maybe, instead of asking whether a specific solution exists, you could start by explaining your problem: what you have, and what you would like to achieve? I believe this opens up more possibilities for people to answer and help you. Best Thiemo ___ Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
Re: [Wikidata-tech] json to ttl
What are you referring to when you say "beta"? The code you are most probably looking for is in https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes/Rdf/ Best Thiemo
Re: [Wikidata-tech] Removing sitelinks when they aren't being used
> I don't use sitelinks […] How can I stop these from being shown? You can add the following line to your LocalSettings.php: $wgWBRepoSettings['siteLinkGroups'] = []; Best Thiemo
Re: [Wikidata-tech] Searching through API and ignoring diacritics
Hey Miguel! There are currently two search engines implemented. The one you use when you start typing in the search box in the upper right corner is currently based on a MySQL prefix search. We are working on changing this, but it will take time. When you use Special:Search, you are using another search algorithm that, I believe, supports what you want. You can try this on wikidata.org: typing "Comite" in the upper right box will not find "Comité", but Special:Search will. You may need to install https://www.mediawiki.org/wiki/Extension:CirrusSearch to have this feature in your installation. Best Thiemo
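The following rough illustration shows why a plain prefix match misses "Comité" while a diacritic-insensitive search finds it. It is a minimal Python sketch that assumes "ignoring diacritics" means Unicode NFD decomposition followed by dropping combining marks. This is not how CirrusSearch does it, since CirrusSearch relies on Elasticsearch analyzers. The function names are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Lower-case text and strip diacritics (é -> e) via NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # After NFD the accents are separate combining marks (category Mn); drop them.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def prefix_search(labels, query, fold_diacritics=False):
    """Return the labels that start with query, optionally ignoring diacritics."""
    if fold_diacritics:
        folded_query = fold(query)
        return [label for label in labels if fold(label).startswith(folded_query)]
    return [label for label in labels if label.startswith(query)]


labels = ["Comité", "Comet", "Committee"]
print(prefix_search(labels, "Comite"))                        # []
print(prefix_search(labels, "Comite", fold_diacritics=True))  # ['Comité']
```

Folding both the query and the stored labels is what makes "Comite" and "Comité" compare equal. A real search backend would normally store the folded form in its index rather than recompute it on every query.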
Re: [Wikidata-tech] Two questions about Lexeme Modeling
Hi all! I tweaked my part of the decision matrix a little: https://docs.google.com/spreadsheets/d/1PtGkt6E8EadCoNvZLClwUNhCxC-cjTy5TY8seFVGZMY/edit?ts=5834219d#gid=868938568 The arguments in my matrix are basically a collection of "the worst things that can happen". I like this approach. ;-) The arguments I consider most important (they should have a high number in the last column) are:
1. Changing Term to TermList later is almost impossible. This alone could be set to "-100" and make all the other arguments obsolete.
2. I'm very much concerned about any UI that consumes Lemmas becoming very complicated, from both the users' and the developers' perspective. When a Lexeme allows any number of Lemmas, should this include zero Lemmas? Which language codes will be allowed? Do we want to enforce at least one Lemma? Do we need to validate the language codes used, or are post-edit checks enough? Do we even have standardized language codes for all variants? Is it possible to have multiple Lemmas with the same language code? Which Lemma is the primary one then? How do we deprecate one? The list goes on. All this sounds like we are going to reimplement the majority of the statements UI, just without Ranks, Qualifiers and References. Third-party developers will also have to deal with all these problems (also see Denny's comments).
I suggest using a TermList anyway, but starting with a very hard limitation: it *must* contain exactly one element, and its language code *must* be exactly the same as the language code of the Lexeme. We can lift these limitations later, step by step, when needed. Best Thiemo
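The proposed starting restriction can be expressed as a simple check. This is a minimal Python sketch that assumes lemmas are represented as a dict mapping a language code to the lemma text, as a stand-in for a TermList. `validate_lemmas` and the data shapes are hypothetical and are not Wikibase Lexeme code.

```python
from typing import Dict


def validate_lemmas(lemmas: Dict[str, str], lexeme_language: str) -> None:
    """Enforce the proposed starting restriction on a Lexeme's lemmas.

    `lemmas` is a hypothetical stand-in for a TermList, mapping a language
    code to the lemma text. A dict cannot hold two lemmas with the same
    language code, so only the count and the language need checking.
    """
    if len(lemmas) != 1:
        raise ValueError(f"expected exactly one lemma, got {len(lemmas)}")
    language = next(iter(lemmas))
    if language != lexeme_language:
        raise ValueError(
            f"lemma language {language!r} must match the Lexeme language {lexeme_language!r}"
        )


validate_lemmas({"en": "color"}, "en")  # accepted
for rejected in ({}, {"en": "color", "en-gb": "colour"}, {"de": "Farbe"}):
    try:
        validate_lemmas(rejected, "en")
    except ValueError as error:
        print(error)
```

Because the data structure is already a list of terms, lifting the limitation later would mean relaxing these checks rather than changing the stored format.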
Re: [Wikidata-tech] Why term for lemma?
Tpt asked: > why having both the Term and the MonolingualText data structures? Is it just > for historical reasons (labels have been introduced before statements and so > before all the DataValue system) or is there an architectural reason behind? That's not the only reason. First, all data values (including monolingual text) must implement the same DataValue interface. Term does not have to implement anything (it does implement Comparable for convenience). All DataValues share the same abstract DataValueObject base class. The only reason for this is code sharing. No code should type hint against DataValueObject (I just checked and, hurray, we are clean). MonolingualTextValue could indeed share code with Term. But it's not possible to write "class MonolingualTextValue extends DataValueObject, Term" in PHP. We would need to drop the code sharing with DataValueObject and write "class MonolingualTextValue extends Term implements DataValue" instead, which means we would have to copy all the code from DataValueObject over to MonolingualTextValue. This is entirely possible, but what would be the actual advantage of such a change? Which code would benefit from being able to pass MonolingualTextValues to code that accepts Terms? Best Thiemo -- Thiemo Mättig Software Developer Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin Tel. (030) 219 158 26-0 http://wikimedia.de Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it happen! http://spenden.wikimedia.de/ Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Re: [Wikidata-tech] Adding reference re-adds claim
Hi, the "definition" at Magnus' http://tools.wmflabs.org/wikidata-todo/quick_statements.php is outdated. Magnus, can you please remove the leading zeros from the year? Padding years to 11 digits is not done any more. For a while there was no padding at all in the backend, while some documentation talked about 16 digits and the frontend still padded to 11 digits. All at the same time. :-( We fixed this about a year ago and decided to always pad years to 4 digits because this minimizes storage space while being the most convenient strategy for users. Best Thiemo
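Tools that still emit 11-digit years could normalize them along the following lines. This is a minimal Python sketch. It assumes sign-prefixed timestamps such as "+00000001952-01-01T00:00:00Z" and assumes the target is a year padded to at least 4 digits. The function name and regex are hypothetical and are not Wikibase or QuickStatements code.

```python
import re

# Hypothetical pattern: sign, year digits, then "-MM-DDThh:mm:ssZ".
TIMESTAMP = re.compile(r"^([+-])(\d+)(-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)$")


def normalize_year_padding(timestamp: str) -> str:
    """Strip extra leading zeros from the year, keeping at least 4 digits."""
    match = TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {timestamp!r}")
    sign, year, rest = match.groups()
    # int() drops all leading zeros; :04d pads short years back to 4 digits.
    return f"{sign}{int(year):04d}{rest}"


print(normalize_year_padding("+00000001952-01-01T00:00:00Z"))  # +1952-01-01T00:00:00Z
print(normalize_year_padding("-00000000044-03-15T00:00:00Z"))  # -0044-03-15T00:00:00Z
```

Years with more than 4 significant digits keep all of them, because the `04d` format only sets a minimum width.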
Re: [Wikidata-tech] wbc_entity_usage
Hi Magnus, these "entity usage aspects" are described here: https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/client/includes/Usage/EntityUsage.php I'm not sure what you mean by "item id of the page". Which page? eu_page_id is the ID of the page on which information from a Wikidata entity is used; eu_entity_id is the ID of that Wikidata entity. Best Thiemo
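The following minimal sketch uses Python's built-in sqlite3 module to make the relationship between the two columns concrete. The table is a simplified, hypothetical stand-in for wbc_entity_usage. It models only eu_page_id, eu_entity_id and an eu_aspect column. The rows are made up, and the aspect codes are illustrative only; EntityUsage.php has the real list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema; the real table has more columns and indexes.
conn.execute(
    "CREATE TABLE wbc_entity_usage ("
    " eu_page_id INTEGER NOT NULL,"  # client page on which the entity is used
    " eu_entity_id TEXT NOT NULL,"   # Wikidata entity id, e.g. 'Q42'
    " eu_aspect TEXT NOT NULL)"      # which part of the entity is used
)
conn.executemany(
    "INSERT INTO wbc_entity_usage VALUES (?, ?, ?)",
    [(101, "Q42", "S"), (101, "Q42", "L.en"), (101, "Q5", "L.en"), (202, "Q42", "S")],
)

# Which entities does page 101 use?
entities = [row[0] for row in conn.execute(
    "SELECT DISTINCT eu_entity_id FROM wbc_entity_usage"
    " WHERE eu_page_id = ? ORDER BY eu_entity_id", (101,))]

# Which pages use Q42?
pages = [row[0] for row in conn.execute(
    "SELECT DISTINCT eu_page_id FROM wbc_entity_usage"
    " WHERE eu_entity_id = ? ORDER BY eu_page_id", ("Q42",))]

print(entities)  # ['Q42', 'Q5']
print(pages)     # [101, 202]
```

One page can use several entities, and one entity can be used by several pages. The table therefore records usages, not a single "item id of the page".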
Re: [Wikidata-tech] Last call for objections against DataModel changes.
Thad Guidry wrote: > EXPERIMENT MORE. We had multiple actual implementations by multiple authors over the past months, including:
* https://github.com/wmde/WikibaseDataModel/pull/508
* https://github.com/wmde/WikibaseDataModelSerialization/pull/162
* https://github.com/wmde/WikibaseDataModelSerialization/pull/163
Best -- Thiemo Mättig
Re: [Wikidata-tech] Globe coordinates precision question (technical)
Hi Markus, (1a) Wikibase will continue to support arbitrary precision values for coordinates, and the UI will be extended so people can actually enter them. (1b) Wikibase will restrict the set of supported precision values for coordinates to those already supported in the UI. Other values are considered an error that will have to be fixed in the future. In my opinion, possibly neither, with a tendency towards (1a). Currently the API accepts any number (which makes sense in my opinion: how should the API advertise a set of allowed precisions, and why and how should it reject certain numbers?). The UI supports auto-detection and a selection of predefined precisions, which is much easier to use. There may be an option to enter the precision as a number, if requested, but I don't think this is necessary at this point. I recently introduced limits of 0.00000001° (8 decimal places) and 00°00'00.01 to the precision auto-detection to work around IEEE rounding issues (which happen both internally and externally). Both limits are equivalent to approximately 1 mm, which should be enough for anybody(tm). There are no real hard limits when using the API. What is entered is stored, which is how it should be in my opinion. There is a hard limit of 1 in the formatters: precisions bigger than 1 are ignored and default to 1. Rounding errors and IEEE issues in the precision do not matter. The formatters calculate the number of significant decimal places from the precision (which is basically a type of rounding to a fraction of a degree, minute or second smaller than the precision, depending on the output format). When this formatted string is parsed again, the internal IEEE representation may change, but this possible loss happens only once, does not accumulate, and is irrelevant for the displayed string and for equality checks (if they are done right). (2a) Null values for precision are an error that should be fixed in the data. Wikibase will reject such data in the future.
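The following sketch shows how a formatter can derive the displayed decimal places from the precision. It is a minimal Python illustration. It assumes decimal-degree output, or another unit such as seconds passed as a parameter. It takes the clamp for precisions bigger than 1 from the message. The function names are hypothetical, the handling of non-positive precisions is my own assumption, and this is not the DataValues Geo formatter code.

```python
import math


def decimal_places(precision: float, unit_in_degrees: float = 1.0) -> int:
    """Decimal places needed to display a coordinate at the given precision.

    precision: precision in degrees, e.g. 0.0001 or 1/3600.
    unit_in_degrees: size of the output unit (1.0 = degrees, 1/60 = minutes,
    1/3600 = seconds).
    """
    if precision <= 0:
        # Assumption for this sketch; the message does not say what happens here.
        raise ValueError("precision must be positive")
    precision = min(precision, 1.0)  # precisions bigger than 1 are treated as 1
    precision_in_unit = precision / unit_in_degrees
    if precision_in_unit >= 1:
        return 0
    # round() absorbs tiny IEEE errors from the division and the logarithm
    # before rounding up to the next whole number of decimal places.
    return math.ceil(round(-math.log10(precision_in_unit), 9))


def format_degrees(value: float, precision: float) -> str:
    """Format a decimal-degree value with as many places as the precision needs."""
    return f"{value:.{decimal_places(precision)}f}°"


print(format_degrees(52.516666667, 0.0001))  # 52.5167°
print(format_degrees(52.516666667, 5))       # 53°
print(decimal_places(1 / 360000, 1 / 3600))  # 2
```

The precision only decides how many places are printed. A slightly "wrong" IEEE value such as 0.00010000000000000002 therefore still yields 4 places, which is why rounding noise in the stored precision does not matter for display.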
(2b) Null values for precision have a meaning. It is as follows (please explain): ... We currently have null values in the database. I tend to think of them as not yet entered. I'm not sure if we should reject them at any point; I prefer to apply the auto-detection instead (so the answer is, again, neither). This was added only last November. There always was a fallback to 1/3600° if no precision was given, but that code was incomplete. If a coordinate without a precision made it into the database, you could not see, edit or fix it. This is possible now. Instead of applying the auto-detection in the formatter (which would be possible but may be confusing and inconsistent), the output defaults to the most common DD°MM'SS (a.k.a. 1/3600°). There are quite a lot of edge cases. I already fixed a lot of them (and added tests to make sure they never break) and will happily add and fix more. Just tell me if you find one. Best -- Thiemo Mättig
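The following sketch shows what applying auto-detection to a missing precision could look like. It is a minimal Python illustration. It assumes the precision is derived from the number of decimal places in a decimal-degree string, capped at 8 decimal places as mentioned in the message. `detect_precision` and `effective_precision` are hypothetical names, not the actual parser. The formatter's current behaviour of falling back to 1/3600° is deliberately not modelled here.

```python
import re

MAX_DECIMAL_PLACES = 8  # cap on auto-detected precision, to avoid IEEE noise
DECIMAL_DEGREES = re.compile(r"[+-]?\d+(?:\.(\d+))?")


def detect_precision(text: str) -> float:
    """Guess a precision in degrees from a decimal-degree string like '52.5167'."""
    match = DECIMAL_DEGREES.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a decimal-degree value: {text!r}")
    fraction = match.group(1) or ""
    places = min(len(fraction), MAX_DECIMAL_PLACES)
    return 10.0 ** -places


def effective_precision(stored, text: str) -> float:
    """Use the stored precision, or auto-detect one when it is null (None)."""
    return stored if stored is not None else detect_precision(text)


print(detect_precision("52.5167"))
print(effective_precision(None, "-13.25"))
print(effective_precision(1 / 3600, "52.5167"))
```

A stored precision always wins. Auto-detection only fills in the "not yet entered" case, so existing data is never overwritten.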