Re: [Wikidata-tech] Missing documentation of Wikibase Lexeme data model
Am 11.12.18 um 10:38 schrieb Antonin Delpeuch (lists):
> One way to generate a JSON schema would be to use Wikidata-Toolkit's
> implementation, which can generate a JSON schema via Jackson. It could
> be used to validate the entire data model.

While a schema is nice, it's more important to have documentation that defines the contract - that is, the intended semantics and guarantees.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
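[As an aside, a minimal sketch of the point above: a structural check can confirm the *shape* of an entity serialization, but says nothing about the contract. The field names follow the documented Wikibase JSON serialization; the checker itself is purely illustrative, not the real schema.]

```python
# Illustrative shape check for an Item serialization. Passing this check
# does not tell a client what a rank *means* or which invariants hold -
# that is the contract a schema alone cannot express.

def looks_like_item(entity: dict) -> bool:
    """Shallow structural check; not a semantic validation."""
    if entity.get("type") != "item":
        return False
    if not str(entity.get("id", "")).startswith("Q"):
        return False
    # labels/descriptions/claims must be objects keyed by language / property ID
    return all(isinstance(entity.get(k, {}), dict)
               for k in ("labels", "descriptions", "claims"))

doc = {"type": "item", "id": "Q42",
       "labels": {"en": {"language": "en", "value": "Douglas Adams"}}}
print(looks_like_item(doc))  # True - yet this says nothing about semantics
```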
Re: [Wikidata-tech] Missing documentation of Wikibase Lexeme data model
Am 11.12.18 um 08:38 schrieb Jakob Voß:
> Hi,
>
> I just noted that the official description of the Wikibase data model at
>
> https://www.mediawiki.org/wiki/Wikibase/DataModel
>
> and the description of JSON serialization lack a description of Lexemes, Forms,
> and Senses.

The abstract model for Lexemes is here:
https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model

The RDF binding is here:
https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/RDF_mapping

Looks like documentation for the JSON binding is indeed missing.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata-tech] lexeme fulltext search display
Am 18.06.2018 um 19:25 schrieb Stas Malyshev:
> 1. What the link will be pointing to? I haven't found the code to
> generate the link to specific Form.

You can use an EntityTitleLookup to get the Title object for an EntityId. In the case of a Form, it will point to the appropriate section. You can use the LinkRenderer service to make a link. Or you can use an EntityIdHtmlLinkFormatter, which should do the right thing. You can get one from an OutputFormatValueFormatterFactory.

-- daniel
Re: [Wikidata-tech] lexeme fulltext search display
Hi Stas!

Your proposal is pretty much what I envision.

Am 14.06.2018 um 19:39 schrieb Stas Malyshev:
> I plan to display Lemma match like this:
>
> title (LN)
> Synthetic description
>
> e.g.
>
> color/colour (L123)
> English noun
>
> Meaning, the first line with link would be standard lexeme link
> generated by Lexeme code (which also deals with multiple lemmas) and the
> description line is generated description of the Lexeme - just like in
> completion search.

Sounds perfect to me.

> The problem here, however, is since the link is
> generated by the Lexeme code, which has no idea about search, we can not
> properly highlight it. This can be solved with some trickery, probably,
> e.g. to locate search matches inside generated string and highlight
> them, but first I'd like to ensure this is the way it should be looking.

Do we really need the highlight? It does not seem critical to me for this use case. Just "nice to have".

> More tricky is displaying the Form (representation) match. I could
> display here the same as above, but I feel this might be confusing.
> Another option is to display Form data, e.g. for "colors":
>
> color/colour (L123)
> colors: plural for color (L123): English noun

I'd rather have this:

colors/colours (L123-F2)
plural of color (L123): English noun

Note that in place of "plural", you may have something like "3rd person, singular, past, conjunctive", derived from multiple Q-ids.

> The description line features matched Form's representation and
> synthetic description for this form. Right now the matched part is not
> highlighted - because it will otherwise always be highlighted, as it is
> taken from the match itself, so I am not sure whether it should be or not.

Again, I don't think any highlighting is needed. But, as you know, it's all up to Lydia to decide :)

-- daniel

--
Daniel Kinzler
Principal Platform Engineer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Fastest way (API or whatever) to verify a QID
You can do this via the API, e.g.:

https://www.wikidata.org/w/api.php?action=query&format=json&titles=Q1|Qx|Q1003|Q66&redirects=1

Note that this uses QIDs directly as page titles. This works on Wikidata, but may not work on all Wikibase instances. It also does not work for PIDs: for these, you have to prefix the Property namespace, as in Property:P31.

A more Wikibase way would be to use the wbgetentities API, as in

https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42|Q64&props=

However, this API fails when you provide a non-existing ID, without providing any information about other IDs. So you can quickly check if all the IDs you have are ok, but you may need several calls to get a list of all the bad IDs. That's rather annoying for your use case. Feel free to file a ticket on phabricator.wikimedia.org. Use the wikidata tag. Thanks!

-- daniel
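[A short sketch of the check described above, based on the action=query response format, where pages that don't exist are reported with a "missing" key. The sample response is hand-written for illustration; a real client would fetch it over HTTP.]

```python
# Given a parsed action=query JSON response, return which titles the
# wiki reported as missing (i.e. which QIDs do not exist).

def missing_ids(query_response: dict) -> list:
    pages = query_response.get("query", {}).get("pages", {})
    # missing pages carry a "missing" key and a negative page key
    return sorted(p["title"] for p in pages.values() if "missing" in p)

sample = {
    "query": {
        "pages": {
            "-1": {"ns": 0, "title": "Qx", "missing": ""},
            "129": {"pageid": 129, "ns": 0, "title": "Q1"},
        }
    }
}
print(missing_ids(sample))  # ['Qx']
```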
Re: [Wikidata-tech] Search on Wikibase/Wikidata sans CirrusSearch?
Yes, it's supposed to work, see FingerprintSearchTextGenerator and EntityContent::getTextForSearchIndex.

Am 30.12.2017 um 06:47 schrieb Stas Malyshev:
> Hi!
>
> I wonder if anybody have run/is running Wikibase without CirrusSearch
> installed and whether the fulltext search is supposed to work in that
> configuration? The suggester/prefix search, aka wbsearchentities, works
> ok, but I can't make fulltext aka Special:Search find anything on my VM
> (which very well may be a consequence of me messing up, or some bug, or
> both :)
> So, I wonder - is it *supposed* to be working? Is anybody using it this
> way and does anybody care for such a use case?
>
> Thanks,
Re: [Wikidata-tech] Does a rollback also roll back revision history?
Am 31.07.2017 um 17:01 schrieb Eric Scott:
> * Is is indeed the case that rollbacks also roll back the revision history?

No. All edits are visible in the page history, including rollback, revert, restore, undo, etc. The only kind of edit that is not recorded is a "null edit" - an edit that changes nothing compared to the previous version (so it's not actually an edit). This is sometimes used to rebuild cached derived data.

> * Is there some other place we could look that records such rollbacks?

No. The page history is authoritative. It reflects all changes to the page content. If you could find a way to trigger this kind of behavior, that would be a HUGE bug. Let us know.

Note that for wikitext content, this doesn't mean that it contains all changes to the visible rendering: when a transcluded template is changed, this changes the rendering, but is not visible in the page's history (but it is instead visible in the template's history). However, no transclusion mechanism exists for Wikidata entities.

--
Daniel Kinzler
Principal Platform Engineer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Wikibase and PostgreSQL
Hi Denis!

Sorry for the late response. The information is in the installation requirements, see <https://www.mediawiki.org/wiki/Extension:Wikibase_Repository#Requirements>.

Where did you expect to find it? Perhaps we can add it in some more places to avoid confusion and frustration. In the README file, maybe?

-- daniel

Am 06.03.2017 um 09:05 schrieb Denis Rykov:
> Hello!
>
> It looks like Wikibase extension is not compatible with PostgreSQL backend.
> There are many MySQL specific code in sql scripts (e.g. auto_increment,
> varbinary).
> How about to add this information to Wikibase docs?

--
Daniel Kinzler
Principal Platform Engineer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] [Wikidata] Significant change: new data type for geoshapes
Am 29.03.2017 um 15:19 schrieb Luca Martinelli:
>> One thing to note: We currently do not export statements that use this
>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>> Service. The reason is that we are still waiting for geoshapes to get stable
>> URIs. This is handled in this ticket.

This ticket: <https://phabricator.wikimedia.org/T159517>. And more generally <https://phabricator.wikimedia.org/T161527>.

The technically inclined of you may be interested in joining the relevant RFC discussion on IRC tonight at 21:00 UTC (2pm PDT, 23:00 CEST) in #wikimedia-office.

--
Daniel Kinzler
Principal Platform Engineer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Two questions about Lexeme Modeling
Am 25.11.2016 um 12:16 schrieb David Cuenca Tudela:
>> If we want to avoid this complexity, we could just go by prefix. So if the
>> language is "de", variants like "de-CH" or "de-DE_old" would be considered ok.
>> Ordering these alphabetically would put the "main" code (with no suffix) first.
>> May be ok for a start.
>
> I find this issue potentially controversial, and I think that the community at
> large should be involved in this matter to avoid future dissatisfaction and to
> promote involvement in the decision-making.

We should absolutely discuss this with Wiktionarians. My suggestion was intended as a baseline implementation. Details about the restrictions on which variants are allowed on a Lexeme, or in what order they are shown, can be changed later without breaking anything.

> In my opinion it would be more appropriate to use standardized language codes,
> and then specify the dialect with an item, as it provides greater flexibility.
> However, as mentioned before I would prefer if this topic in particular would be
> discussed with wiktionarians.

Using Items to represent dialects is going to be tricky. We need ISO language codes for use in HTML and RDF. We can somehow map between Items and ISO codes, but that's going to be messy, especially when that mapping changes.

So it seems like we need to further discuss how to represent a Lexeme's language and each lemma's variant. My current thinking is to represent the language as an Item reference, and the variant as an ISO code. But you are suggesting the opposite. I can see why one would want items for dialects, but I currently have no good idea for making this work with the existing technology. Further investigation is needed. I have filed a Phabricator task for investigating this.
I suggest taking the discussion about how to represent languages/variants/dialects/etc. there:
https://phabricator.wikimedia.org/T151626

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Two questions about Lexeme Modeling
Thank you Denny for having an open mind! And sorry for being a nuisance ;)

I think it's very important to have controversial but constructive discussions about these things. Data models are very hard to change even slightly once people have started to create and use the data. We need to try hard to get it as right as possible off the bat.

Some remarks inline below.

Am 25.11.2016 um 03:32 schrieb Denny Vrandečić:
> There is one thing that worries me about the multi-lemma approach, and that are
> mentions of a discussion about ordering. If possible, I would suggest not to
> have ordering in every single Lexeme or even Form, but rather to use the
> following solution:
>
> If I understand it correctly, we won't let every Lexeme have every arbitrary
> language anyway, right? Instead we will, for each language that has variants,
> have somewhere in the configurations an explicit list of these variants, i.e.
> say, for English it will be US, British, etc., for Portuguese Brazilian and
> Portuguese, etc.

That approach is similar to what we are now doing for sorting Statement groups on Items. There is a global ordering of properties defined on a wiki page. So the community can still fight over it, but only in one place :) We can re-order based on user preference using a Gadget.

For the multi-variant lemmas, we need to declare the Lexeme's language separately, in addition to the language code associated with each lemma variant. It seems like the language will probably be represented as a reference to a Wikidata Item (that is, a Q-Id). That Item can be associated with an (ordered) list of matching language codes, via Statements on the Item, or via configuration (or, like we do for unit conversion, configuration generated from Statements on Items).

If we want to avoid this complexity, we could just go by prefix. So if the language is "de", variants like "de-CH" or "de-DE_old" would be considered ok. Ordering these alphabetically would put the "main" code (with no suffix) first.
May be ok for a start.

I'm not sure yet on what level we want to enforce the restriction on language codes. We can do it just before saving new data (the "validation" step), or we could treat it as a community-enforced soft constraint. I'm tending towards the former, though.

> Given that, we can in that very same place also define their ordering and their
> fallbacks.

Well, all lemmas would fall back on each other, the question is just which ones should be preferred. Simple heuristic: prefer the shortest language code. Or go by what MediaWiki does for the UI (which is what we do for Item labels).

> The upside is that it seems that this very same solution could also be used for
> languages with different scripts, like Serbian, Kazakh, and Uzbek (although it
> would not cover the problems with Chinese, but that wasn't solved previously
> either - so the situation is strictly better). (It doesn't really solve all
> problems - there is a reason why ISO treats language variants and scripts
> independently - but it improves on the vast majority of the problematic cases).

Yes, it's not the only decision we have to make in this regard, but the most fundamental one, I think.

One consequence of this is that Forms should probably also allow multiple representations/spellings. This is for consistency with the lemma, for code re-use, and for compatibility with Lemon.

> So, given that we drop any local ordering in the UI and API, I think that
> staying close to Lemon and choosing a TermList seems currently like the most
> promising approach to me, and I changed my mind.
Knowing that you won't do that without a good reason, I thank you for the compliment :)

> My previous reservations still
> hold, and it will lead to some more complexity in the implementation not only of
> Wikidata but also of tools built on top of it,

The complexity of handling a multi-variant lemma is higher than a single string, but any wikibase client already needs to have the relevant code anyway, to handle item labels. So I expect little overhead. We'll want the lemma to be represented in a more compact way in the UI than we currently use for labels, though.

Thank you all for your help!

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
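[The two heuristics discussed in this thread - accept variants by language-code prefix, and prefer the bare (shortest) code for display - can be sketched like this. This is an illustrative baseline, not the eventual Wikibase implementation.]

```python
# Prefix-based variant filtering and a shortest-code-first ordering,
# as a baseline for multi-variant lemmas.

def allowed_variants(language: str, codes: list) -> list:
    """Keep codes equal to the language or prefixed by 'language-'."""
    return [c for c in codes if c == language or c.startswith(language + "-")]

def preferred_order(codes: list) -> list:
    """Shortest code first: the bare 'main' code sorts before its variants."""
    return sorted(codes, key=lambda c: (len(c), c))

codes = ["de-CH", "de", "de-DE_old", "en"]
print(allowed_variants("de", codes))                    # ['de-CH', 'de', 'de-DE_old']
print(preferred_order(allowed_variants("de", codes)))   # ['de', 'de-CH', 'de-DE_old']
```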
Re: [Wikidata-tech] Two questions about Lexeme Modeling
Am 12.11.2016 um 00:08 schrieb Denny Vrandečić:
> I am not a friend of multi-variant lemmas. I would prefer to either have
> separate Lexemes or alternative Forms.

We have created a decision matrix to help with discussing the pros and cons of the different approaches. Please have a look and comment:

https://docs.google.com/spreadsheets/d/1PtGkt6E8EadCoNvZLClwUNhCxC-cjTy5TY8seFVGZMY/edit?ts=5834219d#gid=0

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Two questions about Lexeme Modeling
> y is creaking and not working well, and then think about
> these issues.

Slow iteration is nice as long as you don't produce artifacts you need to stay compatible with. I have become extremely wary of lock-in - Wikitext is the worst lock-in I have ever seen. Some aspects of how we implemented the Wikibase model for Wikidata have also proven to be really hard to iterate on. Iterating the model itself is even harder, since it is bound to break all clients in a fundamental way. We just got very annoyed comments just for making two fields in the Wikibase model optional.

Switching from single-lemma to multi-lemma would be a major breaking change, with lots of energy burned on backwards compatibility. The opposite switch would be much simpler (because it adds guarantees, instead of removing them).

> But until then I would prefer to keep the system as dumb and
> simple as possible.

I would prefer to keep the user-generated *data* as straightforward as possible. That's more important to me than a simple meta-model. The complexity of the instance data determines the maintenance burden.

Am 20.11.2016 um 21:06 schrieb Philipp Cimiano:
> Please look at the final spec of the lemon model:
>
> https://www.w3.org/community/ontolex/wiki/Final_Model_Specification#Syntactic_Frames
>
> In particular, check example: synsem/example7

Ah, thank you! I think we could model this in a similar way, by referencing an Item that represents a (type of) frame from the Sense. Whether this should be a special field or just a Statement, I'm still undecided on.

Is it correct that in the Lemon model, it's not *required* to define a syntactic frame for a sense? Is there something like a default frame?

> 2) Such spelling variants are modelled in lemon as two different representations
> of the same lexical entry. [...]
> In our understanding these are not two different forms as you mention, but two
> different spellings of the same form.

Indeed, sorry for being imprecise.
And yes, if we have a multi-variant lemma, we should also have multi-variant Forms. Our lemma corresponds to the canonical form in Lemon, if I understand correctly.

> The preference for showing e.g. the American or English variant should be
> stated by the application that uses the lexicon.

I agree. I think Denny is concerned with putting that burden on the application. Proper language fallback isn't trivial, and the application may be a lightweight JS library... But I think for the naive case, it's fine to simply show all representations.

Thank you all for your input!

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Linking RDF resources for external IDs
By the way, I'm also re-considering my original approach: Simply replace the plain value with the resolved URI when we can.

This would *not* cause the same property to be used with literals and non-literals, since the predicate name is derived from the property ID, and a property either provides a URI mapping, or it doesn't. Problems would arise during transition, making this a breaking change:

1) When introducing this feature, existing queries that compare a newly URI-ified property to a string literal will fail.

2) When a URI mapping is added, we'd either need to immediately update all statements that use that property, or the triple store would have some old triples where the relevant predicates point to a literal, and some new triples where they point to a resource.

This would avoid duplicating more predicates, and keeps the model straightforward. But it would cause a bumpy transition. Please let me know which approach you prefer. Have a look at the files attached to my original message.

Thanks,
Daniel

Am 09.11.2016 um 17:46 schrieb Daniel Kinzler:
> Hi Stas, Markus, Denny!
>
> For a long time now, we have been wanting to generate proper resource references
> (URIs) for external identifier values, see
> <https://phabricator.wikimedia.org/T121274>.
>
> Implementing this is complicated by the fact that "expanded" identifiers may
> occur in four different places in the data model (direct, statement, qualifier,
> reference), and that we can't simply replace the old string value, we need to
> provide an additional value.
>
> I have attached three files with snippets of three different RDF mappings:
> - Q111.ttl - the status quo, with normalized predicates declared but not used.
> - Q111.rc.ttl - modeling resource predicates separately from normalized values.
> - Q111.norm.ttl - modeling resource predicates as normalized values.
>
> The "rc" variant means more overhead, the "norm" variant may have semantic
> difficulties.
> Please look at the two options for the new mapping and let me know
> which you like best. You can use a plain old diff between the files for a first
> impression.

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Two questions about Lexeme Modeling
Hi all!

There are two questions about modelling lexemes that are bothering me. One is an old question, and one I only came across recently.

1) The question that came up for me recently is how we model the grammatical context for senses. For instance, "to ask" can mean requesting information, or requesting action, depending on whether we use "ask somebody about" or "ask somebody to". Similarly, "to shit" has entirely different meanings when used reflexively ("I shit myself").

There is no good place for this in our current model. The information could be placed in a statement on the word Sense, but that would be kind of non-obvious, and would not (at least not easily) allow for a concise rendering, in the way we see it in most dictionaries ("to ask sbdy to do sthg").

The alternative would be to treat each usage with a different grammatical context as a separate Lexeme (a verb phrase Lexeme), so "to shit oneself" would be a separate lemma. That could lead to a fragmentation of the content in a way that is quite unexpected to people used to traditional dictionaries.

We could also add this information as a special field in the Sense entity, but I don't even know what that field should contain, exactly. Got a better idea?

2) The older question is how we handle different renderings (spellings, scripts) of the same lexeme. In English we have "color" vs "colour", in German we have "stop" vs "stopp" and "Maße" vs "Masse". In Serbian, we have a Roman and Cyrillic rendering for every word.

We can treat these as separate Lexemes, but that would mean duplicating all information about them. We could have a single Lemma, and represent the others as alternative Forms, or using statements on the Lexeme. But that raises the question which spelling or script should be the "main" one, and used in the lemma.

I would prefer to have multi-variant lemmas. They would work like the multi-lingual labels we have now on items, but restricted to the variants of a single language.
For display, we would apply a similar language fallback mechanism to the one we now apply when showing labels.

2b) If we treat lemmas as multi-variant, should Forms also be multi-variant, or should they be per-variant? Should the gloss of a Sense be multi-variant? I currently tend towards "yes" for all of the above.

What do you think?

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
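[A multi-variant lemma with label-style fallback, as described above, could look roughly like this. The data layout is hypothetical, not the eventual Lexeme JSON.]

```python
# A lemma as a map from variant code to spelling, displayed via a simple
# fallback chain; if no preferred variant matches, naively show them all.

lemma = {"en-GB": "colour", "en-US": "color"}

def render_lemma(lemma: dict, preferred: list) -> str:
    for code in preferred:
        if code in lemma:
            return lemma[code]
    # naive fallback: show all variants, slash-separated
    return "/".join(lemma[c] for c in sorted(lemma))

print(render_lemma(lemma, ["en-US", "en-GB"]))  # color
print(render_lemma(lemma, ["de"]))              # colour/color
```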
Re: [Wikidata-tech] Why term for lemma?
Am 11.11.2016 um 14:38 schrieb Thiemo Mättig:
> Tpt asked:
>
>> why having both the Term and the MonolingualText data structures? Is it just
>> for historical reasons (labels have been introduced before statements and so
>> before all the DataValue system) or is there an architectural reason behind?
>
> That's not the only reason.

Besides the code perspective that Thiemo just explained, there is also the conceptual perspective: Terms are editorial information attached to an entity for search and display. DataValues such as MonolingualText represent a value within a Statement, citing an external authority.

This leads to slight differences in behavior - for instance, the set of languages available for Terms is subtly different from the set of languages available for MonolingualText.

Anyway, the fact that the two are totally separate has historical reasons. One viable approach for code sharing would be to have MonolingualText contain a Term object. But that would introduce more coupling between our components. I don't think the little bit of code that could be shared is worth the effort.

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Linking RDF resources for external IDs
Hi Stas, Markus, Denny!

For a long time now, we have been wanting to generate proper resource references (URIs) for external identifier values, see <https://phabricator.wikimedia.org/T121274>.

Implementing this is complicated by the fact that "expanded" identifiers may occur in four different places in the data model (direct, statement, qualifier, reference), and that we can't simply replace the old string value, we need to provide an additional value.

I have attached three files with snippets of three different RDF mappings:
- Q111.ttl - the status quo, with normalized predicates declared but not used.
- Q111.rc.ttl - modeling resource predicates separately from normalized values.
- Q111.norm.ttl - modeling resource predicates as normalized values.

The "rc" variant means more overhead, the "norm" variant may have semantic difficulties. Please look at the two options for the new mapping and let me know which you like best. You can use a plain old diff between the files for a first impression.

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology-beta#> .
@prefix wdata: <http://localhost/daniel/wikidata/index.php/Special:EntityData/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wdref: <http://www.wikidata.org/reference/> .
@prefix wdv: <http://www.wikidata.org/value/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix wdtn: <http://www.wikidata.org/prop/direct-normalized/> .
@prefix p: <http://www.wikidata.org/prop/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix psn: <http://www.wikidata.org/prop/statement/value-normalized/> .
@prefix pq: <http://www.wikidata.org/prop/qualifier/> .
@prefix pqv: <http://www.wikidata.org/prop/qualifier/value/> .
@prefix pqn: <http://www.wikidata.org/prop/qualifier/value-normalized/> .
@prefix pr: <http://www.wikidata.org/prop/reference/> .
@prefix prv: <http://www.wikidata.org/prop/reference/value/> .
@prefix prn: <http://www.wikidata.org/prop/reference/value-normalized/> .
@prefix wdno: <http://www.wikidata.org/prop/novalue/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

wd:Q111 a wikibase:Item ;
	rdfs:label "silver"@en ;
	skos:prefLabel "silver"@en ;
	schema:name "silver"@en ;
	wdt:P20 "asdfasdf" ;
	wdtn:P20 <http://musicbrainz.org/asdfasdf/place> .

wd:Q111 p:P20 wds:Q111-5459c580-4b6f-c306-184f-b7fa132b32d8 .

wds:Q111-5459c580-4b6f-c306-184f-b7fa132b32d8 a wikibase:Statement, wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P20 "asdfasdf" ;
	psn:P20 <http://musicbrainz.org/asdfasdf/place> ;
	pq:P30 "qwertyqwerty" ;
	pqn:P30 <http://vocab.getty.edu/aat/qwertyqwerty> ;
	prov:wasDerivedFrom wdref:7335a5598064cd8716cc9e31d164f2803e376b99 .

wdref:7335a5598064cd8716cc9e31d164f2803e376b99 a wikibase:Reference ;
	pr:P40 "zxcvbnzxcvbn" ;
	prn:P40 <https://www.sbfi.admin.ch/ontology/occupation/zxcvbnzxcvbn> .
wd:P20 a wikibase:Property ;
	wikibase:propertyType <http://wikiba.se/ontology-beta#ExternalId> ;
	wikibase:directClaim wdt:P20 ;
	wikibase:directClaimNormalized wdtn:P20 ;
	wikibase:claim p:P20 ;
	wikibase:statementProperty ps:P20 ;
	wikibase:statementValue psv:P20 ;
	wikibase:statementValueNormalized psn:P20 ;
	wikibase:qualifier pq:P20 ;
	wikibase:qualifierValue pqv:P20 ;
	wikibase:qualifierValueNormalized pqn:P20 ;
	wikibase:reference pr:P20 ;
	wikibase:referenceValue prv:P20 ;
	wikibase:referenceValueNormalized prn:P20 ;
	wikibase:novalue wdno:P20 .

p:P20 a owl:ObjectProperty .
psv:P20 a owl:ObjectProperty .
pqv:P20 a owl:ObjectProperty .
prv:P20 a owl:ObjectProperty .
psn:P20 a owl:ObjectProperty .
pqn:P20 a owl:ObjectProperty .
prn:P20 a owl:ObjectProperty .
wdt:P20 a owl:DatatypeProperty .
ps:P20 a owl:DatatypeProperty .
pq:P20 a owl:DatatypeProperty .
pr:P20 a owl:DatatypeProperty .
wdtn:P20 a owl:ObjectProperty .

wdno:P20 a owl:Class ;
	owl:complementOf _:genid2 .

_:genid2 a owl:Restriction ;
	owl:onProperty wdt:P20 ;
	owl:someValuesFrom owl:Thing .

wd:P20 rdfs:label "MusicBrainz place ID"@en .
[Wikidata-tech] BREAKING CHANGE: Quantity Bounds Become Optional
Hi all!

This is an announcement for a breaking change to the Wikidata API, JSON and RDF binding, to go live on 2016-11-15. It affects all clients that process quantity values.

As Lydia explained in the mail she just sent to the Wikidata list, we have been working on improving our handling of quantity values. In particular, we are making upper and lower bounds optional: When the uncertainty of a quantity measurement is not explicitly known, we no longer require the bounds to somehow be specified anyway, but allow them to be omitted.

This means that the upperBound and lowerBound fields of quantity values become optional in all API input and output, as well as the JSON dumps and the RDF mapping. Clients that import quantities should now omit the bounds if they do not have explicit information on the uncertainty of a quantity value. Clients that process quantity values must be prepared to process such values without any upper and lower bound set.

That is, instead of this

"datavalue": {
	"value": {
		"amount": "+700",
		"unit": "1",
		"upperBound": "+710",
		"lowerBound": "+690"
	},
	"type": "quantity"
},

clients may now also encounter this:

"datavalue": {
	"value": {
		"amount": "+700",
		"unit": "1"
	},
	"type": "quantity"
},

The intended semantics is that the uncertainty is unspecified if no bounds are present in the XML, JSON or RDF representation. If they are given, the interpretation is as before. For more information, see the JSON model documentation [1]. Note that quantity bounds have been marked as optional in the documentation since August. The RDF mapping spec [2] has been adjusted accordingly.

This change is scheduled for deployment on November 15. Please let us know if you have any comments or objections.
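[A sketch of what this means for consuming clients: treat upperBound/lowerBound as optional and fall back to "uncertainty unspecified" when they are absent. The JSON shape is the one shown above; the formatting function is illustrative.]

```python
# Handle a quantity datavalue whose bounds may now be omitted.

def describe_quantity(datavalue: dict) -> str:
    v = datavalue["value"]
    amount = v["amount"]
    upper = v.get("upperBound")  # may be absent after this change
    lower = v.get("lowerBound")
    if upper is None or lower is None:
        return f"{amount} (uncertainty unspecified)"
    return f"{amount} [{lower}, {upper}]"

with_bounds = {"value": {"amount": "+700", "unit": "1",
                         "upperBound": "+710", "lowerBound": "+690"},
               "type": "quantity"}
without = {"value": {"amount": "+700", "unit": "1"}, "type": "quantity"}

print(describe_quantity(with_bounds))  # +700 [+690, +710]
print(describe_quantity(without))      # +700 (uncertainty unspecified)
```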
-- daniel

[1] https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON
[2] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Quantity

Relevant tickets:
* <https://phabricator.wikimedia.org/T115269>

Relevant patches:
* <https://gerrit.wikimedia.org/r/#/c/302248>
* <https://github.com/DataValues/Number/commit/2e126eee1c0067c6c0f35b4fae0388ff11725307>

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Why term for lemma?
Am 02.11.2016 um 21:53 schrieb Denny Vrandečić:
> Hi,
>
> I am not questioning or criticizing, just curious - why was it decided to
> implement lemmas as terms? I guess it is for code reuse purposes, but just
> wanted to ask.

Yes, indeed. We have code for rendering, serializing, indexing, and searching Terms. We do not have any infrastructure for plain strings. We could also handle it as a monolingual-text StringValue, but that offers less re-use, in particular no search, and no batch lookup for rendering.

Also, conceptually, the lemma is rather similar to a label. And it's always *in* a language. The only question is whether we have only one, or multiple (for variants/scripts). But one will do for now.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Proposed update to the stable interfaces policy
Tomorrow I plan to apply the following update to the Stable Interface Policy:

https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Proposed_change_to_to_the_.22Extensibility.22_section

Please comment there if you have any objections. Thanks!

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Announcing the Wikidata Stable Interface Policy
Hello all! After a brief period for final comments (thanks everyone for your input!), the Stable Interface Policy is now official. You can read it here: <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>

This policy is intended to give authors of software that accesses Wikidata a guide to what interfaces and formats they can rely on, and which things can change without warning. The policy is a statement of intent given by us, the Wikidata development team, regarding the software running on the site. It does not apply to any content maintained by the Wikidata community.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Policy on Interface Stability: final feedback wanted
Hello all, repeated discussions about what constitutes a breaking change have prompted us, the Wikidata development team, to draft a policy on interface stability. The policy is intended to clearly define what kind of change will be announced when and where. A draft of the policy can be found at <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>

Please comment on the talk page. Note that this policy is not about the content of the Wikidata site; it's a commitment by the development team regarding the behavior of the software running on wikidata.org. It is intended as a reference for bot authors, data consumers, and other users of our APIs. We plan to announce this as the development team's official policy on Monday, August 22.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] URL strategy
Am 13.06.2016 um 12:12 schrieb Richard Light:
> returns a list of person URLs. So I'm happy. However, I am still intrigued as
> to the logic behind the redirection of the statement URL to the URL for the
> person about whom the statement is being made.

The reason is a practical one: the statement data is part of the data about that person. It's stored and addressed as part of that person's information. We currently do not have an API that would return only the statement data itself, so if you dereference the statement URI, you get all the data we have on the subject, which includes the statement. This is formally acceptable: dereferencing the statement URI should give you the RDF representation of that statement (and possibly more - which is the case here).

The statement URI does not resolve to the subject or the object, but to the Statement itself, which is an RDF resource in its own right.

Perhaps the confusion arises from the fact that the SPARQL endpoint offers two views on Statements: the "direct" or "naive" mapping (using the wdt prefix), in which a Statement is modeled as a single triple and does not have a URI of its own, and the "full" or "deep" mapping, where the Statement is a resource in its own right, and we use several triples to describe its type, value, rank, qualifiers, references, etc.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
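To illustrate the two views of a statement, here is a sketch in Turtle (item, property, and statement ID are invented for illustration; prefixes as in the Wikidata RDF mapping):

```turtle
# Direct ("naive") view: the statement collapses into a single triple,
# with no URI for the statement itself.
wd:Q1 wdt:P1 wd:Q2 .

# Full ("deep") view: the statement node (wds:...) is a resource in its
# own right, described by several triples.
wd:Q1 p:P1 wds:Q1-0000-example .
wds:Q1-0000-example a wikibase:Statement ;
    wikibase:rank wikibase:NormalRank ;
    ps:P1 wd:Q2 .
```

Dereferencing the wds: URI is what the question above is about; the wdt: triple carries no statement URI at all.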
Re: [Wikidata-tech] MathML is dead, long live MathML
Am 07.04.2016 um 20:00 schrieb Moritz Schubotz:
> Hi Daniel,
>
> Ok. Let's discuss!

Great! But let's keep the discussion in one place. I made a mess by cross-posting this to two lists, and now it's three, it seems. Can we agree on one venue for the discussion? At least for the discussion of MathML in the context of Wikimedia, that would be the best place, I think.

-- daniel
[Wikidata-tech] MathML is dead, long live MathML
Peter Krautzberger, maintainer of MathJax, apparently thinks that MathML has failed as a web standard (even though it succeeded as an XML standard), and should be removed from HTML5. Here's the link: https://www.peterkrautzberger.org/0186/

It's quite a rant. Here's a quick TL;DR:

> It doesn’t matter whether or not MathML is a good XML language. Personally, I
> think it’s quite alright. It’s also clearly a success in the XML publishing
> world, serving an important role in standards such as JATS and BITS.
>
> The problem is: MathML has failed on the web.
> Not a single browser vendor has stated an intent to work on the code, not a
> single browser developer has been seen on the MathWG. After 18 years, not a
> single browser vendor is willing to dedicate even a small percentage of a
> developer to MathML.
> Math layout can and should be done in CSS and SVG. Let’s improve them
> incrementally to make it simpler.
>
> It’s possible to generate HTML+CSS or SVG that renders any MathML content –
> on the server, mind you, no client-side JS required (but of course possible).
> Since layout is practically solved (or at least achievable), we really need
> to solve the semantics. Presentation MathML is not sufficient, Content MathML
> is just not relevant.
>
> We need to look where the web handles semantics today – that’s ARIA and HTML
> but also microdata, rdfa etc.

I think both the rendering and the semantics are well worth thinking about. Perhaps Wikimedia should reach out to Peter Krautzberger, and discuss some ideas of how math (and physics, and chemistry) content should be handled by Wikipedia, Wikidata, and friends. This seems like a crossroads, and we should have a hand in where things are going from here.

-- daniel (not a MathML expert at all)
Re: [Wikidata-tech] Caches for Special:EntityData json
Output from Special:EntityData is cached for 31 days. Looking at the code, it seems we are not automatically purging the web caches when an entity is edited - please file a ticket for that. I think we originally decided against it for performance reasons (there are quite a few URLs to purge for every edit), but I suppose we should look into that again.

You can force the cache to be purged by setting action=purge in the request. Note that this will purge all serializations of the entity, not just the one requested.

-- daniel

Am 29.02.2016 um 22:02 schrieb Markus Krötzsch:
> Hi,
>
> I found that Special:EntityData returns outdated JSON data that is not in
> agreement with the page. I have fetched the data using wget to ensure that no
> browser cache is in the way. Concretely, I have been looking at
>
> https://www.wikidata.org/wiki/Special:EntityData/Q17444909.json
>
> where I recently changed the P279 value from Q217594 to Q16889133. Of course,
> this might no longer be a valid example when you read this email (in case the
> cache gets updated at some point).
>
> Is this a bug in the configuration of the HTTP (or other) cache, or is this
> the desired behaviour? When will the cache be cleared?
>
> Thanks,
>
> Markus

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
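Based on the reply above, a purge request could be constructed roughly like this (a sketch; the helper name is invented, and treat the exact query-parameter handling of Special:EntityData as an assumption):

```python
def entity_data_url(entity_id, fmt="json", purge=False):
    """Build a Special:EntityData URL, optionally forcing a cache purge.

    Per the mail above, action=purge drops ALL cached serializations of
    the entity, not just the requested one (an assumption about exact
    URL syntax; verify against the live service).
    """
    url = "https://www.wikidata.org/wiki/Special:EntityData/%s.%s" % (
        entity_id, fmt)
    if purge:
        url += "?action=purge"
    return url
```

A client that sees stale data can issue one request with purge=True and then re-fetch the plain URL.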
[Wikidata-tech] Wikibase CI is broken because of a Scribunto issue
Some Jenkins jobs now fail for all changes to Wikibase. E.g. <https://gerrit.wikimedia.org/r/#/c/270008/> and <https://gerrit.wikimedia.org/r/#/c/270572/>.

Errors I see:

11:28:52 PHP Strict standards: Declaration of Capiunto\Test\BasicRowTest::testLua() should be compatible with Scribunto_LuaEngineTestBase::testLua($key, $testName, $expected) in /mnt/jenkins-workspace/workspace/mwext-testextension-php55-composer/src/extensions/Capiunto/tests/phpunit/output/BasicRowTest.php on line 51

11:39:14 1) LuaSandbox: Wikibase\Client\Tests\DataAccess\Scribunto\Scribunto_LuaWikibaseEntityLibraryTest::testRegister
11:39:14 Failed asserting that LuaSandboxFunction Object () is an instance of class "Scribunto_LuaStandaloneInterpreterFunction".

I guess some change to Scribunto broke compatibility...

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Technical information about the new "math" and "external-id" data types
As Lydia announced, we are going to deploy support for two new data types soon (think of "data types" as "property types", as opposed to "value types"):

* The "math" type for formulas. This will use TeX syntax and is provided by the same extension that implements the <math> tag for wikitext. We plan to roll this out on Feb 9th.
* The "external-id" type for references to external resources. We plan to roll this out on Feb 16th.

NOTE: Many of the existing properties for external identifiers will be converted from the plain "string" data type to the new "external-id" data type, see <https://www.wikidata.org/wiki/User:Addshore/Identifiers>.

Both these new types will use the "string" value type. Below are two examples of Snaks that use the new data types, in JSON:

{
  "snaktype": "value",
  "property": "P717",
  "datavalue": {
    "value": "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}",
    "type": "string"
  },
  "datatype": "math"
}

{
  "snaktype": "value",
  "property": "P708",
  "datavalue": {
    "value": "BADWOLF",
    "type": "string"
  },
  "datatype": "external-id"
}

As you can see, the only thing that is new is the value of the "datatype" field. Similarly, in RDF, both new data types use plain string literals for now, as you can see from the Turtle snippet below:

wd:Q2209 a wikibase:Item ;
    wdt:P717 "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}" ;
    wdt:P708 "BADWOLF" .

The data types themselves are declared as follows:

wd:P708 a wikibase:Property ;
    wikibase:propertyType wikibase:ExternalId .

wd:P717 a wikibase:Property ;
    wikibase:propertyType wikibase:Math .

Accordingly, the URIs of the data types (not the types of the literals!) are:

<http://wikiba.se/ontology-beta#ExternalId>
<http://wikiba.se/ontology-beta#Math>

These are, for now, the only changes to the representation of Snaks. We do however consider some additional changes for the future. To avoid confusion, I'll put them below a big separator:

ANNOUNCEMENT ABOVE! ROUGH PLANS BELOW!
Here are some changes concerning the math and external-id data types that we are considering or planning for the future.

* For the Math data type, we may want to provide a type URI for the RDF string literal that indicates that the format is indeed TeX. Perhaps we could use <http://purl.org/xtypes/Fragment-LaTeX>.

* For the ExternalId data type, we would like to use resource URIs for external IDs (in "direct claims"), if possible. This would only work if we know the base URI for the property (provided by a statement on the property definition). For properties with no base URI set, we would still use plain string literals.

In our example above, the base URI for P708 might be <https://tardis.net/allonzy/>. The Turtle snippet would read:

wd:Q2209 a wikibase:Item ;
    wdt:P717 "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}"^^purl:Fragment-LaTeX ;
    wdt:P708 <https://tardis.net/allonzy/BADWOLF> .

However, the full representation of the statement would still use the original string literal:

wds:Q2209-24942a17-4791-a49d-6469-54e581eade55 a wikibase:Statement, wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P708 "BADWOLF" .

We would also like to provide the full URI of the external resource in JSON, making us a good citizen of the web of linked data. We plan to do this using a mechanism we call "derived values", which we also plan to use for other kinds of normalization in the JSON output. The idea is to include additional data values in the JSON representation of a Snak:

{
  "snaktype": "value",
  "property": "P708",
  "datavalue": {
    "value": "BADWOLF",
    "type": "string"
  },
  "datavalue-uri": {
    "value": "https://tardis.net/allonzy/BADWOLF",
    "type": "string"
  },
  "datatype": "external-id"
}

In some cases, such as ISBNs, we would want a URL as well as a URI:

{ "snaktype": "value", "property"
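The derivation sketched in these plans is a simple concatenation of the property's base URI and the external-id string. A hedged sketch of the idea (this is a description of the proposal above, not implemented behaviour; the function name and the base URI are made up):

```python
def external_id_uri(base_uri, snak):
    """Derive the full resource URI for an external-id snak, if possible.

    Returns None when no base URI is configured for the property; in
    that case consumers fall back to the plain string literal, as the
    proposal above describes.
    """
    if snak.get("datatype") != "external-id" or base_uri is None:
        return None
    return base_uri + snak["datavalue"]["value"]

snak = {"snaktype": "value", "property": "P708",
        "datavalue": {"value": "BADWOLF", "type": "string"},
        "datatype": "external-id"}
```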
[Wikidata-tech] On interface stability and forward compatibility
Hi all! In the context of introducing the new "math" and "external-id" data types, the question came up whether this introduction constitutes a breaking change to the data model. The answer to this depends on whether you take the "English" or the "German" approach to interpreting the format: According to <https://en.wikipedia.org/wiki/Everything_which_is_not_forbidden_is_allowed>, in England, "everything which is not forbidden is allowed", while, in Germany, the opposite applies, so "everything which is not allowed is forbidden".

In my mind, the advantage of formats like JSON, XML and RDF is that they provide good discovery by eyeballing, and that they use a mix-and-match approach. In this context, I favour the English approach: anything not explicitly forbidden in the JSON or RDF is allowed. So I think clients should be written in a forward-compatible way: they should handle unknown constructs or values gracefully.

In this vein, I would like to propose a few guiding principles for the design of client libraries that consume Wikibase RDF and particularly JSON output:

* When encountering an unknown structure, such as an unexpected key in a JSON encoded object, the consumer SHOULD skip that structure. Depending on context and use case, a warning MAY be issued to alert the user that some part of the data was not processed.

* When encountering a malformed structure, such as a missing required key in a JSON encoded object, the consumer MAY skip that structure, but then a warning MUST be issued to alert the user that some part of the data was not processed. If the structure is not skipped, the consumer MUST fail with a fatal error.

* Clients MUST make a clear distinction between data types and value types: a Snak's data type determines the interpretation of the value, while the type of the Snak's data value specifies the structure of the value representation.

* Clients SHOULD be able to process a Snak about a Property of unknown data type, as long as the value type is known.
In such a case, the client SHOULD fall back to the behaviour defined for the value type. If this is not possible, the Snak MUST be skipped and a warning SHOULD be issued to alert the user that some part of the data could not be interpreted.

* When encountering an unknown type of data value (value type), the client MUST either ignore the respective Snak, or fail with a fatal error. A warning SHOULD be issued to alert the user that some part of the data could not be processed.

Do you think these guidelines are reasonable? It seems to me that adopting them should save everyone some trouble.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
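A minimal sketch of a consumer following these guidelines (the sets of known types are illustrative, not exhaustive, and the function name is invented):

```python
import warnings

KNOWN_VALUE_TYPES = {"string", "quantity", "time"}   # illustrative subset
KNOWN_DATA_TYPES = {"string", "quantity", "time", "external-id", "math"}

def process_snak(snak):
    """Process a snak, skipping unknown constructs with a warning.

    Returns the raw value, or None when the snak was skipped.
    """
    datavalue = snak.get("datavalue")
    if datavalue is None or "type" not in datavalue:
        # Malformed structure: MAY skip, but then MUST warn.
        warnings.warn("malformed snak skipped")
        return None
    if datavalue["type"] not in KNOWN_VALUE_TYPES:
        # Unknown value type: MUST ignore the snak (or fail hard).
        warnings.warn("unknown value type, snak ignored")
        return None
    if snak.get("datatype") not in KNOWN_DATA_TYPES:
        # Unknown data type but known value type: fall back to the
        # behaviour defined for the value type.
        warnings.warn("unknown data type, falling back to value type")
    return datavalue["value"]
```

Note that an unknown data type does not abort processing, but an unknown value type does; that asymmetry is exactly the data-type vs value-type distinction the guidelines draw.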
Re: [Wikidata-tech] On interface stability and forward compatibility
Am 05.02.2016 um 14:55 schrieb Tom Morris:
> Sounds a lot like a restatement of Postel's Law
>
> https://en.wikipedia.org/wiki/Robustness_principle

Yes indeed: "Be conservative in what you send, be liberal in what you accept."

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] On interface stability and forward compatibility
Am 05.02.2016 um 14:24 schrieb Markus Krötzsch:
> I feel that this tries to evade the real issue by making formal rules about
> what kind of "breaking" you have to care about. It would be better to define
> "breaking change" based on its consequences: if important services will stop
> working, then you should make sure you announce it in time so this will not
> happen. This requires you to talk to people on this list. I think the whole
> proposal below is mainly trying to give you some justification to avoid
> communication with your stakeholders. This is not the way to go.

It's a way to prevent unpleasant surprises, and avoid unnecessary work. Talking about planned changes early on is certainly good, and we should get more organized at this. However, I would like to avoid having to treat *any* change like a breaking change. Breaking changes should be communicated a lot earlier, and a lot more carefully, than, say, additions and extensions.

I tried to write down what clients *shouldn't* rely on. As Tom pointed out, these are really general design principles. They are not really specific to Wikibase, except for the "data type vs. value type" thing. Any software processing third party data should follow them.

> how should a SPARQL Web service communicate problems that occurred when
> importing the data?

By informing whoever maintains the import, by writing to a log file or sending mail. That's the person who can fix the problem. That's who should be informed.

> Our tools rely on being able to use all data, and the easiest way to ensure
> that they will work is to announce technical changes to the JSON format well
> in advance using this list. For changes that affect a particular subset of
> widely used tools, it would also be possible to seek the feedback from the
> main contributors of these tools at design/development time.

And we do that for breaking changes. I did not expect additional data types to cause any trouble.
After all, you can still ingest the data, since the value type is known. For a long time, our dumps didn't even mention the data type at all.

> I am sure everybody here is trying their best to keep up with whatever
> changes you implement, but it is not always possible for all of us to
> sacrifice part of our weekend on short notice for making a new release before
> next Wednesday.

To avoid this problem in the future, I tried to spell out what guarantees we *don't* give, so that things don't break horribly after a simple addition. That doesn't mean we don't plan to communicate such changes at all, or to communicate them better than we did this time. We do. But this kind of thing is clearly distinct from actual "breaking changes" in my mind.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Last call for objections against DataModel changes.
A couple of weeks ago, I proposed to change our PHP data model bindings to allow extra info to be attached using the concept of "facets", similar to the "role object" and "extension object" patterns. Code experiments showcasing this idea can be found on GitHub:

* https://github.com/wmde/WikibaseDataModel/pull/576
* https://github.com/wmde/WikibaseDataModelSerialization/pull/174

This is the final call for objections against using this approach. The rationale behind it can be found on <https://phabricator.wikimedia.org/T118860> and related tickets. Implementation details can still change later, but after nearly 3 months, we finally need a decision on the conceptual level. If there are no substantial objections, this will become definite on Tuesday, December 8.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Using the Role Object Pattern to represent derived information in the data model
Hi all! For weeks and months now, we have been discussing how to best represent "extra" information in (or associated with) the Wikibase data model. After some more discussion and a bit of research, I think I have found what we need: the Role Object Pattern, aka Role Class Model, see <https://en.wikipedia.org/wiki/Role_Class_Model>.

Please have a look at https://phabricator.wikimedia.org/T118860 and let me know if you have any objections. If not, let's use this sprint to discuss the details of the implementation, and do a task breakdown.

PS: I came across quite a few famous names during my research. Looks like we are not the first to have this need...

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] [Wikidata] how to map other identifiers to Wikidata entity IDs
Am 09.11.2015 um 03:26 schrieb S Page:
> I think these other identifiers are all "Wikidata property representing a
> unique identifier" and there are about 350 of them [2] But surprisingly, I
> couldn't find an easy way to look up a Wikidata item using these other
> identifiers.

We discussed some loose plans for implementing this in Cirrus when Stas was in Berlin a few weeks ago. On Special:Search, you would ask for property:P212:978-2-07-027437-6, and that would find the item with that ISBN. Stas: do we have a ticket for this somewhere? All I can find are the notes in the etherpad.

> Also, is this a temporary thing? Will Wikidata eventually have items for every
> book published, every musical recording, etc. and become a superset of all
> those unique identifiers?

It's highly unlikely that Wikidata will become a superset of any and all vocabularies in existence. Better integration of external identifiers is high on our priority list right now. The first step will however be to properly expose URIs for them, so we are no longer a dead end in the linked data web. But since we need to work on Cirrus integration anyway, I expect that we will have search-by-property soonish, too. I certainly hope so.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] how is the datetime value with precision of one year stored
Hello Raul. While there is indeed some inconsistency with year-precision dates (some use 01-01 for month and day, some use 00-00), I cannot reproduce the issue you report. Looking at the JSON form of Q216, I see +2014-00-00, as expected. I cannot find 2013 anywhere in the JSON. Am I missing something?

Here is the entire statement in JSON:

[
  {
    "mainsnak": {
      "snaktype": "value",
      "property": "P1082",
      "datavalue": {
        "value": {
          "amount": "+539939",
          "unit": "1",
          "upperBound": "+539940",
          "lowerBound": "+539938"
        },
        "type": "quantity"
      },
      "datatype": "quantity"
    },
    "type": "statement",
    "qualifiers": {
      "P585": [
        {
          "snaktype": "value",
          "property": "P585",
          "hash": "a1c4aa51810ae8ef53dd5e243264e9d977c02081",
          "datavalue": {
            "value": {
              "time": "+2014-00-00T00:00:00Z",
              "timezone": 0,
              "before": 0,
              "after": 0,
              "precision": 9,
              "calendarmodel": "http:\/\/www.wikidata.org\/entity\/Q1985727"
            },
            "type": "time"
          },
          "datatype": "time"
        }
      ]
    },
    "qualifiers-order": [ "P585" ],
    "id": "Q216$2a0bbe8d-4281-d178-93b0-9e6ff904ea91",
    "rank": "normal",
    "references": [
      {
        "hash": "3c680f0b30bc470385ebab96c739ddd1c84be724",
        "snaks": {
          "P854": [
            {
              "snaktype": "value",
              "property": "P854",
              "datavalue": {
                "value": "http:\/\/db1.stat.gov.lt\/statbank\/selectvarval\/saveselections.asp?MainTable=M3010211=1===9116===ST===",
                "type": "string"
              },
              "datatype": "url"
            }
          ]
        },
        "snaks-order": [ "P854" ]
      }
    ]
  }
]

Am 31.08.2015 um 19:19 schrieb Raul Kern:
> Hi,
> how is the datetime value with precision of one year stored?
>
> For example for birth date in https://www.wikidata.org/wiki/Q299687
> fine grain value for "1700" is "1.01.1700"
>
> But for population date field in https://www.wikidata.org/wiki/Q216
> the fine grain value for "2014" is "30.11.2013"
> Which is kind of unexpected.
> --
> Raul

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
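A client can sidestep the 00-00 vs 01-01 inconsistency discussed above by honouring the precision field; a minimal sketch (the helper name is invented, precision 9 meaning "year" as in the JSON above):

```python
def year_of(time_value):
    """Extract the year from a Wikibase time value, honouring precision.

    Values with year precision (9) may carry 00-00 or 01-01 as
    month/day filler; only the year portion is significant.
    """
    if time_value["precision"] < 9:
        raise ValueError("value has less than year precision")
    # Format: +YYYY-MM-DDTHH:MM:SSZ (sign, then the year up to the
    # first '-' after the sign).
    timestamp = time_value["time"]
    sign = -1 if timestamp[0] == "-" else 1
    return sign * int(timestamp[1:].split("-", 1)[0])

value = {"time": "+2014-00-00T00:00:00Z", "timezone": 0, "before": 0,
         "after": 0, "precision": 9,
         "calendarmodel": "http://www.wikidata.org/entity/Q1985727"}
```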
[Wikidata-tech] Dump requirements
There's an ongoing discussion in ops about improving the dump process, see:

https://phabricator.wikimedia.org/T88728
https://phabricator.wikimedia.org/T93396
https://phabricator.wikimedia.org/T17017
https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog/Improve_dumps

I would like to join in and add our requirements and thoughts to the list, and would like some input on that. So far I have:

Make it easier to register a new type of dump via a config change. A dump should define:
* the script(s) to run
* output file(s)
* the dump schedule
* a short name
* a brief description (wikitext or HTML? translatable?)
* required input files (maybe)

Make clear timelines of consistent dumps:
* drop the misleading approach of one dir with one timestamp for all dumps
* have one timeline per dump instead
* for dumps that are guaranteed to be consistent (one generated from the other), generate a timeline of directories with symlinks to the actual files

Make dumps discoverable:
* There should be a machine readable overview of which dumps exist in which versions for each project.
* This overview should be a JSON document (may even be static).
* Perhaps we also want a DCAT-AP description of our dumps.

Promote stable URLs:
* The latest dump of any type should be available under a stable, predictable URL.
* TBD: the latest URL could point to a symlink, get rewritten to the actual file, or trigger an HTTP redirect.

Thoughts? Comments? Additions?

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
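The machine-readable overview proposed above might look roughly like this (purely a sketch; all field names and the URL are invented for illustration):

```json
{
  "project": "wikidatawiki",
  "dumps": [
    {
      "name": "json-dump",
      "description": "All entities, one JSON object per line",
      "schedule": "weekly",
      "latest": "https://dumps.example.org/wikidatawiki/json-dump/latest.json.gz",
      "versions": ["2015-03-30", "2015-04-06"]
    }
  ]
}
```

A static document of this shape would already satisfy both the discoverability and the stable-URL requirements, since "latest" can simply be regenerated alongside each dump run.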
Re: [Wikidata-tech] Thoughts on (not) exposing a SPARQL endpoint
Am 11.03.2015 um 10:43 schrieb Markus Krötzsch:
> I was referring to the investigations that have led to this spreadsheet:
> https://docs.google.com/a/wikimedia.org/spreadsheets/d/1MXikljoSUVP77w7JKf9EXN40OB-ZkMqT8Y5b2NYVKbU/edit#gid=0

That's the backend evaluation spreadsheet. I'm not arguing against BlazeGraph as a backend at all. I'm questioning the outcome of the public query language evaluation as shown in this sheet:

https://docs.google.com/a/wikimedia.de/spreadsheets/d/16bbifhuoAiO7bRQ2-0mYU5FJ9ILczC-u9oCJsPdn9IU/edit#gid=0

Have a look at the weights, and at the comments, especially Gabriel's.

-- daniel

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Thoughts on (not) exposing a SPARQL endpoint
Am 11.03.2015 um 10:08 schrieb Markus Krötzsch:
> What I don't see is how the use of a WDQ API on top of SPARQL would make the
> overall setup any less vulnerable; it mainly introduces an additional
> component on top of SPARQL, and we can have a simpler SPARQL-based filter
> component there if we want, which is likely to be more effective in
> controlling usage.

I disagree on both points: I believe it would be neither simpler, nor more effective. That's pretty much the core of it. However, I admit that this is currently a gut feeling, a concern I want to share and discuss. It should be investigated before making a decision.

There is a huge cost to designing a query API from scratch, and I would really like to avoid this. Which is why I want to use one that already exists (WDQ), and back it by something that already exists (SPARQL).

> Supporting WDQ on top of SPARQL would retain WDQ in its current form and
> still support standards --

That's exactly what I propose.

> if we want to develop an official custom API, we will give up on both of
> these benefits, and at the same time push the ETA for Wikidata queries far
> into the future.

I disagree. If, as I believe, sandboxing WDQ is simpler than sandboxing SPARQL, using WDQ would allow us to have a public query API sooner. But whether my belief is correct needs to be investigated, of course.

> All of this has been discussed and considered in the past. I don't see why
> one would be kicking off discussions now that question everything decided in
> meetings and telcos over the past weeks. There is absolutely no new
> information compared to what has led to the consensus that we all (including
> Daniel) had reached.

The consensus as I remember it was that we should be able to expose SPARQL safely, if we invest enough time to sandbox it. The issue of lock-in was mentioned but not really assessed. The relative cost of sandboxing WDQ vs SPARQL, and the impact on the ETA, was not discussed much.
The ad-hoc evaluation spreadsheet shows WDQ in second place behind SPARQL (before MQL and ASK), mainly because SPARQL is more powerful. The downside of that power doesn't factor into the evaluation, nor does the factor of lock-in. Shifting the relative weight in the spreadsheet from power to sustainability makes WDQ come out on top.

After the initial enthusiasm, this has made me increasingly uneasy over the last weeks. Hence my mail to this list.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Thoughts on (not) exposing a SPARQL endpoint
On 10.03.2015 at 18:22, Thomas Tanon wrote:
> I support Magnus' point of view. WDQ is a very good proof of concept but is, I think, too limited to be the primary language of the Wikidata query system.

It can be extended. What I want is a limited domain-specific language tailored to our primary use cases. Having it largely compatible with WDQ would be great. I did not mean to imply that we have to accept the current limitations of WDQ. I'm arguing that we should impose sensible limitations on queries, instead of committing to support everything that is possible with SPARQL.

> A possible solution is maybe to support two query languages as primary:
> 1. WDQ, at first, in order to have something working quickly
> 2. A safe subset of SPARQL (if it is possible) that would be implemented later using the experience gained from the deployment of the first version of the query system. Or, if it is not possible, an improved version of WDQ that would break its current limitations.

Absolutely. I'd like to avoid any commitment to keeping the SPARQL interface stable, though. That's why I'd limit it to labs-based usage. -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Thoughts on (not) exposing a SPARQL endpoint
Hi all! After the initial enthusiasm, I have grown increasingly wary of the prospect of exposing a SPARQL endpoint as Wikidata's canonical query interface. I decided to share my (personal and unfinished) thoughts about this on this list, as food for thought and a basis for discussion. Basically, I fear that exposing SPARQL will lock us in with respect to the backend technology we use. Once it's there, people will rely on it, and taking it away would be very harsh. That would make it practically impossible to move to, say, Neo4J in the future. This is even more true if we expose vendor-specific extensions like RDR/SPARQL*. Also, exposing SPARQL as our primary query interface probably means abruptly discontinuing support for WDQ. It's pretty clear that the original WDQ service is not going to be maintained once the WMF offers infrastructure for Wikidata queries. So, when SPARQL appears, WDQ would go away, and dozens of tools will need major modifications, or would just die. So, my proposal is to expose a WDQ-like service as our primary query interface. This follows the general principle of having narrow interfaces to make it easy to swap out the implementation. But the power of SPARQL should not be lost: a (sandboxed) SPARQL endpoint could be exposed to Labs, just like we provide access to replicated SQL databases there: on Labs, you get raw access, with added performance and flexibility, but no guarantees about interface stability. In terms of development resources and timeline, exposing WDQ may actually get us a public query endpoint more quickly: sandboxing full SPARQL may well turn out to be a lot harder than sandboxing the more limited set of queries WDQ allows. Finally, why WDQ and not something else, say, MQL? Because WDQ is specifically tailored to our domain and use case, and there already is an ecosystem of tools that use it.
We'd want to refine it a bit, I suppose, but by and large, it's pretty much exactly what we need, because it was built around the actual demand for querying Wikidata. So much for my current thoughts. Note that this is not a decision or recommendation by the Wikidata team, just my personal take. -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
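To make the "WDQ on top of SPARQL" idea concrete, here is a toy translator for a single claim[property:item] expression. This is a sketch only, in Python for brevity: the claim[...] syntax is modeled on WDQ, and the wdt:/wd: prefixes are assumptions about the RDF vocabulary, not a description of any deployed endpoint.

```python
import re

def wdq_claim_to_sparql(wdq):
    """Translate a single WDQ-style 'claim[31:5]' expression into a
    SPARQL query selecting the matching items (illustrative only)."""
    m = re.fullmatch(r"claim\[(\d+):(\d+)\]", wdq.strip().lower())
    if not m:
        raise ValueError("unsupported WDQ fragment: " + wdq)
    prop, value = m.groups()
    # Assumed prefixes: wdt: for direct claims, wd: for entities.
    return "SELECT ?item WHERE { ?item wdt:P%s wd:Q%s . }" % (prop, value)

print(wdq_claim_to_sparql("CLAIM[31:5]"))
```

A real translation layer would of course have to cover the full WDQ grammar (TREE, STRING, quantities, unions, etc.), which is where the sandboxing argument comes in: the translator only ever emits query shapes we have decided to support.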
Re: [Wikidata-tech] Thoughts on (not) exposing a SPARQL endpoint
On 10.03.2015 at 21:09, Stas Malyshev wrote:
> People would ask us for full SPARQL as soon as they'd know we're running a SPARQL db.

Sure. And I'd tell them: you can use SPARQL on Labs, but beware that it may change or go away. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Globe coordinates precision question (technical)
On 12.01.2015 at 15:09, Markus Krötzsch wrote:
> Great, this clarifies a lot for me. The other question was what to make of null values for precision. Do they mean no precision known, or something else?

IIRC, null is a bug here. Not sure how to handle that - we don't have the original string, and we can't really guess the precision based on the float values. Looking at GeoCoordinateFormatter, I see this:

if ( $precision <= 0 ) { $precision = 1 / 3600; }

I.e. it assumes 1 arc second if no precision is given. Not great, but not much else we can do at this point. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
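The fallback described above can be sketched as follows. This is not the actual Wikibase code (which is PHP); it is a minimal Python stand-in, assuming a missing precision arrives as None or a non-positive number.

```python
DEFAULT_PRECISION = 1 / 3600  # one arc second, in degrees

def effective_precision(precision):
    """Return the stored precision, falling back to one arc second
    when the value is missing (None) or non-positive."""
    if precision is None or precision <= 0:
        return DEFAULT_PRECISION
    return precision

assert effective_precision(None) == 1 / 3600   # null precision -> fallback
assert effective_precision(0.01) == 0.01       # stored precision kept
```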
[Wikidata-tech] Things to get merged before the branch next week
Hey! Here's a few performance-relevant changes I think should get merged before we branch next week:

https://gerrit.wikimedia.org/r/#/c/170961/ Determine update actions based on usage aspects. --- the last bit missing for usage tracking

https://gerrit.wikimedia.org/r/#/c/176650/ Use wb_terms table for label lookup. --- should improve memory consumption a lot, and possibly also speed.

https://gerrit.wikimedia.org/r/#/c/167224/ Defer entity deserialization. --- should reduce memory footprint and improve speed of trivial operations like checking whether something is a redirect.

Are there any other performance improvements that we should get in? I imagine that this will be the last time we branch until the third week of January. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Parsing Entity IDs
On 17.10.2014 at 04:45, Jeroen De Dauw wrote:
> Hey, I just noticed this commit [0], which gets rid of a pile of direct BasicEntityIdParser usages for performance reasons.

Yay, thanks Katie!

> Of course this also means that no new code that introduces such occurrences should be allowed through review, even if it contains a "fix this later" TODO (for new code there is no excuse to do it wrong).

There's no excuse to do it wrong, but there will always be things left to do later. TODOs are a good thing, it's just bad to put them in and forget about them (which I'm quite guilty of, I know). -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Wikibase changesAsJson
Yes, as far as I know, we moved Change serialization to JSON a long time ago, and we can and should drop support for PHP serialization there. Double-check with Katie though, she knows best what is currently deployed.

On 12.10.2014 at 23:59, Jeroen De Dauw wrote:
> Hey, I was wondering if we still used PHP serialization in our change replication mechanism. (We need to be very careful making changes to the objects in WB DM if that is the case.) Looking at the code, I discovered we have a changesAsJson setting, presumably introduced to migrate away from the PHP serialization. Has such a migration happened? Can we get rid of the setting and the old PHP serialize code? Cheers -- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] alternatives to memcached for caching entity objects across projects
On 02.10.2014 at 17:55, Jeroen De Dauw wrote:
>> We use two CachingEntityRevisionLookup nested into each other: the outer-most uses a HashBagOStuff to implement in-process caching, the second level uses memcached.
> It is odd to have two different decorator instances for caching around the EntityRevisionLookup. I suggest to have only a single decorator for caching, which writes to a caching interface. Then this caching interface can have an implementation that uses multiple caches, and perhaps have a decorator on that level.

Went that way first. Didn't work out nicely. I forget why exactly. I don't care much either way. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] alternatives to memcached for caching entity objects across projects
Hey Ori!

On 02.10.2014 at 06:45, Ori Livneh wrote:
> I'm embarrassed to say that I don't know nearly enough about Wikidata to be able to make a recommendation. Where would you recommend I look if I wanted to understand the caching architecture?

And I'm embarrassed to say that we have very little high-level documentation. There is no document on the overall caching architecture. The use case in question is accessing Items (and other Entities, like Properties) from client wikis like Wikipedia. Entities are accessed through an EntityRevisionLookup service; CachingEntityRevisionLookup is an implementation of EntityRevisionLookup that takes an actual EntityRevisionLookup (e.g. a WikiPageEntityRevisionLookup) and a BagOStuff, and implements a caching layer. We use two CachingEntityRevisionLookup instances nested into each other: the outer-most uses a HashBagOStuff to implement in-process caching, the second level uses memcached. The objects that are cached there are instances of EntityRevision, which is a thin wrapper around an Entity (usually an Item) plus a revision ID. Please let me know if you have further questions! -- daniel

PS: What do you think, where should this info go? Wikibase/docs/caching.md or some such? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
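The nesting described above can be sketched in a few lines. This is an illustrative Python stand-in, not the real PHP code: class and method names are invented, plain dicts stand in for HashBagOStuff and memcached, and only the decorator structure is mirrored.

```python
class CachingLookup:
    """Decorator that caches results of an inner lookup (sketch of the
    CachingEntityRevisionLookup idea)."""
    def __init__(self, inner, cache):
        self.inner = inner
        self.cache = cache  # any dict-like store

    def get_entity_revision(self, entity_id):
        if entity_id in self.cache:
            return self.cache[entity_id]
        revision = self.inner.get_entity_revision(entity_id)
        self.cache[entity_id] = revision
        return revision

class FakeDbLookup:
    """Stands in for the actual WikiPageEntityRevisionLookup."""
    def __init__(self, data):
        self.data = data
        self.calls = 0
    def get_entity_revision(self, entity_id):
        self.calls += 1
        return self.data[entity_id]

# Two nested layers: a dict standing in for memcached on the inside,
# another dict standing in for the in-process HashBagOStuff outside.
db = FakeDbLookup({"Q42": ("Q42", 7)})
shared = CachingLookup(db, {})     # memcached-level stand-in
local = CachingLookup(shared, {})  # in-process stand-in
local.get_entity_revision("Q42")
local.get_entity_revision("Q42")
assert db.calls == 1  # second call never reached the database layer
```

The design point is that each layer only knows about the lookup interface, so layers can be added or removed without touching the underlying storage lookup.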
Re: [Wikidata-tech] [Multimedia] From the MW Core Backlog....
/Wikimedia_MediaWiki_Core_Team/Backlog#Structured_license_metadata
>> I'm assuming everything that he describes fits nicely into what is planned for Structured Data. Assuming that's true, should I just copy/paste into a new card in Mingle, or a new page on mw.org, or what?
> This seems to be about article text, or mainly about article text (articles imported from other wikis and so on). The plan for the structured data project is to create Wikidata properties for legalese, install Wikibase on Commons (and possibly other wikis which have local images), make that Wikibase use Wikidata properties (and sometimes Wikidata items as values), create a new entity type called mediainfo (which is like a Wikibase item, but associated with a file), and add legal information to the mediainfo entries. Part of that (the Wikidata properties) could be reused for articles and other non-file content - the source, license etc. properties are generic enough. However, if we want to use this structure to attribute files, we would either have to make mediainfo into some more generic thing that can be attached to any wiki page, or abuse the langlink/badge feature to serve a similar purpose. That is a major course correction; if we want to do something like that, that should be discussed (with the involvement of the Wikidata team) as soon as possible.

Thanks for the analysis, Gergo! I was going to split Luis' proposal into a separate wiki page, but I see Nemo has linked to this page as the canonical page on the topic: https://www.mediawiki.org/wiki/Files_and_licenses_concept Without a deep reading that I'm admittedly just not going to have time for, it's hard to tell how related the page that Nemo linked to is to the concepts that Luis is trying to capture. Could someone (Nemo? Luis?) merge Luis's requirements into the canonical page to Luis' satisfaction, so I can delete most of the information from our backlog?
I'll keep the item on the MW Core backlog, since I don't know where else to put it, but it's probably going to be relatively low priority for that team. Multimedia team and Wikidata team, could you make sure you're considering the requirements that Luis brought up as you build your solution? Even if you decide to punt on some of the things that aren't strictly necessary for files, it's still good to make sure you don't paint us into a corner if/when we do try to do something more sophisticated for articles. One thing I'll note, though: before we get too complacent in thinking that files are somehow simpler than articles, we should consider these relatively common scenarios:

* Group photo with potentially different per-person personality rights
* PDF of a slide deck with many images
* PDF of a Wikipedia article :-)

Rob

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Removing 3rd party dependencies from WikibaseQueryEngine
On 09.09.2014 at 19:20, Daniel Kinzler wrote: Hi Rob, thanks for clarifying! I guess I just oversimplified what was said in our discussion. I'll try to summarize what you now wrote: if there is a package for dbal/symfony/whatever in Ubuntu LTS, we have a good chance, but no guarantee, that TechOps is fine with deploying it. Quick update on that: if I understand correctly, the cluster is running Ubuntu 12.04, which doesn't have the packages in question, but an upgrade to 14.04 is in the pipeline. So, there are two things we need to know in order to make an informed decision: 1) Can we use the Ubuntu LTS packages for symfony and dbal? 2) When is 14.04 going to be rolled out? Who can answer these questions? How do we poke TechOps? -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Removing 3rd party dependencies from WikibaseQueryEngine
On 04.09.2014 at 20:03, Jeroen De Dauw wrote:
> Hey, I'm also curious as to whether WMF is indeed not running any CLI tools on the cluster which happen to use Symfony Console.

As far as I know, no unreviewed 3rd-party PHP code is running on the public-facing app servers. Anything that has a Debian package is OK. Don't know about PEAR... -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] How to record redirects in the database
This is about the question of how best to store entity redirects in the database. Below I try to describe the problem and the possible solutions. Any input welcome.

A quick primer: Wikibase redirects are really Entity-ID aliases. They correspond to, but are not the same as, MediaWiki's page-based redirects. If Q3 is a redirect to (an alias for) Q5, the page Item:Q3 is also a redirect to Item:Q5. The JSON blob on Item:Q3 would store a redirect entry *instead* of an entity. Entities never *are* redirects. Wikibase currently stores a mapping of entity ids to page ids in the wb_entity_per_page table (epp table for short). MediaWiki core stores redirects as a kind of link table, with rd_from being a page_id, and rd_to+rd_namespace being name+namespace of the redirect target.

Requirements:
* When looking up an EntityId, we need to be able to load the corresponding JSON blob, and for that we need to find the corresponding wiki page (either by id, or by name+namespace). We need to be able to do this cross-wiki, so we may not have the repo's configuration (wrt namespaces, etc.) available when constructing the query.
* We need an efficient way to list all entity IDs on a wiki (without redirects). In particular, the mechanism for listing entities must support efficient paging.
* We need an efficient way to resolve redirects in bulk, or at least, to discern redirects from unknown/deleted entity ids.

Options:
1) No redirects in the epp table (current). This means we need to use the name+namespace when loading the entity-or-redirect from a page, since we don't know the page ID if it's a redirect. We also can't use core's redirect table, because for that, we would need to know the page id first. In order to use name+namespace for looking up page IDs for entities, client wikis would need to know the namespace IDs used on the repo, in order to generate queries against the repo's database.
2) Put redirects into the epp table as well, without any special marking.
This makes lookups easy, but gives us no efficient way to list all entities without redirects. We'd need to check and skip redirects while iterating. This would add complexity to several maintenance and upgrade scripts.
3) Put redirects into the epp table, with a marker (or target id) in a new column. This would allow for both simple lookup and efficient listing, but it means adding a column (and an index) to an already large table in production. It also means having the overhead of a column that's mostly null.
4) Put redirects into epp *and* a separate table. Provides simple lookup, but means a potentially slow join when listing entities. This join would happen multiple times each time we need to list all entities, because of paged access - compare how JsonDumpGenerator works.
5) Put redirects into a special table but not into epp. This means fast/simple listing of entities, but requires a not-so-nice "try" logic when looking up entities: if no entry is found in the epp table, we then need to go on and try the entity-redirect table, to see whether the id is redirected or unknown/deleted.

Assessment:
1) is nasty in terms of cross-wiki configuration. It's the simplest solution on the code and database levels, but seems brittle.
2) adds complexity to everything that lists entities. Big performance impact in cases where entity blobs would otherwise not have been loaded, but are loaded now to check whether they contain redirects.
3) is somewhat wasteful on the database level, and needs a schema change deployment on a large table. Don't know how bad that would be, though.
4) may cause performance issues because it adds complexity to big queries on large tables. Needs a trivial schema change deployment (new table).
5) adds complexity to the code that reads entity blobs from the database, and impacts performance for the redirect and missing-entity cases by adding a database query. Could be acceptable if these cases are rare. Needs a trivial schema change deployment (new table).
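The "try" logic option 5 implies can be sketched as follows. This is illustrative only: the maps stand in for the epp table and the hypothetical entity-redirect table, and the names are assumptions, not actual schema.

```python
def resolve_entity(entity_id, epp, redirects):
    """Return ('page', page_id), ('redirect', target_id) or ('missing', None).
    epp maps entity id -> page id; redirects maps entity id -> target id."""
    if entity_id in epp:
        return ("page", epp[entity_id])
    # Miss in epp: a second lookup distinguishes a redirected id
    # from an unknown/deleted one (the extra query option 5 pays for).
    if entity_id in redirects:
        return ("redirect", redirects[entity_id])
    return ("missing", None)

epp = {"Q5": 1234}
redirects = {"Q3": "Q5"}
assert resolve_entity("Q5", epp, redirects) == ("page", 1234)
assert resolve_entity("Q3", epp, redirects) == ("redirect", "Q5")
assert resolve_entity("Q99", epp, redirects) == ("missing", None)
```

Note that the extra lookup is only paid on an epp miss, which is why the cost is acceptable if redirects and missing ids are rare.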
-- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] reviews needed for pubsubhubbub extension
On 09.07.2014 at 19:39, Dimitris Kontokostas wrote:
> On Wed, Jul 9, 2014 at 6:13 PM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
>> On 09.07.2014 at 08:14, Dimitris Kontokostas wrote:
>>> Maybe I am biased with DBpedia, but by doing some experiments on English Wikipedia we found that the ideal update interval with OAI-PMH was every ~5 minutes. OAI aggregates multiple revisions of a page into a single edit, so when we ask "get me the items that changed in the last 5 minutes" we skip the processing of many minor edits. It looks like we lose this option with PubSubHubbub, right?
>> I'm not quite positive on this point, but I think with PuSH, this is done by the hub. If the hub gets 20 notifications for the same resource in one minute, it will only grab and distribute the latest version, not all 20. But perhaps someone from the PuSH development team could confirm this.
> It'd be great if the dev team can confirm this. Besides push notifications, is polling an option in PuSH? I skimmed through the spec but couldn't find this.

Yes. You can just poll the interface that the hub uses to fetch new data. -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] reviews needed for pubsubhubbub extension
On 09.07.2014 at 08:14, Dimitris Kontokostas wrote:
> Hi, Is it easy to brief the added value (or supported use cases) of switching to PubSubHubbub?

* It's easier to handle than OAI, because it uses the standard dump format.
* It's also push-based, avoiding constant polling on small wikis.
* The OAI extension has been deprecated for a long time now.

> The edit stream in Wikidata is so huge that I can hardly think of anyone wanting to be in *real-time* sync with Wikidata. With 20 p/s their infrastructure should be pretty scalable to not break.

The push aspect is probably most useful for small wikis. It's true, for large wikis, you could just poll, since you would hardly ever poll in vain. It would be very nice if the sync could be filtered by namespace, category, etc. But PubSubHubbub (I'll use PuSH from now on) doesn't really support this, sadly.

> Maybe I am biased with DBpedia, but by doing some experiments on English Wikipedia we found that the ideal update interval with OAI-PMH was every ~5 minutes. OAI aggregates multiple revisions of a page into a single edit, so when we ask "get me the items that changed in the last 5 minutes" we skip the processing of many minor edits. It looks like we lose this option with PubSubHubbub, right?

I'm not quite positive on this point, but I think with PuSH, this is done by the hub. If the hub gets 20 notifications for the same resource in one minute, it will only grab and distribute the latest version, not all 20. But perhaps someone from the PuSH development team could confirm this.

> As we already asked before, does PubSubHubbub support mirroring a Wikidata clone? The OAI-PMH extension has this option.

Yes, there is a client extension for PuSH, allowing for seamless replication of one wiki into another, including creation and deletion (I don't know about moves/renames). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] PubSubHubbub what is it all about ? - Re: Wikidata-tech Digest, Vol 15, Issue 3
Hi Param!

On 09.07.2014 at 17:13, Param wrote:
> Hi, I am a new member of Wikidata and would like to know all about "PubSubHubbub", the new project.

PubSubHubbub (PuSH for short) is a push-based notification mechanism. See https://en.wikipedia.org/wiki/PubSubHubbub. We plan to implement it for wikidata.org. The code is at https://git.wikimedia.org/tree/mediawiki%2Fextensions%2FPubSubHubbub -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
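For a flavor of how PuSH works on the subscriber side: per the PubSubHubbub spec, the hub verifies a (un)subscription by sending the subscriber a GET request carrying hub.mode, hub.topic and hub.challenge, and the subscriber confirms by echoing the challenge. Below is a minimal sketch of that handshake in Python; the callback URL and topic are made-up examples, and a real subscriber would of course run this inside an HTTP server.

```python
from urllib.parse import parse_qs, urlparse

def handle_verification(url, expected_topic):
    """Return (status, body) for a hub verification request.
    Echoing hub.challenge with a 2xx status confirms the subscription."""
    params = parse_qs(urlparse(url).query)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode in ("subscribe", "unsubscribe") and topic == expected_topic:
        return 200, challenge  # echo the challenge to confirm
    return 404, ""  # refuse verification for unknown topics

status, body = handle_verification(
    "http://example.org/cb?hub.mode=subscribe"
    "&hub.topic=http%3A%2F%2Fexample.org%2Ffeed&hub.challenge=abc123",
    "http://example.org/feed",
)
assert (status, body) == (200, "abc123")
```

After a successful handshake, the hub POSTs new content to the callback URL whenever the publisher pings it, which is the push behavior discussed in the thread above.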
[Wikidata-tech] Resolving Redirects
Hi all. I'm writing to get input on a conceptual issue regarding the resolution of redirects. I'm currently in the process of implementing redirects for Wikibase Items (bugzilla 66067). My present task is to add support for redirect resolution to the EntityLookup service interface (and possibly the related EntityRevisionLookup service interface; bugzilla 66075). Currently, the two interfaces in question look like this (with some irrelevant stuff omitted):

interface EntityLookup {
    public function getEntity( EntityId $entityId, $revision = 0 );
    public function hasEntity( EntityId $entityId );
}

interface EntityRevisionLookup extends EntityLookup {
    public function getEntityRevision( EntityId $entityId, $revisionId = 0 );
    public function getLatestRevisionId( EntityId $entityId );
}

Note that getEntityRevision returns an EntityRevision object (an Entity with some revision meta data), while getEntity just returns an Entity object. Also note that the $revision parameter in EntityLookup::getEntity is deprecated and being removed (see patch Iafdcb5b38), while $revision in EntityRevisionLookup::getEntityRevision is supposed to stay. Presently, the attempt to look up an Entity via an ID that has been turned into a redirect will result in an exception being thrown. To implement redirect resolution, the original intention was to leave EntityRevisionLookup as is, and change EntityLookup like this:

interface EntityLookup {
    public function getEntity( EntityId $entityId, $resolveRedirects = 1 );
    public function hasEntity( EntityId $entityId, $resolveRedirects = 1 );
}

...with the $resolveRedirects parameter indicating how many levels of redirects should be resolved before giving up. This gives us a convenient way to get the current revision of an entity, following redirects; and it keeps the interface for requesting a specific, or the latest, version of an Entity, with meta info attached.
However, it means we have to implement the logic for redirect resolution in every implementation class, generally using the same code over and over (there are currently three implementations of EntityRevisionLookup: the actual lookup, a caching wrapper, and an in-memory fake). Also, it does not give us a straightforward way to get the meta-data of the current revision while following redirects. For that, we'd have to modify EntityRevisionLookup::getEntityRevision:

public function getEntityRevision( EntityId $entityId, $revisionId = 0, $resolveRedirects = 0 );

This is ugly, and annoying since we'll want to *either* resolve redirects *or* specify a revision. We could use a special value for $revisionId to indicate that we not only want the current revision (indicated by 0), but also want to have redirects resolved (indicated by "follow" or -1 or whatever):

public function getEntityRevision( EntityId $entityId, $revisionIdOrRedirects = 0 );

That's concise, but somewhat magical. Or we could add another method:

public function getEntityRevisionAfterFollowingAnyRedirects( EntityId $entityId, $resolveRedirects = 1 );

That's not quite obvious, and the awkward name indicates that this isn't really what we want either. Perhaps we can get around all this mess by making redirect resolution something the interface doesn't know about? An implementation detail? The logic for resolving redirects could be implemented in a Proxy/Wrapper that would implement EntityRevisionLookup (and thus also EntityLookup). The logic would have to be implemented only once, in one implementation class, that could be wrapped around any other implementation. From the implementation's point of view, this is a lot more elegant, and removes all the issues of how to fit the flag for redirect resolution into the method signatures. However, this means that the caller does not have control over whether redirects are resolved or not.
It would then be the responsibility of bootstrap code to provide an instance that does, or doesn't, do redirect resolution to the appropriate places. That's impractical, since the decision whether redirects should be resolved may be dynamic (e.g. depend on a parameter in a web API call), or the caller may wish to handle redirects explicitly, by first looking up without redirect resolution, and then with it, after some special treatment. So, it seems that the ugly variant with an extra parameter in getEntityRevision() is the most practical, even though it's not the most elegant from an OO design perspective. What's your take on this? Got any better ideas? -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
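To make the Proxy/Wrapper variant concrete, here is a rough sketch in Python (the real interfaces are PHP; all names here are invented for illustration). Redirect resolution lives in one decorator class that can wrap any lookup implementation, with a level limit as discussed.

```python
class Redirect:
    """Stand-in for a redirect result: points at the target entity id."""
    def __init__(self, target):
        self.target = target

class RedirectResolvingLookup:
    """Decorator that resolves up to max_levels redirects before giving up."""
    def __init__(self, inner, max_levels=1):
        self.inner = inner
        self.max_levels = max_levels

    def get_entity_revision(self, entity_id, revision_id=0):
        for _ in range(self.max_levels + 1):
            result = self.inner.get_entity_revision(entity_id, revision_id)
            if not isinstance(result, Redirect):
                return result
            entity_id = result.target  # follow one level of redirection
        raise RuntimeError("too many redirect levels for " + entity_id)

class FakeLookup:
    """In-memory fake, like the third EntityRevisionLookup implementation."""
    def __init__(self, data):
        self.data = data
    def get_entity_revision(self, entity_id, revision_id=0):
        return self.data[entity_id]

lookup = RedirectResolvingLookup(
    FakeLookup({"Q3": Redirect("Q5"), "Q5": "entity-Q5"})
)
assert lookup.get_entity_revision("Q3") == "entity-Q5"
```

This illustrates both points made above: the resolution logic exists exactly once, but whether it runs is fixed at construction time, which is why the caller loses per-call control.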
Re: [Wikidata-tech] Constructing Entities from their serialization
On 06.06.2014 at 06:07, Jeroen De Dauw wrote:
> $item = new Item( array() );

Some tests I touched recently use this, and I didn't change it, just moved things around. I agree that knowing about a specific serialization format in tests is bad. On the other hand, it's nice to be able to construct an entity in a single statement, instead of building it iteratively. Also, some test cases take the array data as input, and only construct the entity when running the test. This is convenient in cases when the data provider does not know the concrete type of entity under test. I guess that's why this was introduced. I'm moving tests away from that, though. -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Using the canonical JSON model in dumps.
Hi all! Context: We plan to change the XML dumps (and Special:Export) to use the same JSON serialization that is used by the API, instead of the terse but brittle internal format. This is about the mechanism we plan to use for the conversion. So, I just went and checked my assertion that WikiExporter will use the Content object's serialize method to generate output. I WAS WRONG. It doesn't. It'll use the text from the database, as-is (for reference, find the call to Revision::getRevisionText in Export.php). In order to force a conversion to the new format, we'll need to patch core to a) inject a hook there to override the default behavior, or b) make it always use a Content object (unless, perhaps, told explicitly not to). This is not hard to code, but doing it Right (tm) may need some discussion, and getting it merged may need some time. Sorry for not checking this earlier. Daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Managing dependencies when extending Wikibase
I have discussed the dependency issue with Jeroen, so here is what I took away from the conversation:

* Factoring the service interfaces out of the Wikibase extension would be nice to have, but is not necessary to resolve the present issue.
* An extension/plugin (in this case, the property suggester) will indeed typically have a dependency on the application/framework it was written for (in this case, Wikibase).
* When installing a plugin stand-alone, the application (here: Wikibase) would be installed somewhere in the vendor directory. This is fine for running unit tests against the plugin (the property suggester), but of course doesn't make much sense when we want to use Wikibase and the suggester as MediaWiki extensions (especially not if MediaWiki itself was pulled in as a dependency).
* In order to install extensions for an application in a way that the extensions are installed under the application, even though they depend on the app, and not vice versa, a local build can be used: we would create a composer manifest that defines the app (Wikibase) and the extensions (the suggester, etc.) as dependencies, and then use composer to install that. This will cause Wikibase and the suggester to be installed together, side by side, rather than putting Wikibase under the suggester.
* In fact, we already do something like this with the Wikidata extension, which is just a build of Wikibase with all the dependencies and additions we want.

HTH -- daniel

On 06.03.2014 at 16:03, Daniel Kinzler wrote:
> The folks of the Wikidata.lib project at the Hasso Plattner Institut have developed an extension to Wikibase that allows us to suggest properties to add to items, based on the properties already present (a very cool project, btw). This is, conceptually, an extension to the Wikibase extension. This raises problems for managing dependencies:
> * conceptually, the extension (property suggester) depends *on* Wikibase.
* practically, we want to install the property suggester as an optional dependency (feature/plugin/extension) *of* wikibase.

So, how do we best express this? How can composer handle this? I think the most obvious/important thing to do is to have a separate module for the interface wikibase exposes to plugins/extensions. This would include the interfaces of the service objects, and some factory/registry classes for accessing these. What's the best practice for exposing such a set of interfaces? How is this best expressed in terms of composer manifests? What are the next steps to resolve the circular/inverse dependencies we currently have? -- daniel

PS: Below is an email in which Moritz Finke listed the dependencies the property suggester currently has:

Original message
Subject: PropertySuggester Dependencies
Date: Thu, 6 Mar 2014 11:07:56 +
From: Finke, Moritz moritz.fi...@student.hpi.uni-potsdam.de
To: Daniel Kinzler daniel.kinz...@wikimedia.de

Hi, below are the dependencies of the PropertySuggester, sorted by class...
Regards, Moritz

PropertySuggester dependencies:

GetSuggestions:
use Wikibase\DataModel\Entity\ItemId;
use Wikibase\DataModel\Entity\Property;
use Wikibase\DataModel\Entity\PropertyId;
use Wikibase\EntityLookup;
use Wikibase\Repo\WikibaseRepo;
use Wikibase\StoreFactory;
use Wikibase\Utils;
use ApiBase;
use ApiMain;
use DerivativeRequest;
WikibaseRepo::getDefaultInstance()->getEntityContentFactory();
StoreFactory::getStore( 'sqlstore' )->getEntityLookup();
StoreFactory::getStore()->getTermIndex()->getTermsOfEntities( $ids, 'property', $language );
Utils::getLanguageCodes()
'type' => Property::ENTITY_TYPE

SuggesterEngine:
use Wikibase\DataModel\Entity\Item;
use Wikibase\DataModel\Entity\PropertyId;

Suggestion:
use Wikibase\DataModel\Entity\PropertyId;

SimplePHPSuggester:
use Wikibase\DataModel\Entity\Item;
use Wikibase\DataModel\Entity\PropertyId;
use DatabaseBase;
use InvalidArgumentException;

GetSuggestionsTest:
use Wikibase\Test\Api\WikibaseApiTestCase;

SimplePHPSuggesterTest:
use Wikibase\DataModel\Entity\PropertyId;
use Wikibase\DataModel\Entity\Item;
use Wikibase\DataModel\Claim\Statement;
use Wikibase\DataModel\Snak\PropertySomeValueSnak;
use DatabaseBase;
use MediaWikiTestCase;

JavaScript:
wikibase.entityselector
wbEntityId

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
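The local-build idea described in the reply above can be sketched as a composer manifest. Package names and version constraints here are illustrative assumptions, not the actual published packages:

```json
{
    "name": "example/wikidata-build",
    "description": "Illustrative build manifest: app and plugin installed side by side",
    "require": {
        "wikibase/wikibase": "*",
        "propertysuggester/property-suggester": "*"
    }
}
```

Running `composer install` against such a manifest places both packages side by side under vendor/, rather than nesting Wikibase under the suggester.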
[Wikidata-tech] Managing dependencies when extending Wikibase
The folks of the Wikidata.lib project at the Hasso Plattner Institut have developed an extension to Wikibase that allows us to suggest properties to add to items, based on the properties already present (a very cool project, btw). This is, conceptually, an extension to the Wikibase extension. This raises problems for managing dependencies:

* conceptually, the extension (property suggester) depends *on* wikibase.
* practically, we want to install the property suggester as an optional dependency (feature/plugin/extension) *of* wikibase.

So, how do we best express this? How can composer handle this? I think the most obvious/important thing to do is to have a separate module for the interface wikibase exposes to plugins/extensions. This would include the interfaces of the service objects, and some factory/registry classes for accessing these. What's the best practice for exposing such a set of interfaces? How is this best expressed in terms of composer manifests? What are the next steps to resolve the circular/inverse dependencies we currently have? -- daniel

PS: Below is an email in which Moritz Finke listed the dependencies the property suggester currently has:

Original message
Subject: PropertySuggester Dependencies
Date: Thu, 6 Mar 2014 11:07:56 +
From: Finke, Moritz moritz.fi...@student.hpi.uni-potsdam.de
To: Daniel Kinzler daniel.kinz...@wikimedia.de

Hi, below are the dependencies of the PropertySuggester, sorted by class...
Regards, Moritz

PropertySuggester dependencies:

GetSuggestions:
use Wikibase\DataModel\Entity\ItemId;
use Wikibase\DataModel\Entity\Property;
use Wikibase\DataModel\Entity\PropertyId;
use Wikibase\EntityLookup;
use Wikibase\Repo\WikibaseRepo;
use Wikibase\StoreFactory;
use Wikibase\Utils;
use ApiBase;
use ApiMain;
use DerivativeRequest;
WikibaseRepo::getDefaultInstance()->getEntityContentFactory();
StoreFactory::getStore( 'sqlstore' )->getEntityLookup();
StoreFactory::getStore()->getTermIndex()->getTermsOfEntities( $ids, 'property', $language );
Utils::getLanguageCodes()
'type' => Property::ENTITY_TYPE

SuggesterEngine:
use Wikibase\DataModel\Entity\Item;
use Wikibase\DataModel\Entity\PropertyId;

Suggestion:
use Wikibase\DataModel\Entity\PropertyId;

SimplePHPSuggester:
use Wikibase\DataModel\Entity\Item;
use Wikibase\DataModel\Entity\PropertyId;
use DatabaseBase;
use InvalidArgumentException;

GetSuggestionsTest:
use Wikibase\Test\Api\WikibaseApiTestCase;

SimplePHPSuggesterTest:
use Wikibase\DataModel\Entity\PropertyId;
use Wikibase\DataModel\Entity\Item;
use Wikibase\DataModel\Claim\Statement;
use Wikibase\DataModel\Snak\PropertySomeValueSnak;
use DatabaseBase;
use MediaWikiTestCase;

JavaScript:
wikibase.entityselector
wbEntityId
Re: [Wikidata-tech] wbsetclaim
On 26.02.2014 18:41, Jeroen De Dauw wrote:
Uh, didn't we fix this a long time ago? Client-supplied GUIDs are evil :(

This has come up at some point, and as far as I recall, we dropped the requirement to provide the GUID. So one should be able to provide a claim without a GUID; if not, something went wrong somewhere. I have filed https://bugzilla.wikimedia.org/show_bug.cgi?id=61950 now. -- daniel

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Badges
On 05.02.2014 22:40, Bene* wrote:
On 05.02.2014 18:00, Bene* wrote:
Interesting, so in your opinion the actual display of items should happen via the common.css? I think this can work, though I don't know if we should leave this implementation detail to the local wikis. At least, it would prevent another config from being added to the client, which I would very much recommend. Also the wiki could rank the badges more easily. (New CSS properties override old ones.) Thus I support your idea of leaving this to the client wikis.

I think that it's up to the local wiki to decide which badges to show, and how. Being able to manage this on-wiki seems like a good idea.

Another question, however, is which tooltip title should be added to the badge's sitelink. We could use the description of the wikidata item, but I am not sure if we can access it easily from the client. However, it would provide an easy way to translate the tooltip without some hacky MediaWiki messages. Best regards, Bene*

In addition to the previous message, we still have to decide on one badge if we want to add a tooltip title. However, I don't think it makes sense to add a config variable only for the tooltip. Do you have any idea how to fix this?

A bit of JS code could set the tooltip based on the CSS classes. Access to the label associated with the badge would be possible by querying wikidata, but it would be nicer if we could somehow cache that info along with the page content. Otherwise, it would have to be fetched for every page view on wikipedia... not good. -- daniel

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Report from the Architecture Summit
(reposting, accidentally posted this to the internal list at first)

Hey. Here's a brief summary of what I talked to folks in SF about, what the result was, or who we should contact to move forward.

* At the architecture summit, there seemed to be wide agreement that we need to improve modularity in core. The TitleValue proposal was viewed as going too far to the dark side of javafication, but it was generally seen to be moving in the right direction. I will update the change soon to address some comments.
* Furthermore, we (the core developers) should seek out service interfaces that can and should be factored out of existing classes, starting with pathological cases like EditPage, Title, or User. Several people agreed to look into that (and at the same time watch out to avoid javafication); Nik Everett volunteered to lead the discussion.
* Gabriel Wicke has interesting plans for factoring out storage services (both low-level blob storage as well as higher-level revision storage) into separate HTTP/REST services.
* Yurik is working on a library/extension for JSON-based configuration storage for extensions. Needs review/feedback; I'm looking into that.
* I asked Aaron to provide a JobSpecification interface, so jobs can be scheduled without having to instantiate the class that will be used to execute the job. This makes it easier to post jobs from one wiki to another. Aaron has already implemented this now, yay!
* Yurik wants us to rework the Wikibase API to be compatible with the core API's query infrastructure. This would allow us to use item lists generated by one module as the input for another module. See https://www.mediawiki.org/wiki/Requests_for_comment/Wikidata_API
* After talking to Chad, I'm now pretty sure we should go for ElasticSearch for implementing queries right away. It just seems a lot simpler than using MySQL for the baseline implementation.
This however makes ElasticSearch a dependency of WikibaseQuery, making it harder for third parties to set up queries (though setting up Elastic seems pretty simple).

* Brion would like to be in the loop on the PubSubHubbub project. For the operations side, and the question whether WMF would want to run their own hub, he pointed me to Ori and Mark Bergsma.
* I didn't make progress wrt the JSON dumps. Need to get hold of Ariel, he wasn't around. We need to find out what makes the dumps so slow. Aaron Schulz agreed to help with that. One problematic aspect of the current implementation is that it tries to retrieve all entity IDs with a single DB query. We might need to chunk that.
* For the future use of composer, we should be in touch with Markus Glaser and Hexmode (Mark Hershberger), as well as with Hashar.
* Hashar is quite interested in switching to composer and perhaps also Travis. He was happy to hear that Travis is Berlin-based and sympathetic. The WMF might even be ready to invest a bit into making Travis work with our workflow. Hashar may come and visit us, poke him about it!
* For access to the new Logstash service, we should talk to Ken Snider.
* For shell access we should talk to Quim.
* I discussed allowing queries on page_props by property value with Tim as well as Roan. Tim suggested adding a pp_sortkey column to page_props (a float, but nullable), and indexing by pp_propname+pp_sortkey. That should cover most use cases nicely, without big schema changes.

So, lots to follow up on! Cheers Daniel

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Adding configuration to WikibaseLib
On 30.01.2014 17:15, Jeroen De Dauw wrote:
Hey, it has long since been clear that it is harmful to add configuration to WikibaseLib. It is a library, not an application, and its users might well want to use it with different config. This means that no additional entries should be added to WikibaseLib.default.php, and that commits that do should not be merged.

I see your point (library code should not access Settings objects, but use explicit parameters), but this will make it difficult to manage settings shared by repo and client in a single place. Having these in one place makes sure they are consistent, which is especially important when running both repo and client on the same wiki. Do you have a suggestion how to solve this? We have had different settings for the same thing in repo and client before; I would like to avoid this in the future. -- daniel

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
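One way to keep shared settings in a single place while letting repo and client override them is a layered merge. This is a language-neutral Python sketch of the merge order, not Wikibase's actual settings code; the setting names are illustrative:

```python
# Sketch: settings shared by repo and client live in one dict, and each
# component merges in its own defaults and local overrides on top.
# Later layers win: shared < component defaults < local config.

SHARED_DEFAULTS = {"siteGlobalID": "enwiki", "changesDatabase": None}

def build_settings(shared, component_defaults, local_overrides):
    """Merge three layers of settings, later layers taking precedence."""
    settings = dict(shared)
    settings.update(component_defaults)
    settings.update(local_overrides)
    return settings

# Repo and client both start from the same shared defaults, so a wiki
# running both gets consistent values unless it overrides deliberately.
repo_settings = build_settings(SHARED_DEFAULTS, {"idBlacklist": []}, {})
client_settings = build_settings(SHARED_DEFAULTS, {}, {"siteGlobalID": "dewiki"})
```

This keeps library code free of Settings-object access (the merged dict is passed in explicitly) while still giving operators one place for shared values.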
Re: [Wikidata-tech] Wikibase data exchange format
On 16.01.2014 17:18, Brueckner, Sebastian wrote:
Hey everyone, as we have just introduced ourselves on wikidata-l, we are currently working on a PubSubHubbub [1] extension [2] to MediaWiki. Currently, the extension only works on MediaWiki articles, not on Wikibase objects. For those articles we are using the wiki markup as exchange format (using URLs with action=raw), but currently there is no equivalent in Wikibase.

Jeroen already explained about the canonical JSON format. In this context, I would like to add some information about the URI scheme we use for our linked data interface, which should also be used for PuSH, I think:

The canonical URI of the *description* of a Wikidata item has the form http://www.wikidata.org/wiki/Special:EntityData/Q64. This URI is format-agnostic; content negotiation is used to redirect to the appropriate concrete URL (in a web browser, the redirect will typically take you to the item's normal page). A format can also be specified directly by giving a file extension, e.g. http://www.wikidata.org/wiki/Special:EntityData/Q64.json

In contrast, the canonical URI of the *concept* described by a Wikidata item follows the form https://www.wikidata.org/entity/Q64. I suggest using the format-agnostic canonical *description* URI for PuSH notifications.

The URI scheme is described at https://meta.wikimedia.org/wiki/Wikidata/Notes/URI_scheme, but please note that that document was a working draft, and some aspects may be outdated. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
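The distinction between the description URI (with optional format extension) and the concept URI can be illustrated with a small sketch. Python is used for illustration only, and the helper names are made up; the URL shapes themselves are the ones described above:

```python
# Sketch of the Wikidata URI scheme described in this mail:
# - description URI: Special:EntityData, optionally with a file extension
# - concept URI: the /entity/ namespace, identifying the thing itself

def entity_data_url(entity_id, fmt=None):
    """Format-agnostic description URI, or a format-specific URL if fmt is given."""
    base = "http://www.wikidata.org/wiki/Special:EntityData/" + entity_id
    return base + "." + fmt if fmt else base

def concept_uri(entity_id):
    """Canonical URI of the concept (not its description)."""
    return "https://www.wikidata.org/entity/" + entity_id
```

A PuSH hub fetching `entity_data_url("Q64")` would be subject to content negotiation, while `entity_data_url("Q64", "json")` pins the format.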
Re: [Wikidata-tech] Wikibase data exchange format
On 16.01.2014 17:18, Brueckner, Sebastian wrote:
For those articles we are using the wiki markup as exchange format (using URLs with action=raw), but currently there is no equivalent in Wikibase.

I'm actually not sure action=raw is a good choice for wikitext - it's an old, deprecated interface with several shortcomings. I'd suggest using a canonical document URI - such as the plain article URL. The URI just identifies what was changed; the client may have (and even need) additional knowledge to retrieve the updated content in the desired form. At least that is my understanding of how PuSH works - if this is not the case, I see no good way to support multiple content formats. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] Wikibase data exchange format
On 17.01.2014 08:39, Daniel Kinzler wrote:
I suggest using the format-agnostic canonical *description* URI for PuSH notifications.

I just realized that this will not work well, since the hub will retrieve that data, and all clients would then receive the data in the format the hub (not the clients/subscribers) prefers. To avoid this, a format-specific description URL can be used, e.g. http://www.wikidata.org/wiki/Special:EntityData/Q64.json -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] locally run lua scripts
On 05.01.2014 15:02, Liangent wrote:
On Sun, Jan 5, 2014 at 9:34 PM, Voß, Jakob jakob.v...@gbv.de wrote:
If what you're executing is not something huge, doesn't require (m)any external dependencies, and doesn't have user interaction, you can try to (ab)use Scribunto's console AJAX interface.

Thanks, I used your example to set up a git repository with notes. I planned to clone the full module namespace with git.

Huh, this makes me think of a git-mediawiki tool (compare with git-svn).

There's already an (inactive) wikipediafs: http://wikipediafs.sourceforge.net/

There's also the (inactive) Levitation project: https://github.com/scy/levitation - a project to convert Wikipedia database dumps into Git repositories. It doesn't scale for Wikipedia, but should work fine for smaller dumps. -- daniel

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
[Wikidata-tech] Improving leaky test cases
I'm about to sign off for the holidays, until January 6th, so here's a quick heads-up: For investigating sporadic failures of test cases, I have created a branch of Wikibase on GitHub, which has Travis set up for testing:

https://github.com/wmde/Wikibase/tree/fixtravis
https://travis-ci.org/wmde/Wikibase

This branch contains quite a few fixes/improvements to test cases. It would be good to have them on gerrit soon. The following tests were identified as (probably) using hard-coded entity IDs in an unhealthy way, but I didn't get around to fixing them yet:

repo/tests/phpunit/includes/api/SetClaimTest.php
repo/tests/phpunit/includes/api/SetQualifierTest.php
repo/tests/phpunit/includes/api/SetReferenceTest.php
repo/tests/phpunit/includes/api/SetSiteLinkTest.php

They should probably be fixed along the same lines as MergeItemsTest, using the new EntityTestHelper::injectIds method to inject real IDs for placeholders in the data the providers return. Cheers! Daniel

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
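The placeholder-injection approach can be sketched as follows. Python is used for illustration; the real helper is the PHP method EntityTestHelper::injectIds, whose exact signature is not shown here, and the placeholder syntax below is an assumption:

```python
# Sketch: test data providers return structures containing placeholders
# like "%ItemId%"; once real entities have been created during test
# setup, the actual IDs are substituted in before the data is used,
# so tests never rely on hard-coded entity IDs.

def inject_ids(data, id_map):
    """Recursively replace placeholder strings with real entity IDs."""
    if isinstance(data, str):
        for placeholder, real_id in id_map.items():
            data = data.replace(placeholder, real_id)
        return data
    if isinstance(data, dict):
        return {k: inject_ids(v, id_map) for k, v in data.items()}
    if isinstance(data, list):
        return [inject_ids(v, id_map) for v in data]
    return data  # numbers, None, etc. pass through unchanged

case = {"entity": "%ItemId%", "claim": "%ItemId%$guid"}
resolved = inject_ids(case, {"%ItemId%": "Q117"})
```

The benefit is that each test run works against whatever IDs the test database actually handed out, which avoids the leaks between test cases described above.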
Re: [Wikidata-tech] [Wikitech-l] Help needed with ParserCache::getKey() and ParserCache::getOptionsKey()
On 10.12.2013 22:38, Brad Jorsch (Anomie) wrote:
Looking at the code, ParserCache::getOptionsKey() is used to get the memc key which has a list of parser option names actually used when parsing the page. So for example, if a page uses only math and thumbsize while being parsed, the value would be array( 'math', 'thumbsize' ).

On 11.12.2013 02:35, Tim Starling wrote:
No, the set of options which fragment the cache is the same for all users. So if the user language is included in that set of options, then users with different languages will get different parser cache objects.

Ah, right, thanks! Got myself confused there. The thing is: we are changing what's in the list of relevant options. Before the deployment, there was nothing in it, while with the new code, the user language should be there. I suppose that means we need to purge these pointers. Would bumping wgCacheEpoch be sufficient for that? Note that we don't care much about purging the actual parser cache entries; we want to purge the pointer entries in the cache.

We just tried to enable the use of the parser cache for wikidata, and it failed, resulting in page content being shown in random languages.

That's probably because you incorrectly used $wgLang or RequestContext::getLanguage(). The user language for the parser is the one you get from ParserOptions::getUserLangObj().

Oh, thanks for that hint! Seems our code is inconsistent about this, using the language from the parser options in some places, the one from the context in others. Need to fix that!

It's not necessary to call ParserOutput::recordOption(). ParserOptions::getUserLangObj() will call it for you (via onAccessCallback).

Oh great, magic hidden information flow :) Thanks for the info, I'll get hacking on it! -- daniel
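The two-level cache lookup discussed above can be sketched as follows. Python is used for illustration; the key prefixes and hashing details are assumptions, not MediaWiki's actual code. The point is that the options key stores *which* options were used, and the real cache key hashes the *values* of exactly those options, so recording the user language fragments the cache by language:

```python
import hashlib

# Sketch: per-page "options key" entry lists the option names used when
# parsing; the parser cache key is derived from the values of those
# options, so two users differing only in an *unused* option share a key.

def options_key(page_id):
    """Key under which the list of used option names is stored."""
    return "pcache:idoptions:%d" % page_id

def parser_cache_key(page_id, used_options, option_values):
    """Cache key fragmented by the values of the recorded options only."""
    parts = ["%s=%s" % (name, option_values[name]) for name in sorted(used_options)]
    digest = hashlib.md5("!".join(parts).encode()).hexdigest()
    return "pcache:idhash:%d-%s" % (page_id, digest)

de = parser_cache_key(42, ["userlang"], {"userlang": "de", "thumbsize": 5})
en = parser_cache_key(42, ["userlang"], {"userlang": "en", "thumbsize": 5})
```

If the list of used options is empty (as it was before the deployment described here), every user maps to the same key, which matches the symptom of content appearing in random languages.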
[Wikidata-tech] Help needed with ParserCache::getKey() and ParserCache::getOptionsKey()
Hi. I (rather urgently) need some input from someone who understands how parser caching works. (Rob: please forward as appropriate.)

tl;dr: What is the intention behind the current implementation of ParserCache::getOptionsKey()? It's based on the page ID only, not taking into account any options. This seems to imply that all users share the same parser cache key, ignoring all options that may impact cached content. Is that correct/intended? If so, why all the trouble with ParserOutput::recordOption, etc.?

Background: We just tried to enable the use of the parser cache for wikidata, and it failed, resulting in page content being shown in random languages. I tried to split the parser cache by user language using ParserOutput::recordOption to include userlang in the cache key. When tested locally, and also on our test system, that seemed to work fine (which seems strange now, looking at the code of getOptionsKey()). On the live site however, it failed.

Judging by its name, getOptionsKey should generate a key that includes all options relevant to caching page content in the parser cache. But it seems it forces the same parser cache entry for all users. Is this intended?

Possible fix: ParserCache::getOptionsKey could delegate to ContentHandler::getOptionsKey, which could then be used to override the default behavior. Would that be a sensible approach? And if so, would it be feasible to push out such a change before the holidays? Thanks, Daniel

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata-tech] future of the entity suggester
Hello Nilesh! Good to hear from you. I was off for a couple of days, and asked Lydia to make introductions. Thanks Lydia!

A quick heads-up: The architecture we have discussed with the team at the HPI is a bit different from what we designed for the GSoC project. The idea is to have a MediaWiki extension that relies directly on the data in a MySQL table, and generates suggestions from that. It does not care where the data comes from, so the database table(s) serve as an interface between the front (MediaWiki) part and the back (data analysis) part. This has two advantages: 1) front and back are decoupled and only have to agree on the structure and interpretation of the data in the database (this is the current TODO). 2) No new services need to be deployed in the public-facing subnet.

I think your expertise with data ingestion could help the folks at the HPI quite a bit. Also, the modular architecture allows for data analysis components to be swapped out easily, and we would like to try and compare different approaches for data analysis. One based on Hadoop and/or Myrrix could well be an option - though I'm not sure whether Myrrix would be very useful, since the actual generation of suggestions from the pre-processed data would already be covered. This is just an idea; I think you can best figure things out among yourselves. Cheers, Daniel

On 25.11.2013 17:01, Lydia Pintscher wrote:
Hey everyone, I have the feeling it would be good to make an official introduction. Nilesh has been working on the Wikidata entity suggester. There is now a team of students who are working on the entity suggester to get it finished and ready for production as part of their bachelor project. It would be good if you could work together and coordinate on the public wikidata-tech list. I'm sure with you all working together we can provide the Wikidata community with the great entity suggester they are waiting for. Virginia and co: Are you still having issues with the data import?
Maybe Nilesh can help you with that as a good first step. Cheers Lydia
[Wikidata-tech] RFC: TitleValue
Hi all! As discussed at the MediaWiki Architecture session at Wikimania, I have created an RFC for the TitleValue class, which could be used to replace the heavy-weight Title class in many places. The idea is to showcase the advantages (and difficulties) of using true value objects as opposed to active records - the idea being that hair should not know how to cut itself.

You can find the proposal here: https://www.mediawiki.org/wiki/Requests_for_comment/TitleValue

Any feedback would be greatly appreciated. -- daniel

PS: I have included some parts of the proposal below, to give a quick impression.

== Motivation ==

The old Title class is huge and has many dependencies. It relies on global state for things like namespace resolution and permission checks. It requires a database connection for caching. This makes it hard to use Title objects in a different context, such as unit tests. Which in turn makes it quite difficult to write any clean unit tests (not using any global state) for MediaWiki, since Title objects are required as parameters by many classes. In a more fundamental sense, the fact that Title has so many dependencies, and everything that uses a Title object inherits all of these dependencies, means that the MediaWiki codebase as a whole has highly tangled dependencies, and it is very hard to use individual classes separately.

Instead of trying to refactor and redefine the Title class, this proposal suggests introducing an alternative class that can be used instead of Title objects to represent the title of a wiki page. The implementation of the old Title class should be changed to rely on the new code where possible, but its interface and behavior should not change.

== Architecture ==

The proposed architecture consists of three parts, initially:

# The TitleValue class itself. As a value object, this has no knowledge about namespaces, permissions, etc.
It does not support normalization either, since that would require knowledge about the local configuration.
# A TitleParser service that has configuration knowledge about namespaces and normalization rules. Any class that needs to turn a string into a TitleValue should require a TitleParser service as a constructor argument (dependency injection). Should that not be possible, a TitleParser can be obtained from a global registry.
# A TitleFormatter service that has configuration knowledge about namespaces and normalization rules. Any class that needs to turn a TitleValue into a string should require a TitleFormatter service as a constructor argument (dependency injection). Should that not be possible, a TitleFormatter can be obtained from a global registry.
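The three-part split can be sketched in miniature. Python is used for illustration (the proposal itself is PHP), and the namespace handling here is deliberately simplified - no normalization, interwiki prefixes, or case rules:

```python
# Sketch of the TitleValue / TitleParser / TitleFormatter split:
# the value object knows nothing about configuration; parsing and
# formatting live in services that are injected where needed.

NS_MAIN, NS_TALK = 0, 1

class TitleValue:
    """Dumb value object: no normalization, no DB, no global state."""
    def __init__(self, namespace, text):
        self.namespace = namespace
        self.text = text

class TitleParser:
    """Knows the configured namespace names; turns strings into values."""
    def __init__(self, ns_names):
        self.ns_by_name = {v: k for k, v in ns_names.items()}
    def parse(self, s):
        if ":" in s:
            ns_name, text = s.split(":", 1)
            if ns_name in self.ns_by_name:
                return TitleValue(self.ns_by_name[ns_name], text)
        return TitleValue(NS_MAIN, s)

class TitleFormatter:
    """Knows the configured namespace names; turns values into strings."""
    def __init__(self, ns_names):
        self.ns_names = ns_names
    def format(self, tv):
        if tv.namespace == NS_MAIN:
            return tv.text
        return "%s:%s" % (self.ns_names[tv.namespace], tv.text)

ns = {NS_MAIN: "", NS_TALK: "Talk"}
parser, formatter = TitleParser(ns), TitleFormatter(ns)
tv = parser.parse("Talk:Main Page")
```

Because TitleValue has no dependencies, code that only passes titles around can be unit-tested without a database or global configuration; only code that actually parses or formats needs a service injected.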
Re: [Wikidata-tech] [Wikidata-l] [Pywikipedia-l] wbsearchentities()
Hi all! We have to impose a fixed limit on search results: since search results cannot be ordered by a unique ID, paging is expensive. The default for this limit is 50, but it SHOULD be 500 for bots. The higher limit for bots is currently not applied by the wbsearchentities module - that's a bug, see https://bugzilla.wikimedia.org/show_bug.cgi?id=54096. We should be able to fix this soon. Please poke us again if nothing happens for a couple of weeks. -- daniel

On 12.09.2013 12:12, Merlijn van Deen wrote:
On 11 September 2013 20:31, Chinmay Naik chin.nai...@gmail.com wrote:
Can I retrieve more than 100 items using this? I notice the 'search-continue' returned by the search result disappears after 50 items. For example:
https://www.wikidata.org/wiki/Special:ApiSandbox#action=wbsearchentities&format=json&search=abc&language=en&type=item&limit=10&continue=50

The api docs at https://www.wikidata.org/w/api.php explicitly state the highest value for 'continue' is 50:

limit - Maximal number of results
  The value must be between 0 and 50
  Default: 7
continue - Offset where to continue a search
  The value must be between 0 and 50
  Default: 0

which indeed suggests there is a hard limit of 100 entries. Maybe someone in the Wikidata dev team can explain the reason behind this? Merlijn

___ Wikidata-l mailing list wikidat...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
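The continue/limit paging behaviour, and why the offset cap of 50 effectively limits a client to 100 results, can be sketched as follows. Python is used for illustration, and `fake_search` is a stand-in for the real API call, not actual API behaviour:

```python
# Sketch: a client pages through wbsearchentities-style results by
# following the 'search-continue' offset, but the server caps the
# offset at 50 - so with limit=50, at most two pages are reachable.

def fetch_all(search_fn, limit=50, max_offset=50):
    """Collect results until there is no continue offset or it exceeds the cap."""
    results, offset = [], 0
    while offset is not None and offset <= max_offset:
        page = search_fn(offset=offset, limit=limit)
        results.extend(page["search"])
        offset = page.get("search-continue")  # absent on the last page
    return results

def fake_search(offset, limit):
    """Stand-in for the API: 120 fake hits, paged like the real module."""
    hits = ["Q%d" % i for i in range(offset, min(offset + limit, 120))]
    page = {"search": hits}
    if offset + limit < 120:
        page["search-continue"] = offset + limit
    return page
```

With 120 matching entities the client still only ever sees 100: the first page at offset 0 and the second at offset 50, after which the continue offset (100) exceeds the allowed maximum.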
Re: [Wikidata-tech] IRI, URL, sameAs and the identifier mess
On 03.09.2013 21:43, David Cuenca wrote:
A couple of months ago there was a conversation about what to do with the identifiers that should be owl:sameAs [1].

It's unclear to me where owl:sameAs would be used... it should definitely NOT be used to point to descriptions of the same thing in other repositories. See https://www.wikidata.org/w/index.php?title=Wikidata%3AProject_chat%2FArchive%2F2013%2F07&diff=70181630&oldid=66375829.

Then there is another discussion about using a formatter URL property to use any catalogue/db as an id instead of creating a property [2].

That seems fine to me.

Now there is another property proposal to implement sameAs as a property taking a URL. [3]

Ick! That's just utterly wrong! I'll leave a message.

And this is all related to the recent thread in this mailing list about IRI-value or string-value for URLs.

That is a misunderstanding. That was purely about the internal representation of these values in code. It has nothing to do with whether or not the data type itself will support URI values or just strict URIs or URLs. The URL data type should support any URL you can use in wikitext (there are some known issues with non-ASCII domains right now, but that's a bug and being worked on).

So, in the end, what is the preferred approach?

I can't tell you what the Wikidata community currently views as the best way. Personally, I would use separate properties for different identifiers, and document how each such identifier maps to a URI/URL. The url data type can be used for URLs, URIs, IRIs, etc. It's just a question of convention and of how you interpret the respective properties. -- daniel
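The "formatter URL" idea mentioned in [2] - storing a plain identifier string and mapping it to a full URL via a pattern - can be sketched as follows. Python is used for illustration; the `$1` placeholder is the convention used in Wikidata formatter URLs, while the VIAF pattern and identifier below are just examples:

```python
# Sketch: an identifier property stores only the bare identifier string;
# a formatter URL pattern with a $1 placeholder turns it into a URL.

def format_identifier(formatter_url, identifier):
    """Substitute the identifier into the formatter URL pattern."""
    return formatter_url.replace("$1", identifier)

viaf = format_identifier("https://viaf.org/viaf/$1", "113230702")
```

This keeps the statement values as simple strings (one property per catalogue/database) while still documenting, per property, how each identifier maps to a URI/URL - the approach favoured at the end of the mail above.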
Re: [Wikidata-tech] [Wikidata-l] [Pywikipedia-l] wbsearchentities()
On 13.09.2013 18:24, Benjamin Good wrote:
Daniel, even 500 seems like a very low limit for this system, unless I'm misunderstanding something. Unless there is another way to execute queries that return more rows than that, this would negate the possibility of a huge number of applications - all of ours in particular. If we want to, say, request something like all human genes (about 20,000 items), how would we do that?

You are looking for actual *query* support, not just a search by name. This is on the road map, and I hope we will be able to deploy it by the end of the year. But it's not possible yet. Supporting queries like "all people born in Hamburg" or "all cities in Europe" is an obvious goal for Wikidata. And we are working on it, but it's not trivial to make this scale to the number of entries, queries and different properties we are dealing with.

Within Wikipedia, we do this via the MediaWiki API based on contains-template or category queries without any issue. Certainly wikidata will be more useful for queries than raw mediawiki???

See above.

I'm certain I am missing something, please clarify. This is currently standing in the way of our GSoC student completing his summer project - due next week. A little disappointing for him.

Sorry, but we have never hidden the fact that our query interface is not ready yet. wbsearchentities is a label lookup designed for find-as-you-type suggestions. It's not a query interface, and was never supposed to be. I understand the disappointment, but there is little we can do about this now. All I can suggest is working from a dump right now (and sadly, we only have MediaWiki's raw JSON-in-XML dumps at the moment; I'm working on native JSON and RDF dumps, but they are not ready). -- daniel
[Wikidata-tech] BREAKING CHANGE: Wikidata API changing to upper-case IDs.
Hi all.

With today's deployment, the Wikibase API modules used on wikidata.org will change from using lower-case IDs (q12345) to upper-case IDs (Q12345). This is done for consistency with the way IDs are shown in the UI and used in URLs.

The API will continue to accept entity IDs in lower-case as well as upper-case. Any bot or other client that has no property or item IDs hardcoded or configured in lower case should be fine. If however your code looks for some specific item or property in the output returned from the API, and it's using a lower-case ID to do so, it may now fail to match the respective ID.

There is potential for similar problems with Lua code, depending on how the data structure is processed by Lua. We are working to minimize the impact there.

Sorry for the short notice. Please test your code against test.wikidata.org and let us know if you find any issues.

Thanks, Daniel

PS: issue report on bugzilla: https://bugzilla.wikimedia.org/show_bug.cgi?id=53894
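A defensive client-side sketch of the normalization this change calls for: instead of comparing hardcoded lower-case IDs against API output, normalize the case of the prefix first. This is illustrative code, not part of any official client library:

```python
# Sketch: case-insensitive entity ID matching, so bots survive the
# switch from "q12345" to "Q12345" in API output.
import re

def normalize_entity_id(entity_id: str) -> str:
    """Upper-case the prefix of an item/property ID: 'q12345' -> 'Q12345'."""
    match = re.fullmatch(r"([PpQq])(\d+)", entity_id)
    if not match:
        raise ValueError(f"not an entity ID: {entity_id!r}")
    return match.group(1).upper() + match.group(2)

# A hardcoded lower-case ID still matches the API's upper-case output:
assert normalize_entity_id("q12345") == normalize_entity_id("Q12345")
print(normalize_entity_id("p31"))
# → P31
```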
Re: [Wikidata-tech] IRI-value or string-value for URLs?
Am 03.09.2013 11:50, schrieb Lydia Pintscher:
> On Mon, Sep 2, 2013 at 11:56 AM, Denny Vrandečić
> denny.vrande...@wikimedia.de wrote:
>> OK, based on the discussion so far, we will add the data type to the
>> snak in the external export, and keep the string data value for the
>> URL data type. That should satisfy all use cases that have been
>> brought up.
> Just so I know what's coming: Is this doable for the deployment in a week?

If we push back something else, yes. But I think this is mainly useful in JSON dumps - which we don't have yet. Not hard to do, but won't happen in a week.

-- daniel
Re: [Wikidata-tech] IRI-value or string-value for URLs?
Am 30.08.2013 17:21, schrieb Denny Vrandečić:
> I do see an advantage of stating the property datatype in a snak in the
> external JSON representation, and am trying to understand what prevents
> us from doing so.

Not much, the SnakSerializer would need access to the PropertyDataTypeLookup service, injected via the SerializerFactory. SnakSerializer already has:

// TODO: we might want to include the data type of the property here as well

-- daniel
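In rough Python terms, the injection Daniel describes might look like the sketch below. The class and method names only mirror the PHP services named in the mail; none of this is the real Wikibase implementation:

```python
# Sketch: a snak serializer handed a property-datatype lookup, so it can
# include the data type in the external representation.

class PropertyDataTypeLookup:
    """Toy lookup; the real service would consult the property's definition."""
    def __init__(self, types):
        self._types = types

    def get_data_type(self, property_id):
        return self._types[property_id]

class SnakSerializer:
    def __init__(self, datatype_lookup):
        # Injected dependency, as the SerializerFactory would provide it.
        self._lookup = datatype_lookup

    def serialize(self, property_id, value):
        return {
            "snaktype": "value",
            "property": property_id,
            # The addition under discussion: the property's data type.
            "datatype": self._lookup.get_data_type(property_id),
            "datavalue": {"value": value, "type": "string"},
        }

lookup = PropertyDataTypeLookup({"P856": "url"})  # illustrative property ID
print(SnakSerializer(lookup).serialize("P856", "https://example.org"))
```

The point of the dependency injection is that the snak itself stays unchanged; only the external serialization grows the extra field.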
Re: [Wikidata-tech] Is assert() allowed?
Am 31.07.2013 13:42, schrieb Tim Starling:
> We could have a library of PHPUnit-style assertion functions which throw
> exceptions and don't act like eval(), I would be fine with that. Maybe
> MWAssert::greaterThan( $foo, $bar ) or something.

I like that! Should support an error message as an optional parameter. I suppose we could just steal the method signatures from PHPUnit.

-- daniel
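An analogous sketch in Python of what such an assertion helper could look like. The name mirrors the hypothetical MWAssert::greaterThan() from the thread; nothing here is actual MediaWiki code:

```python
# Sketch: a PHPUnit-style assertion that throws a real exception
# (rather than behaving like assert()/eval()), with an optional message.

class AssertionFailure(Exception):
    """Raised when a runtime assertion does not hold."""

def assert_greater_than(actual, bound, message=""):
    """Raise AssertionFailure unless actual > bound."""
    if not actual > bound:
        detail = message or f"expected {actual!r} > {bound!r}"
        raise AssertionFailure(detail)

assert_greater_than(5, 3)  # passes silently
try:
    assert_greater_than(1, 3, "too small")
except AssertionFailure as e:
    print(e)
# → too small
```

Because the failure is an ordinary exception, it cannot be silently compiled away and carries a proper stack trace, which is the advantage over a bare assert().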
[Wikidata-tech] Jenkins failing for no good reason.
There seems to be an issue with Jenkins. It appears to use an old version of other extensions under some circumstances.

It's like this: If you submit change 33 for extension A, which needs change 44 in extension B (which isn't merged yet), Jenkins will correctly fail. BUT: When change 44 got merged into extension B, and you force Jenkins to re-run (e.g. by rebasing change 33), it will *still* fail, apparently using an old version of extension B.

It seems this is only the case for the testextensions-master job, not the standalone repo and client jobs.

Here are some examples:

https://gerrit.wikimedia.org/r/#/c/72962/ fails for no good reason: https://integration.wikimedia.org/ci/job/mwext-Wikibase-testextensions-master/3099/console

https://gerrit.wikimedia.org/r/#/c/73772/ fails for no good reason: https://integration.wikimedia.org/ci/job/mwext-Wikibase-testextensions-master/3093/console

Please gather more evidence/insights if you come across this issue.

Thanks, Daniel
Re: [Wikidata-tech] WikiVOYAGE deployment plan
Am 28.06.2013 11:45, schrieb Denny Vrandečić:
> * Wikipedia will not automatically and suddenly display links to
> Wikivoyage. The behavior on Wikipedia actually remains completely
> unchanged by this deployment.

Let's make sure we have thorough tests for this, I'm not 100% sure how this is currently handled on the client.

-- daniel
Re: [Wikidata-tech] Representing invalid Snaks
A quick follow up to this morning's mail: I discussed this issue with Denny for a while, and we came up with this:

* I'll explore the possibility of using a BadValue object instead of a BadSnak, that is, model the error on the DataValue level. My initial impression was that this would be more work, but I'm no longer sure, and will just try and see.

* We will represent the error as a string inside the BadValue/BadSnak object. There seem to be no immediate benefits or obvious use cases for wrapping that in an Error object. (This in reply to an earlier discussion on Gerrit.)

-- daniel
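A rough sketch of the first point — modelling the error on the DataValue level, with the error kept as a plain string. This is illustrative only, not the eventual Wikibase code:

```python
# Sketch: a BadValue that preserves an unparseable serialization together
# with a plain-string error, instead of throwing the input away.

class BadValue:
    """Placeholder for a value that failed to deserialize."""
    def __init__(self, raw, error):
        self.raw = raw      # the original, invalid serialization
        self.error = error  # plain string; no wrapping Error object

def deserialize_int(raw):
    """Return an int, or a BadValue preserving the broken input."""
    try:
        return int(raw)
    except ValueError as e:
        return BadValue(raw, str(e))

value = deserialize_int("12abc")
print(type(value).__name__, "-", value.error)
```

The benefit of doing this at the value level is that a snak containing a broken value can still be round-tripped: the raw input and the error string survive re-serialization.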
Re: [Wikidata-tech] Getting the property (data) type of a PropertyValueSnak.
Am 13.06.2013 06:38, schrieb Jeroen De Dauw:
> Hey, Putting the DataType id in PropertyValueSnaks at this point seems
> like a bad idea for several reasons. Doing so would cost us quite some
> work and end up with a more complicated system as foundation.

Changing it now would be hard. But I think it would have been simpler and cleaner if we had gone that route from the start. Why would it be a bad idea? To me, it's just a self-contained data structure that knows its own type, as it should.

> If you have a use case for which the current code is not well suited, I
> suggest writing new code for that specific use case. I strongly suspect
> this will both be simpler and less work.

Any code I can write for this now will involve injecting knowledge about properties into the snaks post-hoc. That's going to suck.

>> I remember a lengthy discussion about this, but I don't recall the
>> outcome (yes, we really need to write this stuff down).
> There was no decision at any point to change this, though it indeed has
> been brought up before.

Well, at some point, the decision was made, right? Was it discussed? Were the implications of each approach compared? Is this documented somewhere? I recall a lengthy Skype call with Markus and Denny about this, and I *seem* to recall that we decided to store the type in the snaks - but as too often, I don't think this is documented anywhere.

> So, what's the point of whining now, since it's too late anyway?

I'd like to understand the rationale for going with the current system. And I would like to make the case for more communication and documentation about design decisions like this. Especially since anything concerning the internal data structure that goes into the DB is very hard to change later.

-- daniel
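The two designs being debated can be contrasted in a small sketch (illustrative names only; neither class is real Wikibase code):

```python
# Sketch: a snak that carries its own data type versus one whose type
# must be resolved post-hoc through an injected lookup.

# Design A: self-contained snak, as argued for above.
class TypedSnak:
    def __init__(self, property_id, data_type, value):
        self.property_id = property_id
        self.data_type = data_type  # known by the snak itself
        self.value = value

# Design B: the current approach; the snak does not know its type.
class PlainSnak:
    def __init__(self, property_id, value):
        self.property_id = property_id
        self.value = value

def data_type_of(snak, lookup):
    """Post-hoc resolution: every consumer needs the lookup injected."""
    return lookup[snak.property_id]

typed = TypedSnak("P31", "wikibase-item", "Q5")   # illustrative IDs
plain = PlainSnak("P31", "Q5")
print(typed.data_type, data_type_of(plain, {"P31": "wikibase-item"}))
# → wikibase-item wikibase-item
```

The trade-off in the thread in one line: design A duplicates the property's type into every snak (and risks staleness if the property's type could change), while design B keeps one source of truth but forces the "inject knowledge about properties post-hoc" plumbing Daniel complains about.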
Re: [Wikidata-tech] Getting the property (data) type of a PropertyValueSnak.
Am 13.06.2013 03:22, schrieb Daniel Werner:
> -1 Had to deal with this in the frontend as well and don't think this
> is inconvenient. It seems like the cleanest approach. Polluting the
> Snaks with information like this for performance or convenience reasons
> will probably cause more trouble in the end than keeping it as simple
> and pure as possible.

You think that giving a data structure information about its type is polluting it? Why so? This seems pretty basic and straightforward to me.

-- daniel