[Wikidata-bugs] [Maniphest] T270764: Wikidata Truthy dump is missing important metadata triples

2023-10-02 Thread mkroetzsch
mkroetzsch added a comment. @Lydia_Pintscher Are you asking about the discrepancy in the counts, or about the general idea of this issue report? I must admit that I do not get the significance of the SPARQL queries above. The missed properties seem to exist and work as expected

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints

2020-02-08 Thread mkroetzsch
mkroetzsch added a comment. In T244341#5862287 <https://phabricator.wikimedia.org/T244341#5862287>, @Jheald wrote: > Please don't think or refer to the blank nodes as "unknown values". I fully agree. The use of the word "unknown" in the UI was a mista

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints

2020-02-07 Thread mkroetzsch
mkroetzsch added a comment. Hi, Using the same value for "unknown" is a very bad idea and should not be considered. You already found out why. This highlights another general design principle: the RDF data should encode meaning in structure in a direct way. If two triples hav
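
To illustrate why one shared value for "unknown" goes wrong, here is a minimal sketch that models triples as Python tuples. All identifiers (`ex:father`, `ex:unknown`, `_:u1`, ...) are hypothetical and are not Wikibase RDF terms. If every unknown value is the same node, a naive "same value?" query concludes that two people with unknown fathers share a father. Minting a fresh node per unknown value, as blank nodes do, avoids that false match.

```python
from itertools import count

# Hypothetical predicate for illustration only; not a Wikibase ontology term.
FATHER = "ex:father"

def shared_unknown(subjects):
    """Encode every unknown value with one shared node (the problematic design)."""
    return {(s, FATHER, "ex:unknown") for s in subjects}

def fresh_unknown(subjects):
    """Encode each unknown value with its own fresh node, like a blank node."""
    ids = count(1)
    return {(s, FATHER, f"_:u{next(ids)}") for s in subjects}

def same_father(triples, a, b):
    """Naive query: do a and b share any value for FATHER?"""
    fa = {o for s, p, o in triples if s == a and p == FATHER}
    fb = {o for s, p, o in triples if s == b and p == FATHER}
    return bool(fa & fb)

people = ["ex:Alice", "ex:Bob"]
print(same_father(shared_unknown(people), "ex:Alice", "ex:Bob"))  # True (wrong inference)
print(same_father(fresh_unknown(people), "ex:Alice", "ex:Bob"))   # False
```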

[Wikidata-bugs] [Maniphest] [Commented On] T216842: Specify license of Wikibase ontology

2019-02-24 Thread mkroetzsch
mkroetzsch added a comment. CC0 seems to be fine. Using the same license as for the rest seems to be the easiest choice for everybody. TASK DETAIL https://phabricator.wikimedia.org/T216842 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T112127: [Story] Move RDF ontology from beta to release status

2018-10-17 Thread mkroetzsch
mkroetzsch added a comment. Well, for classes and properties, one would use owl:equivalentClass and owl:equivalentProperty rather than sameAs to encode this point. But I agree that this will hardly be considered by any consumer.

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-07-02 Thread mkroetzsch
mkroetzsch added a comment. This is good news -- thanks for the careful review! The lack of specific threat models for this data was also a challenge for us, for similar reasons, but it is also a good sign that many years after the first SPARQL data releases, there is still no realistic danger

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-04-03 Thread mkroetzsch
mkroetzsch added a comment. Hi, The code is here: https://github.com/Wikidata/QueryAnalysis It was not written for general re-use, so it might be a bit messy in places. The code includes the public Wikidata example queries as test data that can be used without accessing any confidential

[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2017-12-16 Thread mkroetzsch
mkroetzsch added a comment. I agree with Stas: regular data releases are desirable, but need further thought. The task is easier for our current case since we already know what is in the data. For a regular process, one has to be very careful to monitor potential future issues. By releasing

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-10 Thread mkroetzsch
mkroetzsch added a comment. @AndrewSu As I just replied to Benjamin Good in this matter, it is a bit too early for this, since we only have the basic technical access as of very recently. We have not had a chance to extract any community shareable data sets yet, and it is clear

[Wikidata-bugs] [Maniphest] [Commented On] T126862: Datatype for chemical formulae on Wikidata

2016-02-25 Thread mkroetzsch
mkroetzsch added a comment. Re parsing strings: You are skipping the first step here. The question is not which format is better for advanced interpretation, but which format is specified at all. Whatever your proposal is, I have not seen any //syntactic// description of it yet
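
As an illustration of what a purely *syntactic* description could look like, here is a sketch of a hypothetical, deliberately minimal formula syntax (element symbol plus optional positive count). It is not a proposal from this thread: brackets, charges and hydrates are left out, and it accepts symbols such as "Hx" that are not elements, because it checks syntax only, not chemical meaning.

```python
import re

# Hypothetical minimal syntax:
#   formula ::= (element count?)+
#   element ::= one uppercase letter, optionally followed by one lowercase letter
#   count   ::= positive integer without leading zero
TOKEN = re.compile(r"([A-Z][a-z]?)([1-9]\d*)?")
FORMULA = re.compile(r"(?:[A-Z][a-z]?(?:[1-9]\d*)?)+")

def parse_formula(text):
    """Return element counts, or raise ValueError if text does not match the syntax."""
    if not FORMULA.fullmatch(text):
        raise ValueError(f"not a valid formula: {text!r}")
    counts = {}
    for element, number in TOKEN.findall(text):
        # findall yields "" for a missing count, which means 1
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

print(parse_formula("C6H12O6"))  # {'C': 6, 'H': 12, 'O': 6}
```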

[Wikidata-bugs] [Maniphest] [Commented On] T127929: [Story] Add a new datatype for linking to creators of artwork and more (smart URI)

2016-02-24 Thread mkroetzsch
mkroetzsch added a comment. +1 sounds like a workable design

[Wikidata-bugs] [Maniphest] [Updated] T126862: Datatype for chemical formulae on Wikidata

2016-02-15 Thread mkroetzsch
mkroetzsch added a comment. Re chemical markup for semantics: this is true for Wikitext, where you cannot otherwise know that "C" is carbon. It does not apply to Wikidata, where you already get the same information from the property used. Think of https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] [Commented On] T126862: Datatype for chemical formulae on Wikidata

2016-02-15 Thread mkroetzsch
mkroetzsch added a comment. I really wonder if the introduction of all kinds of specific markup languages in Wikidata is the right way to go. We could just have a Wikitext datatype, since it seems that Wikitext became the gold standard for all these special data types recently. Mark-up over

[Wikidata-bugs] [Maniphest] [Commented On] T126349: RDF export for the math data type should not export input texvc string but its MathML representation

2016-02-10 Thread mkroetzsch
mkroetzsch added a comment.
> The MathML expression includes the TeX representation, which can be used in LaTeX documents and also to create new statements.
That would address the conversion back from MathML to TeX. With this in place, we could indeed use MathML in JSON and RDF, if we

[Wikidata-bugs] [Maniphest] [Commented On] T126349: RDF export for the math data type should not export input texvc string but its MathML representation

2016-02-10 Thread mkroetzsch
mkroetzsch added a comment. The format should be the same as in JSON. If MathML is preferred there, then this is fine with me. If LaTeX is preferred, we can also use this. It seems that MathML would be a more reasonable data exchange format, but Moritz was suggesting in his emails that he does

[Wikidata-bugs] [Maniphest] [Commented On] T99820: [Task] Add reference to ontology.owl to the RDF output

2015-11-23 Thread mkroetzsch
mkroetzsch added a comment. In https://phabricator.wikimedia.org/T99820#1820662, @daniel wrote:
> Looking at the link, it seems to me we'd (trivially) meet these requirements.
Yes, that's what I meant. :-)
> But I'm not sure about the fine details, e.g. regarding the versi

[Wikidata-bugs] [Maniphest] [Commented On] T99820: [Task] Add reference to ontology.owl to the RDF output

2015-11-19 Thread mkroetzsch
mkroetzsch added a comment.
> ...and if we consider our data dump to be an ontology, then what isn't an ontology?
The word "ontology" has different meanings in different contexts. Here, we only mean the notion of "ontology" meant by the term owl:ontology as use

[Wikidata-bugs] [Maniphest] [Commented On] T118860: [Tracking] Represent derived data in the data model

2015-11-18 Thread mkroetzsch
mkroetzsch added a comment. I don't want to detail every bit here, but it should be clear that one can easily eliminate the dependency on $db in the formatter code. The Sites object I mentioned is an example. It is *not* static in our implementation. You can make it an interface. You can
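
The pattern described here (pass a `Sites` object behind an interface instead of reaching for a database handle) can be sketched as follows. The thread concerns PHP code; this Python version only illustrates the idea, and every class and method name apart from the general notion of a Sites object is hypothetical.

```python
from typing import Dict, Optional, Protocol

class Sites(Protocol):
    """Hypothetical interface: look up the page base URL for a global site id."""
    def base_url(self, site_id: str) -> Optional[str]: ...

class InMemorySites:
    """A Sites implementation backed by a dict, e.g. for tests; no database involved."""
    def __init__(self, urls: Dict[str, str]):
        self._urls = dict(urls)

    def base_url(self, site_id: str) -> Optional[str]:
        return self._urls.get(site_id)

class SitelinkFormatter:
    """Formats sitelinks using whichever Sites object is injected, not a global DB."""
    def __init__(self, sites: Sites):
        self._sites = sites

    def format(self, site_id: str, title: str) -> str:
        base = self._sites.base_url(site_id)
        if base is None:
            raise KeyError(f"unknown site: {site_id}")
        return base + title.replace(" ", "_")

formatter = SitelinkFormatter(InMemorySites({"enwiki": "https://en.wikipedia.org/wiki/"}))
print(formatter.format("enwiki", "Douglas Adams"))  # https://en.wikipedia.org/wiki/Douglas_Adams
```

A production implementation of the same interface could still be database-backed; the formatter does not need to know.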

[Wikidata-bugs] [Maniphest] [Commented On] T118860: [Tracking] Represent derived data in the data model

2015-11-18 Thread mkroetzsch
mkroetzsch added a comment. @daniel As long as it works for you, this is all fine by me, but in my experience with PHP this could cost a lot of memory, which could be a problem for the long item pages that already caused problems in the past. > But it requires the serialization and formatt

[Wikidata-bugs] [Maniphest] [Commented On] T118860: [Tracking] Represent derived data in the data model

2015-11-18 Thread mkroetzsch
mkroetzsch added a subscriber: mkroetzsch. mkroetzsch added a comment. Structurally, this would work, but it seems like a very general solution with a lot of overhead. Not sure that this pattern works well on PHP, where the cost of creating additional objects is huge. I also wonder whether

[Wikidata-bugs] [Maniphest] [Commented On] T113168: [Story] Make it possible to alter only Statements with a certain property

2015-09-22 Thread mkroetzsch
mkroetzsch added a comment. This was a suggestion we came up with when discussing during WikiCon. People are asking for a way to edit the data they pull into infobox templates. Clearly, doing this in place will be a long-term effort that needs a complicated solution and many more design

[Wikidata-bugs] [Maniphest] [Commented On] T111770: Decide how to represent quantities with units in the "truthy" RDF mapping

2015-09-11 Thread mkroetzsch
mkroetzsch added a comment. Note that this discussion is no longer just about the wdt property values (called "truthy" above). Simple values are now used on several levels in the RDF encoding. In general, the same argument as for coordinates applies: if we cannot do it right, t

[Wikidata-bugs] [Maniphest] [Commented On] T111770: Decide how to represent quantities with units in the "truthy" RDF mapping

2015-09-11 Thread mkroetzsch
mkroetzsch added a comment. If we could distinguish properties of type quantity that require a unit from those that do not allow units, there would be another option. Then we could use a compound value as the "simple" value for all properties with unit to simulate the missin

[Wikidata-bugs] [Maniphest] [Updated] T111770: Decide how to represent quantities with units in the "truthy" RDF mapping

2015-09-11 Thread mkroetzsch
mkroetzsch added a comment. I think the discussion now lists all main ideas on how to handle this in RDF, but most of them are not feasible because of the very general way in which Wikibase implements unit support now. Given that there is no special RDF datatype for units and given that we

[Wikidata-bugs] [Maniphest] [Commented On] T101837: [Story] switch default rdf format to full (include statements)

2015-09-10 Thread mkroetzsch
mkroetzsch added a comment. Including more data (within reason) will not be a problem (other than a performance/bandwidth problem for your servers). However, if there are further ideas and small improvements that will take time to implement, it would be good to switch to "dump" as t

[Wikidata-bugs] [Maniphest] [Commented On] T101837: [Story] switch default rdf format to full (include statements)

2015-09-09 Thread mkroetzsch
mkroetzsch added a comment. Data on the referenced entities does not have to be included as long as one can get this data by resolving these entities' URIs. However, some basic data (ontology header, license information) should be in each single entity export.

[Wikidata-bugs] [Maniphest] [Commented On] T101837: [Story] switch default rdf format to full (include statements)

2015-09-09 Thread mkroetzsch
mkroetzsch added a subscriber: mkroetzsch. mkroetzsch added a comment. On the mailing list, Stas brought up the question "which RDF" should be delivered by the linked data URIs by default. Our dumps contain data in multiple encodings (simple and complex), and the PHP code can crea

[Wikidata-bugs] [Maniphest] [Commented On] T85444: get Wikidata added to LOD cloud

2015-09-08 Thread mkroetzsch
mkroetzsch added a subscriber: mkroetzsch. mkroetzsch added a comment. As another useful feature, this will also allow us to have our SPARQL endpoint monitored at http://sparqles.ai.wu.ac.at/ Basic registration should not be too much work; please look into it (I don't want to create an account

[Wikidata-bugs] [Maniphest] [Commented On] T73349: [Bug] Fix empty map serialization behaviour

2015-08-24 Thread mkroetzsch
mkroetzsch added a comment. It seems that the Web API for wbeditentities is also returning empty lists when creating new items (at least on test.wikidata.org). Is this the same bug or a different component?

[Wikidata-bugs] [Maniphest] [Commented On] T105432: Drop wikibase:quantityUnit for now from RDF dump

2015-08-05 Thread mkroetzsch
mkroetzsch added a comment. If not dropped, then it should be fixed. The value of 1 (a string literal) is not correct. Units should be represented by URIs, not by literals.
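
A minimal sketch of the rule "units are URIs, not literals": a hypothetical helper that emits an N-Triples line linking a quantity value node to its unit and rejects a literal such as "1". The predicate IRI and the value-node IRI are placeholders, not the actual Wikibase ontology terms; Q11573 is the Wikidata item for metre.

```python
from urllib.parse import urlparse

# Placeholder predicate; not the real Wikibase ontology IRI.
UNIT_PREDICATE = "http://example.org/ontology#quantityUnit"

def quantity_unit_triple(value_node: str, unit: str) -> str:
    """Return an N-Triples line for the unit of a quantity value; units must be IRIs."""
    parsed = urlparse(unit)
    if not (parsed.scheme and parsed.netloc):
        raise ValueError(f"unit must be an absolute IRI, got literal {unit!r}")
    return f"<{value_node}> <{UNIT_PREDICATE}> <{unit}> ."

print(quantity_unit_triple("http://example.org/value/1",
                           "http://www.wikidata.org/entity/Q11573"))
```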

[Wikidata-bugs] [Maniphest] [Commented On] T102717: https switch changed wdata prefix to https:

2015-06-23 Thread mkroetzsch
mkroetzsch added a comment. While I did say that pretty much all URIs I know use http, I do not have any reason to believe that https would cause problems. It is not so extensively tested maybe, but in most contexts it should work fine. A bigger issue is that some people are already using our

[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps

2015-06-17 Thread mkroetzsch
mkroetzsch added a comment. In https://phabricator.wikimedia.org/T95316#1373937, @Lydia_Pintscher wrote:
> Are there any differences we're missing? Are we ok with these differences?
I will do a complete review of the updated RDF mapping in the course of the next week. I will report back

[Wikidata-bugs] [Maniphest] [Commented On] T102155: find a way to surface rdf/json representation in item UI

2015-06-12 Thread mkroetzsch
mkroetzsch added a comment.
> we once planned a popup box with links to the various formats. It would be shown when you click on the Q-id in the title.
A pop-up box is a good solution if there are several options, but the Qid is not a good place to trigger it, since it gives no hint

[Wikidata-bugs] [Maniphest] [Commented On] T101752: Introduce ExternalEntityId

2015-06-10 Thread mkroetzsch
mkroetzsch added a comment. I think this is a useful change if you want Wikibase sites to be able to refer to other Wikibase sites. In WDTK, all of our EntityId objects are external, of course. A lesson learned for us was that it is not enough to know the base URI in all cases. You sometimes

[Wikidata-bugs] [Maniphest] [Commented On] T99907: Human-readable serialization of TimeValue precisions in RDF

2015-05-21 Thread mkroetzsch
mkroetzsch added a comment. A big advantage of the numbers is that you can search for values where the precision is at least a certain value (e.g., dates with precision day or above). This would be lost when using URIs.
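
The advantage of numeric precisions can be shown in a few lines. Wikibase encodes time precision as an integer where larger means finer (9 = year, 10 = month, 11 = day), so "at least month precision" is a single comparison; with one URI per precision level, a query would instead have to enumerate every qualifying URI. The sample dates below are made up for illustration.

```python
# Wikibase time precision codes (subset): larger numbers mean finer precision.
YEAR, MONTH, DAY = 9, 10, 11

dates = [
    ("+1952-03-11T00:00:00Z", DAY),
    ("+1952-03-00T00:00:00Z", MONTH),
    ("+1952-00-00T00:00:00Z", YEAR),
]

def at_least(values, min_precision):
    """Keep values whose precision is at least min_precision (one numeric comparison)."""
    return [time for time, precision in values if precision >= min_precision]

print(at_least(dates, MONTH))  # ['+1952-03-11T00:00:00Z', '+1952-03-00T00:00:00Z']
```

In SPARQL the same idea becomes a single `FILTER(?precision >= 11)` over whatever predicate carries the precision.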

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-05-19 Thread mkroetzsch
mkroetzsch added a comment. @Jc3s5h You are right that date conversion only makes sense in a certain range. I think the software should disallow day-precision dates in prehistoric eras (certainly everything before -1). There are no records that could possibly justify this precision

[Wikidata-bugs] [Maniphest] [Commented On] T97195: Create real URLs for wikidata ontology

2015-05-11 Thread mkroetzsch
mkroetzsch added a comment. Sounds good. I am not aware of any best practice re http vs. https but all URIs I know are using http as a protocol.

[Wikidata-bugs] [Maniphest] [Commented On] T94747: Make decision on RDF ontology prefix

2015-04-03 Thread mkroetzsch
mkroetzsch added a comment. I agree with the proposal of @Smalyshev.

[Wikidata-bugs] [Maniphest] [Commented On] T94747: Make decision on RDF ontology prefix

2015-04-03 Thread mkroetzsch
mkroetzsch added a comment. @daniel Changing the base URIs is not working as a way to communicate breaking changes to users of RDF. You can change them, but there is no way to make users notice this change, and it will just break a few more queries. It's just not how RDF works. Most of our

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-31 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev You comment on my Item 1 by referring to BlazeGraph and Virtuoso. However, my Item 1 is about reading Wikidata, not about exporting to RDF. Your concerns about BlazeGraph compatibility are addressed by my item 2. I hope this clarifies this part

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-30 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev Re halting the work on the query engine/produce code now: The WDTK RDF exports are generated based on the original specification. There is no technical issue with this and it does not block development to do just this. The reason we are in a blocker

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-30 Thread mkroetzsch
mkroetzsch added a comment.
> @mkroetzsch I already listed a few of the tools that implement XSD 1.0 style BCE years and I read your answer as to say that you know of no tools that implement XSD 1.1 style BCE years.
Then you misread my answer. Almost all tools that exist today use the 2000

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-30 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev We really want the same thing: move on with minimal disturbance as quickly as possible. As you rightly say, the data we generate right now is not meant for production use but for testing. We must make sure that our production environment will understand

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-30 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev P.S. Your finding of years in our Virtuoso instance is quite peculiar given that this endpoint is based on RDF 1.0 dumps as they are currently generated in WDTK using this code: https://github.com/Wikidata/Wikidata-Toolkit/blob

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-29 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev @Lydia_Pintscher Dates without years should not be allowed by the time datatype. They are impossible to order, almost impossible to query, and they do not have any meaning whatsoever in combination with a preferred calendar model. All the arguments @Denny

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-29 Thread mkroetzsch
mkroetzsch added a comment.
> @mkroetzsch Do you know of some widely used software that implements XSD 1.1 handling of BCE dates?
Many applications that process dates are based on ISO rather than on XSD. Java's SimpleDateFormat class, for example, is based on ISO and thus interprets year

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed in wikibase but has no meaning as xsd:dateTime

2015-03-27 Thread mkroetzsch
mkroetzsch added a comment. Note that all current data representation formats assume that 0000-01-01T00:00:00 is a valid representation:
- XML Schema 1.1: http://www.w3.org/TR/xmlschema11-2/#dateTime
- RDF 1.1: http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
- OWL 2: http://www.w3.org/TR
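
The XSD 1.0 vs. XSD 1.1 difference discussed in this thread comes down to year numbering. XSD 1.1 follows ISO 8601 (astronomical numbering), so year 0000 is 1 BCE and -0001 is 2 BCE. XSD 1.0 has no year zero, so -0001 is 1 BCE. The two helper functions below are a minimal sketch of this mapping; their names are hypothetical.

```python
def xsd11_year(historical_year: int, bce: bool) -> str:
    """Lexical year in XSD 1.1 / ISO 8601 (astronomical numbering): 1 BCE -> '0000'."""
    if historical_year < 1:
        raise ValueError("historical years start at 1")
    astronomical = 1 - historical_year if bce else historical_year
    sign = "-" if astronomical < 0 else ""
    return f"{sign}{abs(astronomical):04d}"

def xsd10_year(historical_year: int, bce: bool) -> str:
    """Lexical year in XSD 1.0 (no year zero): 1 BCE -> '-0001'."""
    if historical_year < 1:
        raise ValueError("historical years start at 1")
    sign = "-" if bce else ""
    return f"{sign}{historical_year:04d}"

print(xsd11_year(1, bce=True), xsd10_year(1, bce=True))    # 0000 -0001
print(xsd11_year(44, bce=True), xsd10_year(44, bce=True))  # -0043 -0044
```

The same lexical string (e.g. -0001) therefore denotes different years depending on which version a tool implements, which is why the choice matters for exchanged data.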

[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export

2015-03-27 Thread mkroetzsch
mkroetzsch added a comment.
> Don't see why it would be this many. It'd be like 4 additional rows per property:
I was referring to the labels. For some use cases, it could be convenient if each of the property variants would also have the rdfs:label of the property item. For example, RDF

[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-27 Thread mkroetzsch
mkroetzsch added a comment.
> we don't know what year it was but it was July 4th
Ouch. Where has this been designed? Can you point to the specification of this? @Denny, is this intended? Dates without a year are extremely hard to handle in queries and don't work at all like the normal dates

[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export

2015-03-27 Thread mkroetzsch
mkroetzsch added a comment. All RDF tools should be able to handle resources without labels (no matter if used as subject, predicate, or object). But data browsers or other UIs will simply show the URL (or an automatically created abbreviated version of it) to the user. So instead of instance

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T94019: Generate RDF from JSON

2015-03-27 Thread mkroetzsch
mkroetzsch added a subscriber: mkroetzsch.

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph

2015-03-27 Thread mkroetzsch
mkroetzsch added a subscriber: Lydia_Pintscher. mkroetzsch added a comment. Yes, the discussion on SPARQL has converged surprisingly quickly to the view that XSD 1.1 is both normative and intended in SPARQL 1.1 (by the way, I can only recommend this list if you have SPARQL questions

[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-26 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev Yes, this is what I was saying. @hoo was proposing to create a special directory for truthy based on offline discussion in the office.

[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export

2015-03-26 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev Yes, using lower-case local names for properties is a widely used convention and we should definitely follow that for our ontology. However, I would rather not change case of our P1234 property ids when they occur in property URIs, since Wikibase ids

[Wikidata-bugs] [Maniphest] [Commented On] T93207: Better namespace URI for the wikibase ontology

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment. @daniel Changing URIs of the ontology vocabulary is silently producing wrong results as well. I understand the problems you are trying to solve. I am just saying that changing the URIs does not actually solve them. @adrianheine You are right. My example was less

[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment. @hoo Thanks for the heads up! I do have comments. (1) I would remove the full and truthy distinction from the path and rather make this part of the dump type (for example statements and truthy-statements). The reason is that we have many full dumps (terms

[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment. @Lydia_Pintscher I understand this problem, but if you put different dumps for different times all in one directory, won't this become quite big over time and hard to use? Maybe one should group dumps by how often they are created (and have date-directories only

[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment.
> All of these dumps will be generated by exporting from the DB.
Why would one want to do this? The JSON dump contains all information we need for building the other dumps, and it seems that the generation from the JSON dump is much faster, avoids any load on the DB
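
Building other dumps from the JSON dump only needs a streaming reader. The sketch below assumes the layout used by the Wikidata JSON dumps: one large JSON array with one entity per line, each line except the last ending in a comma. That layout is an assumption of this sketch, and the helper name is hypothetical. Reading line by line keeps memory use constant regardless of dump size.

```python
import io
import json
from typing import Iterator, TextIO

def iter_entities(dump: TextIO) -> Iterator[dict]:
    """Yield entities from a JSON array dump that has one entity per line."""
    for line in dump:
        line = line.strip()
        if line in ("[", "]", ""):
            continue  # skip the array brackets and blank lines
        yield json.loads(line.rstrip(","))

sample = io.StringIO('[\n{"id": "Q1", "type": "item"},\n{"id": "P31", "type": "property"}\n]\n')
print([entity["id"] for entity in iter_entities(sample)])  # ['Q1', 'P31']
```

For a real dump, the text stream would typically come from `gzip.open(path, "rt")` or a similar decompressor.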

[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev Re what does consistent mean: to be based on the same input data. All dumps are based on Wikidata content. If they are based on the same content, they are consistent, otherwise they are not. Re discussing RDF dump partitioning in https

[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment. @JanZerebecki: Re using the same code: That's not essential here. All we want is that the dumps are the same. It's also not necessary to develop the code twice, since it is already there twice anyway. It's just the question if we want to use a slow method

[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export

2015-03-22 Thread mkroetzsch
mkroetzsch added a comment.
> is there any existing ontology we may want to use to create such links between entity:P1234 and v:P1234 or q:P1234? Or should we just invent our own?
We would have to make new URIs here. This depends on which/how many variants of RDF property URIs we use: we

[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export

2015-03-21 Thread mkroetzsch
mkroetzsch added a comment.
> Also, it was suggested that we may want to change the fact that we use entity:P1234 in link Entity-Statement and give it a distinct URL. However, then it is not clear what would be the link between entity:P1234 and the rest of the data.
This is a good point

[Wikidata-bugs] [Maniphest] [Commented On] T93207: Better namespace URI for the wikibase ontology

2015-03-20 Thread mkroetzsch
mkroetzsch added a comment. @daniel It makes sense to use wikibase rather than wikidata, but I don't think it matters very much at all. We should just define it rather sooner than later. As for the versioning, I don't see how to convince you. Four more attempts: - Try to apply your proposal

[Wikidata-bugs] [Maniphest] [Commented On] T93207: Better namespace URI for the wikibase ontology

2015-03-19 Thread mkroetzsch
mkroetzsch added a comment. @daniel: Have you wondered why XML Schema decided against changing their URIs? It is by far the most disruptive thing that you could possibly do. Ontologies don't work like software libraries where you download a new version and build your tool against it, changing

[Wikidata-bugs] [Maniphest] [Commented On] T93207: Better namespace URI for the wikibase ontology

2015-03-19 Thread mkroetzsch
mkroetzsch added a comment. Hi Daniel. Good point, I agree that this should change. A URL based on wikiba.se seems to be the best. I don't think we need to worry about domain ownership here (why would anybody sell this domain? Is it not WMF-owned?) I think it is not a good idea to change

[Wikidata-bugs] [Maniphest] [Created] T91117: Empty JSON maps serialized as empty lists in XML dumps

2015-02-27 Thread mkroetzsch
mkroetzsch created this task. mkroetzsch added subscribers: mkroetzsch, Lydia_Pintscher. mkroetzsch added a project: Wikibase-DataModel-Serialization. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The XML dumps of Wikidata contain many JSON serialization errors where
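
Until serialization is fixed, consumers can normalize the affected fields themselves. The sketch assumes that the top-level fields `labels`, `descriptions`, `aliases`, `claims` and `sitelinks` are the ones that must be JSON objects, and it does not touch nested structures such as qualifiers. (In PHP, `json_encode` turns an empty array into `[]` unless told otherwise, which is one common way such output arises; whether that is the cause here is not stated in the task.)

```python
import json

# Assumed set of top-level fields that must be JSON objects (maps) in entity JSON.
MAP_FIELDS = ("labels", "descriptions", "aliases", "claims", "sitelinks")

def fix_empty_maps(entity: dict) -> dict:
    """Return a copy in which empty lists in map-valued fields become empty maps."""
    fixed = dict(entity)
    for field in MAP_FIELDS:
        if fixed.get(field) == []:
            fixed[field] = {}
    return fixed

broken = json.loads('{"id": "Q42", "labels": [], "claims": []}')
print(json.dumps(fix_empty_maps(broken)))  # {"id": "Q42", "labels": {}, "claims": {}}
```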

[Wikidata-bugs] [Maniphest] [Commented On] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item

2015-02-20 Thread mkroetzsch
mkroetzsch added a comment. In https://phabricator.wikimedia.org/T89949#1052731, @daniel wrote:
> Nik tells me that the HA features in Virtuoso are only available in the closed source enterprise version. That basically means WMF is not going to use it in production.
Yes, I guessed

[Wikidata-bugs] [Maniphest] [Commented On] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item

2015-02-19 Thread mkroetzsch
mkroetzsch added a comment. The RDF should certainly contain information about the entity type of exported data. This is essential to ensure that the RDF data contains all the information that is found in the JSON (other than the ordering). As I read it, things that are of rdf:type Item

[Wikidata-bugs] [Maniphest] [Updated] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item

2015-02-19 Thread mkroetzsch
mkroetzsch added a comment. Our primary goal is to encode the JSON information in RDF, and possibly to enrich this information where it makes sense in an RDF-context (e.g., by adding links to other datasets). The JSON data includes the entity type, so it is clear that we want to encode

[Wikidata-bugs] [Maniphest] [Commented On] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item

2015-02-19 Thread mkroetzsch
mkroetzsch added a comment. Thanks for adding Denny. Long reply, but details matter here. I agree that there are different things one could talk about (document, real thing). However, for now I am mainly interested in talking about the latter, since this should be our primary concern

[Wikidata-bugs] [Maniphest] [Commented On] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item

2015-02-19 Thread mkroetzsch
mkroetzsch added a comment. Now my reply was so long that the ticket has already been closed in the meantime :-D Anyway, those are my two (or more) cents on this topic ;-) I don't think the paper goes into these topics very much (as they are not so much technical as philosophical).

[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-02-16 Thread mkroetzsch
mkroetzsch added a comment. I think json should be in the path somewhere. It does not have to be at the top-level, but it would not be good if dump files of one type end up in their own directory. The only way for tools to detect and download dumps automatically is to look at the HTML
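
Detection from an HTML listing can only rely on names, which is why the format should be visible in the path or file name. A minimal sketch using the standard-library HTML parser; the `.json.gz` suffix and the file names in the sample listing are hypothetical.

```python
from html.parser import HTMLParser

class DumpLinkParser(HTMLParser):
    """Collect link targets from an HTML directory listing that end with a given suffix."""
    def __init__(self, suffix=".json.gz"):
        super().__init__()
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith(self.suffix):
            self.links.append(href)

listing = ('<html><body><a href="../">../</a> '
           '<a href="20150223.json.gz">20150223.json.gz</a></body></html>')
parser = DumpLinkParser()
parser.feed(listing)
print(parser.links)  # ['20150223.json.gz']
```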

[Wikidata-bugs] [Maniphest] [Commented On] T86524: use data model implementation for import

2015-01-12 Thread mkroetzsch
mkroetzsch added a comment. I don't know about the details of the import task discussed here, but for the record: we are happy to support this use of WDTK by helping to update our implementation where necessary.

[Wikidata-bugs] [Maniphest] [Commented On] T86278: Define which data the query service would store

2015-01-11 Thread mkroetzsch
mkroetzsch added a comment. In https://phabricator.wikimedia.org/T86278#969184, @Multichill wrote:
> I would like to turn it around. We should support indexing everything: ... The fact that we're not creative enough to make up queries for everything doesn't mean it isn't useful.
I have

[Wikidata-bugs] [Maniphest] [Commented On] T86278: Define which data the query service would store

2015-01-11 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev My suggestion was just about the surface appearance, not about the inner workings. I am saying that the following two phrases have the same structure: - Find things with a *sitelink* that *has badge* *featured*. - Find things with a *population* that has
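
The structural parallel can be made concrete. If a sitelink is stored like a statement (property "sitelink", value the site, qualifier "has badge") then one query function covers both example phrases. The records and the flattened representation below are hypothetical simplifications, not the query service's actual data model.

```python
# Hypothetical, simplified records: (item, property, value, qualifiers).
records = [
    ("Q1", "sitelink", "enwiki", {"has badge": ["featured"]}),
    ("Q2", "sitelink", "dewiki", {"has badge": []}),
    ("Q3", "population", "1000", {"point in time": ["2010"]}),
]

def find(records, prop, qualifier, qualifier_value):
    """Items with a `prop` value whose `qualifier` includes `qualifier_value`."""
    return [item for item, p, _value, quals in records
            if p == prop and qualifier_value in quals.get(qualifier, [])]

print(find(records, "sitelink", "has badge", "featured"))    # ['Q1']
print(find(records, "population", "point in time", "2010"))  # ['Q3']
```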

[Wikidata-bugs] [Maniphest] [Updated] T86278: Define which data the query service would store

2015-01-11 Thread mkroetzsch
mkroetzsch added a comment.
> This is not correct, original structure can be recovered
Then I misunderstood the transformation that was proposed. My impression was that a statement with three qualifier snaks: P1 V1, P1 V2, P2 V3 would be stored as two

[Wikidata-bugs] [Maniphest] [Commented On] T86278: Define which data the query service would store

2015-01-11 Thread mkroetzsch
mkroetzsch added a comment. @JanZerebecki I understand what you are saying about what indexing means here. Makes sense to me. What you are saying about my example query sounds as if you are planning to implement query execution manually. I hope this is not the case and you can just give

[Wikidata-bugs] [Maniphest] [Commented On] T86278: Define which data the query service would store

2015-01-11 Thread mkroetzsch
mkroetzsch added a comment. @Smalyshev My point is merely that sitelinks and labels //can// be handled like statements. Since statements must be supported anyway, it would be sensible to reuse the data structures and query expressions defined for them. I don't think that confusion is likely