Re: [Wikidata-l] Number of planets in the solar system
@Thomas is close to the right answer. Nothing about Pluto changed; it was the definition of "planet" that changed, so you need two different definitions of "planet". Note that the definitions themselves are timeless, so you are really pointing to some specific definition of a planet in either case. There is no reason why this is not practical: it is just a matter of putting in another type, and maintenance is not a tough problem since there are fewer than ten of them. There could be some need for vocabulary to describe the attributes of the definitions, but simply a link to a defining document is good enough from the viewpoint of grounding.

On Thu, Apr 30, 2015 at 12:20 PM, Thomas Douillard thomas.douill...@gmail.com wrote:

It may not be practical, but it is still possible ;) Classes like ''astronomic corp that was thought to be a planet in 1850'' are an option :)

2015-04-30 13:51 GMT+02:00 Andrew Gray andrew.g...@dunelm.org.uk:

On 30 April 2015 at 12:37, Thomas Douillard thomas.douill...@gmail.com wrote: Infovarius even complicated the problem: he put the number of known planets at some time, with a qualifier for validity :)

Just to throw a real spanner in the works: for a lot of the nineteenth century the number varied widely. The eighth planet was discovered in 1801, and it is what we'd now think of as the asteroid or dwarf planet Ceres; the real eighth planet, Neptune, wasn't discovered until 1846. Newly discovered asteroids were thought of as 'planets' for some time (I have an 1843 schoolbook somewhere that confidently tells children there were eleven planets...) until, by about 1850, it became clear that having twenty or so very small planets, with more discovered every year, was confusing, and the meaning of the word shifted. There was no formal agreement (as was the case in 2006), so there is no specific end date.

The moral of this story is probably that trying to express complex things in Wikidata is not always practical :-)

-- Andrew Gray

-- Paul Houle
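[A minimal sketch of the two-definition approach described above, using rdflib; every class, property, and document URI here is invented for illustration and is not actual Wikidata vocabulary:]

    from rdflib import Graph, Namespace, URIRef

    EX = Namespace("http://example.org/")
    g = Graph()

    # Two timeless definitions of "planet", each grounded by a link
    # to a defining document.
    g.add((EX.Planet1850, EX.definedBy, URIRef("http://example.org/doc/1850-usage")))
    g.add((EX.PlanetIAU2006, EX.definedBy, URIRef("http://example.org/doc/iau-2006-resolution")))

    # Pluto itself never changes; only its class memberships differ
    # depending on which definition a statement points at.
    g.add((EX.Pluto, EX.instanceOf, EX.Planet1850))
    g.add((EX.Pluto, EX.instanceOf, EX.DwarfPlanetIAU2006))
    g.add((EX.Ceres, EX.instanceOf, EX.Planet1850))  # the "eighth planet" of 1801

    print(g.serialize(format="turtle"))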
Re: [Wikidata-l] Preliminary SPARQL endpoint for Wikidata
This is a great development! I managed to run some simple queries, but I am having trouble with profiling-type queries such as

    select ?p (count(*) as ?cnt) { ?s ?p ?o } group by ?p order by desc(?cnt)

You can generally run those OK on the DBpedia SPARQL endpoint. It would be nice to see a few more horsepower put behind this.

On Wed, Apr 8, 2015 at 7:11 PM, Nicola Vitucci nicola.vitu...@gmail.com wrote:

On 08/04/2015 23:43, Markus Krötzsch wrote:

On 08.04.2015 17:24, Nicola Vitucci wrote:

On 08/04/2015 16:36, Markus Krötzsch wrote:

On 08.04.2015 15:07, Nicola Vitucci wrote: Hi Markus, would you recommend adding some sort of patch until the new dumps are out, either in the data (by adding some triples to a temporary graph) or just in the Web interface for the external links?

If you need it ASAP, you could actually just implement it in our Java code and make a pull request. It should not be too much effort. You can use the issue to ask about details if you are not sure what to do. Otherwise the ETA would be end of April/beginning of May (several other RDF extensions are currently being worked on and will happen first, e.g., ranks in RDF).

I don't need it right now, so given the short ETA I'd wait. Anyway, in order to let people use external links more easily, I could just manually drop the last letter (or apply any other rule) only in the href links for now, while leaving the URIs intact. Do you see any harm in this?

Ah, you mean for displaying links in HTML? No, there is no harm in this at all. Most likely, the final live exports will also redirect to the property entity exports (which would make most sense for LOD crawlers).

Indeed. I made this temporary change on WikiSPARQL, so that links like those in Jean-Baptiste's examples may work properly. If you try this: http://wikisparql.org/sparql?query=DESCRIBE+%3Chttp%3A//www.wikidata.org/entity/Q18335803%3E and then click on the external link on any property, you should now be redirected to the right wiki page. Nicola

-- Paul Houle
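[For reference, a minimal way to run that profiling query against a SPARQL endpoint over HTTP; the endpoint URL is whatever service is under test (DBpedia's is shown here as an example), and such full-scan queries may time out on smaller deployments:]

    import requests

    QUERY = """
    SELECT ?p (COUNT(*) AS ?cnt)
    WHERE { ?s ?p ?o }
    GROUP BY ?p
    ORDER BY DESC(?cnt)
    """

    resp = requests.get(
        "https://dbpedia.org/sparql",  # swap in the endpoint being tested
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=300,  # profiling queries scan every triple, so allow plenty of time
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        print(row["p"]["value"], row["cnt"]["value"])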
Re: [Wikidata-l] Call for development openness
Quality is in the eye of the end user, so it only happens if you have a closed feedback loop where the end user has a major influence on the behavior of the producer. Certainly the possibility of making more money if you do a better job, and (even more so) of going out of business if you fail to do so, is a powerful way to close that loop. The trouble is that most people interested in open data seem to think their time is worth nothing and other people's time is worth nothing, and they aren't interested in paying even a small amount for services, so the producers throw stuff that almost works over the wall. I don't think it would be all that difficult for me to do for Wikidata what I did for Freebase, but I am not doing it because you aren't going to pay for it.

[... GOES BACK TO WORK ON A SKUNK WORKS PROJECT THAT JUST MIGHT PAY OFF]

On Fri, Feb 20, 2015 at 8:09 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:

Hoi, I have waited for some time to reply. First of all, Wikidata is not your average data repository. It would not be as relevant as it is if it were not for the fact that it links Wikipedia articles of any language to statements on items. This is the essence of Wikidata. After that we can all complain about the fallacies of Wikidata. I have my pet peeves, and it is not your RDF, SPARQL and stuff. That is mostly stuff for academics; its use is largely academic and not useful on the level where I want progress. Exposing this information to PEOPLE is what I am after, and by and large they do not live in the ivory towers where RDF and SPARQL live. I am delighted to learn that a production-grade replacement of WDQ is being worked on. I am delighted that a front-end (JavaScript?) developer is being sought. That is what it takes to bring the sum of all knowledge to all people. It is in enriching the data in Wikidata, not in yet another pet project, that we can make a difference, because that is what the people will see. When SPARQL is available with Wikidata data, I do wonder how you would serve all the readers of Wikipedia. Does SPARQL sparkle enough when it is challenged in this way? Thanks, GerardM

On 18 February 2015 at 21:25, Paul Houle ontolo...@gmail.com wrote:

What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad hoc that is not well understood. [...]

On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw jeroended...@gmail.com wrote:

Hey, As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. [...]
Re: [Wikidata-l] Call for development openness
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad hoc that is not well understood. I understand the motivations that lead there, because there are requirements to meet that standards don't necessarily satisfy, plus Wikidata really is doing ambitious things in the sense of capturing provenance information.

Perhaps it has come a little too late to help Wikidata, but it seems to me that RDF* and SPARQL* have a lot to offer for data wikis, in that you can view data as plain ordinary RDF and query it with SPARQL, but you can also attach provenance and other metadata in a sane way, with sweet syntax for writing it in Turtle or querying it in other ways. Another way of thinking about it is that RDF* formalizes the property graph model, which has always been ad hoc in products like Neo4J. I can say that knowing what algebra you are implementing helps a lot in getting the tools to work right. So you not only have SPARQL queries as a possibility but also languages like Gremlin and Cypher, and this is all pretty exciting. It is also exciting that vendors are getting on board with this, and we are going to be seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.

On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw jeroended...@gmail.com wrote:

Hey, As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing. For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install, and then you're all set. You can make changes, run the tests and submit them back. A different workflow than what you as a MediaWiki developer are perhaps used to, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community. I definitely do not disagree with you that some things could, and should, be improved. Like you, I'd like to see the Wikibase git repository and the naming of the extensions be aligned more, since it indeed is confusing. Increased API stability, especially for the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons why things are the way they are now and why they did not improve yet. So I suggest looking at specific pain points and seeing how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third-party contributions, and then protesting against that.

[0] http://wikiba.se/
[1] http://wikiba.se/components/

Cheers -- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3

-- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF
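[For illustration, this is roughly what RDF*/SPARQL* statement-level metadata looks like; the vocabulary below is invented for the example and is not Wikidata's actual RDF mapping. The snippets are held as strings since most toolkits of the time could not parse them:]

    # Turtle* : annotate a single triple with provenance (invented vocabulary).
    TURTLE_STAR = """
    @prefix : <http://example.org/> .

    << :Pluto :instanceOf :Planet >>
        :statedIn   :SomeSource ;
        :validUntil "2006-08-24" .
    """

    # SPARQL* : match both the fact and its metadata in one pattern.
    SPARQL_STAR = """
    SELECT ?source WHERE {
        << :Pluto :instanceOf :Planet >> :statedIn ?source .
    }
    """

    print(TURTLE_STAR, SPARQL_STAR)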
Re: [Wikidata-l] Tools for Wikidata users
I like this stuff, and I'd like to imagine what the cycle of improving these tools looks like. I played around with the admin interface a little, but it wasn't clear how I could add a property that is defined in Wikidata, such as electronegativity or oxidation states. There is also the issue of other properties that I don't see populated here from Wikidata, such as boiling point, melting point, etc. Right now I think the 3D display modes are flashy but don't deliver real value. With more data, however, you could do some interesting things. Also, I am curious about the data model: it looks like a property here is something like a relational table row, is it?
Re: [Wikidata-l] Precision of globe coordinates
Big picture, it is important to recognize the importance of geodetic datums. The market-leading datum is WGS84, or GPS, because it is (1) good enough for military work, and (2) GPS hardware is everywhere. Before the space age, geodetic datums were determined by optical observations forming a network across a region. Islands far from the coast would get their own datum because the position of the island itself relative to the mainland was uncertain.

In today's globalized world we have more reasons for datums. I often see road crews using specialized GPS equipment with large antennas. These systems establish a datum around base station(s), which can be precise to the centimeter range in the most advanced systems. Measurements made with that kind of system will NOT be precisely comparable with measurements made in other places, but you could dumb the claim down to WGS84 because it would still be that good.

Then there is quality in the sense of conformance to requirements; we might muck with some coordinates to make data useful. For instance, if you use a quality handheld GPS to survey stars on the Hollywood Walk of Fame and then plot the points on Google Maps, you discover that two MUST requirements are not met: (i) the stars should all be in the correct order, and (ii) they should be on the correct side of the street, which from the viewpoint of a pedestrian is a lot more important than the fact that my images of Hollywood and Vine are rotated a bit relative to the Big G's. Thus, an augmented-reality mobile app for the Walk of Fame would require its own geodetic datum.

I spend more time walking in the woods than I do in L.A., and in the woods there are similar but different concerns. If you walk on a path that closely follows a creek, for instance, a GPS trace may not agree with the actual sequence of creek crossings, something that would drive me nuts if I was using a map while hiking in the woods. You're supposed to fix topological problems like that when you upload to OpenStreetMap, so that is another sense of a privileged datum.

On Tue, Jan 13, 2015 at 10:15 AM, Serge Wroclawski emac...@gmail.com wrote:

For places where precision is required, has anyone given thought to using Geohash keys rather than lat/lon? The benefit of geohashing here is that you get both the location and the precision in a single value. As a secondary benefit, it can be used to index a database with more ease than a standard lat/lon coordinate pair. - Serge

-- Paul Houle
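[To make Serge's point concrete, here is a minimal pure-Python Geohash encoder, a sketch of the standard algorithm; for real use you would reach for an existing library. Truncating the string is what encodes the precision: each character dropped makes the bounding cell roughly 32 times larger in area.]

    BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)

    def geohash(lat, lon, length=9):
        """Encode a WGS84 lat/lon into a Geohash string of the given length."""
        lat_range = [-90.0, 90.0]
        lon_range = [-180.0, 180.0]
        bits = []
        use_lon = True  # bits alternate between longitude and latitude, lon first
        while len(bits) < length * 5:
            rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
            mid = (rng[0] + rng[1]) / 2
            if value >= mid:
                bits.append(1)
                rng[0] = mid
            else:
                bits.append(0)
                rng[1] = mid
            use_lon = not use_lon
        return "".join(
            BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
            for i in range(0, length * 5, 5)
        )

    # Hollywood and Vine (approximate), at two precisions: the shorter key
    # names a cell a few kilometers across, the longer one a few meters.
    print(geohash(34.1016, -118.3267, 9))
    print(geohash(34.1016, -118.3267, 5))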
[Wikidata-l] A look at Wikidata
I fed the Wikidata dump into a JSON profiling tool; in the first stage it identified the unique paths one could follow through the JSON data structures. The table below shows a count of the literal data items that can be found behind a path. We're not counting how many claims of P31[] have been made; we are counting all of the literals inside the node, so the more information there is qualifying a claim, the bigger this number gets.

    /claims/P31[]    144350720   instance of
    /claims/P625[]    35948377   geographic coordinates
    /claims/P17[]     35165614   country (sovereign state)
    /claims/P646[]    31095359   Freebase identifier
    /claims/P569[]    30881885   date of birth
    /claims/P21[]     30466476   sex or gender
    /claims/P105[]    29234406   taxon rank
    /claims/P225[]    27808194   taxon name
    /claims/P131[]    27806448   located in administrative division
    /claims/P171[]    25159278   parent taxon

None of those are a surprise at all: the two great hierarchies (spatial and biological) are represented, and there are properties about people. Oddly, though, the most documented property connected with creative works is P161, which ranks in at #20. Anyhow, it is not all claims; if you look at the highest level you see

    /datatype            1328
    /id              16647896
    /type            16647896
    /aliases         17824992
    /sitelinks       82865847
    /descriptions   112796452
    /labels         120721644
    /claims         772152821

Everything above /claims is part of what I have been calling the taxonomic core. There are quite a few reasons to treat this data specially, and I'd guess this solved a chicken-vs-egg problem for WD. In Freebase the taxonomic core is roughly half the mass of the whole thing. The claims are certainly bulked up in Wikidata because of the qualifying information.

If anything is weak about the fundamental data model, it is that aliases and labels are not reified the way the claims are. This is a big deal if you want a usable lexical database. For instance, labels should be taggable as to:

* being potentially offensive (i.e. insults that start with N)
* being the generic name for a drug vs. a brand name for a drug
* Japanese labels should be available in kanji, hiragana, and romanized form, and should be identifiable that way
* in English we have it easy, and you can generate Mad Libs-style texts correctly if you (1) know which article to use and (2) know how to make both the plural and singular forms; (1) is easy to guess if you have semantic data, and you can get away with being imperfect at (2)
* for German, however, you need to tag by grammatical gender, and the choice of the article is a function of said gender and the relationship between the concept and the predicate, as well as the verb tense
* similar things exist for most of the other languages...
* various organizations have defined viewpoints on terminology; for instance, firefighters want you to say 'flammable' because people might get the morphology wrong on 'inflammable'; in the army you could be sexually harassed if you call your rifle your gun

-- Paul Houle
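[A sketch of the kind of path profiling described above: it counts literal values per full leaf path, so the count behind a prefix like /claims/P31[] is the sum over all leaf paths beneath it. The dump file name and its one-entity-per-line layout are assumptions about how the JSON dump is unpacked:]

    import json
    from collections import Counter

    def leaf_paths(node, prefix=""):
        """Yield a path string for every literal reachable in a JSON value."""
        if isinstance(node, dict):
            for key, value in node.items():
                yield from leaf_paths(value, prefix + "/" + key)
        elif isinstance(node, list):
            for value in node:
                yield from leaf_paths(value, prefix + "[]")
        else:
            yield prefix  # literal: string, number, bool, or null

    counts = Counter()
    with open("wikidata-dump.json") as dump:  # hypothetical file name
        for line in dump:
            line = line.strip().rstrip(",")   # one entity per line, comma-terminated
            if line in ("[", "]", ""):
                continue
            counts.update(leaf_paths(json.loads(line)))

    for path, n in counts.most_common(10):
        print(f"{path}\t{n}")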
Re: [Wikidata-l] WikiData for Research Project Idea: Structured History
Gerard, tell me about it. It's hard to find anyone who has even seen ISO 8601, so there is no general compatibility between tools that accept ISO 8601 (date)?(times?); xsd:dateTime (defined mainly as a restriction of ISO 8601) is closer to an open standard, but people aren't so sure about extra digits in the date fields, and maybe we will need them to deal with the year 1 problem. IEEE 754 is a similar scandal, since it hasn't been read by most developers, particularly systems developers, so it is unlikely that floating-point operations in your favorite language are completely conformant. Now, IEEE does have the Get802 program, which lets you get slightly aged documents for networking standards, and ISO does release the occasional standard for free, such as ISO 20022, but there is a big difference between those two and the other organizations, like the OMG, W3C, IETF, and NIST (FIPS), that publish standards for free and manage to somehow pay the bills.

On Mon, Dec 29, 2014 at 6:31 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:

Hoi. The fact that ISO has its standards behind a paywall is its shame. However, it does not necessarily imply anything about the use of the standard. Thanks, Gerard NB: a paywall seriously hampers acceptance of standards

On 29 December 2014 at 12:20, Jeff Thompson j...@thefirst.org wrote:

The ISO standard for CIDOC CRM is behind a paywall with a patent notice. Can it be used in an open knowledge system?

On 2014-12-29 9:49, Dov Winer wrote:

Hi Sam, CIDOC CRM is the ontology of choice for Structured History, as it is anchored on modelling events. An excellent project based on it is ResearchSpace from the British Museum. See:
http://www.researchspace.org/
http://www.researchspace.org/home/rsandcrm
http://cidoc-crm.org/
Enjoy, Dov

-- Paul Houle
Re: [Wikidata-l] Public datasets available via Amazon
I have a copy of the Wikipedia traffic statistics with data in AWS up to Feb 2014, and I have scripts that could fill in the rest to produce a useful monthly product. I spent a month or two doing vendor management with AWS and eventually talked to the people in charge of the data sets; they didn't seem all that excited about the various dead data sets that haven't been updated, such as old Freebase quad dumps or that old slice of Wikipedia stats, etc. It wasn't clear to them, however, that anybody would be interested in the data I proposed to publish, and it didn't go any further. If you want this data, go ask the public data sets people for it. In the meantime, if you want to get your AWS account authorized for it, please write me personally.

On Fri, Nov 28, 2014 at 1:02 PM, Federico Leva (Nemo) nemow...@gmail.com wrote:

Cristian Consonni, 28/11/2014 18:00: several public datasets are available (including Wikipedia's traffic stats).

(Years old, and Amazon doesn't answer when offered updates. Cf. http://markmail.org/message/2efoheaeyikpzndy ) Nemo

-- Paul Houle
[Wikidata-l] Infovore 3.1 and :BaseKB Gold 2
I'm bringing this up as my proof-by-construction answer to a knock-down-drag-out thread earlier, where people complained about the difficulty of running queries against DBpedia and Wikidata. I think some people will find the product described below to be a faster road to where they are heading in the short term. In the longer term, I am thinking a v4 or v5 Infovore may be able to evaluate the contexts of facts in Wikidata and thus create a world view which can be quality-controlled for particular outcomes.

Infovore 3.1 happened quickly after Infovore 3.0 because I made a quick attempt to get my Jena dependency up to date, found it was easy, and so I did. The important thing is that there is a lot of cool stuff going on with Jena, such as the RDF Thrift serialization format and some Hadoop I/O tools written by Rob Vesse, and tracking the latest version helps us connect with that. Release page here: https://github.com/paulhoule/infovore/releases/tag/v3.1

Infovore 3.1 was used to process the Freebase RDF dump to create a quality-controlled RDF data set called :BaseKB; generally, queries look the same on Freebase and :BaseKB, but :BaseKB gives the right answers, faster, and with less memory consumption. This week's release is in the AWS cloud at s3://basekb-now/2014-11-09-00-00/; something very close to this is going to become :BaseKB Gold 2. This is a simpler and better product than the last Gold release from Spring 2014. Here are a few reasons:

* Unicode escape sequences in Freebase are now converted to Unicode characters in RDF
* The rejection rate of triples has dropped dramatically, because of both changes to Infovore and improvements in Freebase content
* The product is now packaged as a set of files partitioned and sorted on subject; this means you can download one file and get a sample of facts about a given topic; there is no longer the horizontal division

Between duplicate-fact filtering and compression, :BaseKB Now is nearly half the size of the Freebase RDF dump. If you're interested, please join the mailing list at https://groups.google.com/forum/#!forum/infovore-basekb
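[If you want to peek at the packaging without downloading everything, something like this lists the subject-partitioned files; it assumes the bucket still exists, your AWS account is authorized to read it, and boto3 is installed:]

    import boto3

    s3 = boto3.client("s3")
    # Each file under the release prefix holds the triples for a
    # contiguous, subject-sorted slice of the data set.
    resp = s3.list_objects_v2(Bucket="basekb-now", Prefix="2014-11-09-00-00/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])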
Re: [Wikidata-l] Wikidata RDF
/about/id/entity/https/bugzilla.wikimedia.org/show_bug.cgi?id=48143 xhv:related https://twitter.com/hashtag/RDF#this ; is foaf:primaryTopic of http://linkeddata.uriburner.com/c/8GUHZ7 , http://bit.ly/vapour-report-sample-wikidata-issue-tracking-entity-http-uri .

## Nanotation End ##

Links:
[1] http://kidehen.blogspot.com/2014/07/nanotation.html -- Nanotation
[2] http://linkeddata.uriburner.com/about/html/{url-of-this-reply-once-its-live} -- URL pattern that will show the effects (reified statements/claims, amongst other things) of the nanotations above

-- Kingsley Idehen, Founder & CEO, OpenLink Software, http://www.openlinksw.com

-- Paul Houle
Re: [Wikidata-l] Commons Wikibase
I'd be particularly wary of inferring anything from the EXIF data, especially the time. I have a cheap digital camera which is pretty good, except that the clock periodically resets to a default time. I've got a somewhat more expensive digital camera which has the same problem. I have an Android tablet that I assume gets the time from the net and/or GPS, but when I took it out of my gym bag the other day I noticed the time display had been switched to 24-hour and the time zone was switched to Central.

When I am in the photography habit, I keep the clock set on my cameras. Sometimes I fall out of the habit, but then something interesting happens, and you'd better believe I am not going to waste time setting the clock if I get a chance to photograph a burning car! Similarly, when travelling I might be bothered to set the time zone or not (more likely not if I have a layover in some place like Frankfurt or Schiphol airport). If somebody decided just to set the clock to Zulu, I wouldn't blame them.

Also, efforts to infer stuff from the EXIF data, such as did the flash go off?, rarely produce interesting results. For instance, it's a good habit to use the flash when you take photos of people outdoors on a bright day because it softens the shadows. Some people do it all the time, and the auto mode on some cameras does it by default too. Thus, the flash is not an indicator that a photo was taken at night, indoors, in the dark, etc. If you filter on things like that, or the ISO level, or the exposure, or the aperture, you're unlikely to get categories that are useful.

On Wed, Aug 20, 2014 at 7:52 AM, Markus Krötzsch mar...@semantic-mediawiki.org wrote:

On 20.08.2014 10:46, Gerard Meijssen wrote: Hoi, When I add statements with is a list of, the item I refer to works as a base. It and all subsequent statements are required to be the result that is generated by WDQ in the background. The results are shown automatically from within Reasonator. The hack is in having Reasonator interpret the limited expressions available. Then again, calling Reasonator a hack is a disservice to the real application it provides.

Not sure what you refer to, but there might be a misunderstanding here. I was using the word hack in my email to refer to the proposal of using additional qualifiers to express queries in Wikidata. That was a new proposal in the email I replied to and had nothing to do with Reasonator or your annotations. Markus

-- Paul Houle
Re: [Wikidata-l] Items without any label
Some of these come up to a page that says the record has been deleted; others have no English label but have labels in other languages: http://www.wikidata.org/wiki/Q1209553

On Thu, Aug 14, 2014 at 7:55 AM, Andrew Gray andrew.g...@dunelm.org.uk wrote:

Looks like a lot of these may be failed or simply not-yet-deleted merges, e.g.
https://www.wikidata.org/w/index.php?title=Q1218589&action=history
https://www.wikidata.org/w/index.php?title=Q17485065&action=history
Andrew.

On 14 August 2014 09:06, Lukas Benedix lukas.bene...@fu-berlin.de wrote:

Hi, I found ~16,000 items without any label. I have no idea how it's possible to create those and how to fix this problem, so here is the list for whoever can: http://tools.wmflabs.org/lbenedix/wikidata/items_without_any_label_20140811.txt Lukas

-- Paul Houle
Re: [Wikidata-l] Wikidata: CSV, Shapefile, etc.
I'm intensely interested in links to shapefiles from databases such as Wikidata, DBpedia and Freebase. In particular I'd like to get Natural Earth hooked up: http://www.naturalearthdata.com/ It's definitely a weakness of current generic databases that they use the 'point GIS' model that is so popular in the social media world.

On Tue, Aug 5, 2014 at 10:14 AM, Magnus Manske magnusman...@googlemail.com wrote:

We don't have shapefiles yet, but we do have a lot of property types, such as geographic coordinates (as in, one per item, ideally...), external identifiers (e.g. VIAF), dates, etc. A (reasonably) simple way to mass-add statements to Wikidata is this tool: http://tools.wmflabs.org/wikidata-todo/quick_statements.php A combination of spreadsheet apps, shell commands, and/or a good text editor should allow you to convert many CSVs into the tool's input format. Cheers, Magnus

On Tue, Aug 5, 2014 at 3:01 PM, Brylie Christopher Oxley bry...@gnumedia.org wrote:

I would like to contribute data to Wikidata that is in the form of CSV files, geospatial shapefiles, etc. Is there currently, or planned, functionality to store general structured data on Wikidata? -- Brylie Christopher Oxley http://gnumedia.org

-- Paul Houle
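[As a sketch of the CSV conversion Magnus describes; the input file and its column names are invented for the example, and the coordinate syntax should be checked against the tool's own documentation before a real run:]

    import csv

    # Turn rows like  item,lat,lon  into tab-separated QuickStatements-style
    # lines asserting P625 (coordinate location) on each item.
    with open("places.csv") as f:  # hypothetical input file
        for row in csv.DictReader(f):
            # each output line: <item> TAB P625 TAB @<lat>/<lon>
            print(f"{row['item']}\tP625\t@{row['lat']}/{row['lon']}")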
Re: [Wikidata-l] Reasonator ignores "of" qualifier
So far as data types go, I'd look at the structure here: http://www.freebase.com/business/employment_tenure?schema= Something parallel to this satisfies the major requirements for describing who was the Mayor of Where, When. Perhaps the Mayor of New York is particularly notable, but the sum total of the significance of all mayors is surely greater, and enough to be notable.

Of course, there is an uncountable number of composite concepts that people might want to reference, each derivable from a generic instance. For instance, Economy of Japan might be a good LCSH heading, but the LCSH creates headings like that in a faceted organization which recognizes that there is an Economy of [place] for any [place]. If all of the useful composite concepts were materialized, you could puff Wikidata up by orders of magnitude.

On Tue, Jun 17, 2014 at 11:06 AM, Tom Morris tfmor...@gmail.com wrote:

Sad to see the Deletionists taking hold on Wikidata too. Tom

On Mon, Jun 16, 2014 at 4:31 PM, Thomas Douillard thomas.douill...@gmail.com wrote:

Yeah, there seems to be some cognitive dissonance going on here; it's weird.

2014-06-16 22:08 GMT+02:00 Derric Atzrott datzr...@alizeepathology.com:

That's certainly what the policy says. It's not what some admins accept, though. A direct quote from one, from as recently as March this year: * The general spirit of the notability policy is that Wikipedia finds [the subject] notable This was also the general vibe that I had gotten that informed my understanding of notability on Wikidata, before someone pointed out that policy actually says differently. Thank you, Derric Atzrott

-- Paul Houle
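[In Wikidata terms, the who-was-mayor-of-where-when pattern is a statement with qualifiers rather than a Freebase-style mediator node. A rough sketch of the shape as plain Python data; P6, P580 and P582 are Wikidata's head-of-government, start-time and end-time properties, the mayor's item ID is a placeholder, and the structure is simplified from the actual JSON:]

    # One claim: New York City (Q60) had head of government <mayor>,
    # qualified with start and end times.
    claim = {
        "item": "Q60",               # New York City
        "property": "P6",            # head of government
        "value": "Q_MAYOR",          # placeholder for the mayor's item ID
        "qualifiers": {
            "P580": "+2002-01-01T00:00:00Z",  # start time (illustrative)
            "P582": "+2013-12-31T00:00:00Z",  # end time (illustrative)
        },
    }

    # The composite concept "Mayor of New York" never needs its own item;
    # it is recoverable as the set of P6 claims on Q60.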
Re: [Wikidata-l] What is the point of labels?
There's another road to an ontology of labels, which is connected with the kinds of roles that labels play in systems. One need is that a system wants to mention something, or draw something, or otherwise refer to something, and it needs to know what to call it. Another need is that you have a phrase and you want to find things with a matching label. Then there's the more general problem that the user has something in his head and you want to specify it.

In terms of acceptance of labels, you want the system to accept a wide range of possible names people would use for something (I think this is in Wikidata's scope), but to make the most of that you need a good estimator of the probability that a particular surface form used in a particular context refers to this or that, and that is probably out of scope.

You want to accept labels you wouldn't want to generate. A tendency to generate ethnic, racial and other kinds of slurs is a showstopper for any public commercial application. A.I.s are like people: some of them are more prone to potty mouth than others, and you can't count on good behavior unless you train your animals. Thus, offensive labels should be tagged.

Similar choices appear in different contexts. I live in New York, and if you look at legal documents they always say New York State or New York City, but if you drive onto the Thruway from Pennsylvania you will see Welcome to New York and then a distance sign that says New York is 490 miles away. Sometimes you want the Latin name of an organism and sometimes you want the common name. You might want to speak of pharmaceuticals always using the generic name (omeprazole) rather than a brand (Prilosec). Sometimes you want to use abbreviations (RDF) and other times you want to spell things out (Resource Description Framework). If you want to make something visually tight you need to control label length: http://carpictures.cc/cars/photo/

A superhuman system would certainly contain statistical models, but a lot of the knowledge needed to do the above could be encoded as properties of the labels.

On Fri, Jun 6, 2014 at 1:57 PM, Gerard Meijssen gerard.meijs...@gmail.com wrote:

Hoi, In a different conversation it was put like this: Wikipedia is what it is and Wikidata is what it is. This was in the context of assumptions. Thanks, GerardM

On 6 June 2014 at 16:59, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:

On 06.06.2014 15:44, Gerard Meijssen wrote: Hoi, That is exactly the point. Once you assume that they are the same, you ignore the extent to which they are not. Many, many items have articles pointing to items, resulting in labels that are not exactly the same subject.

And these are mistakes that should be fixed. So? -- Daniel Kinzler

-- Paul Houle
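[A sketch of what encoding that knowledge as properties of the labels might look like; the field names here are invented for illustration and are not an actual Wikidata or lexeme schema:]

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Label:
        text: str
        language: str
        accept: bool = True        # usable when matching user input
        generate: bool = True      # usable when the system must produce a name
        offensive: bool = False    # never generate, but perhaps still accept
        abbreviation: bool = False
        register: str = "common"   # e.g. "common", "legal", "scientific", "brand"
        grammatical_gender: Optional[str] = None  # needed for German article choice

    new_york = [
        Label("New York", "en"),
        Label("New York State", "en", register="legal"),
        Label("NYS", "en", abbreviation=True),
    ]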
Re: [Wikidata-l] Classification of properties?
I think the best scheme I've seen like this yet is at https://developers.google.com/freebase/v1/search-metaschema In the RDF model it is easy to make statements about predicates:

    ?p a :SocialRelation .

The key is that it is multi-dimensional, so a given predicate will probably be a member of a few different categories scattered across the system.

On Mon, May 5, 2014 at 8:18 AM, Fredo Erxleben fredo.erxle...@tu-dresden.de wrote:

Hello all, I am playing around with properties at the moment, especially filtering out a certain kind of properties. So I wondered if it wouldn't be a nice thing if properties were classified in some way. Example (numbering is just for readability and does not hold any semantics; P… is the placeholder for the actual property ID, I did not want to look them all up):

1) Relations
   1a) Mathematical relations
   1b) Relations in human interaction
       1b1) Social relations
            1b1a) P… (is employee of)
            1b1b) P… (is heir to)
       1b2) Biological relations
            1b2a) P… (brother of)
            1b2b) P… (sister of)
…
2) External IDs
…

and so on, I think you get the idea. Has something like this been discussed before? Cheers, Fredo

-- Paul Houle
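[A small rdflib sketch of the statements-about-predicates idea, with multi-dimensional membership; the category and predicate names are invented for the example:]

    from rdflib import Graph, Namespace, RDF

    EX = Namespace("http://example.org/")
    g = Graph()

    # A predicate can live in several categories at once.
    g.add((EX.employeeOf, RDF.type, EX.SocialRelation))
    g.add((EX.employeeOf, RDF.type, EX.OrganizationRelation))
    g.add((EX.brotherOf, RDF.type, EX.BiologicalRelation))

    # Filter out one kind of property, as Fredo describes.
    for p in g.subjects(RDF.type, EX.SocialRelation):
        print(p)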
Re: [Wikidata-l] qLabel
I've been thinking about this kind of problem in my own systems. Name and link generation from entities is a cross-cutting concern that's best separated from other queries in your application. With SPARQL and multiple languages, each with multiple rdf:label values, it is awkward to write queries that bring labels back with identifiers, particularly if you want to apply rules that amount to if a label doesn't exist in ?lang, show a label from a language that uses the same alphabet as ?lang in preference to any others. Another issue is that the design and business people might have some desire for certain kinds of labels, and it's good to be able to change that without changing your queries.

Anyway, a lot of people live on the other end of internet connections with 50 ms, 2000 ms or more latency to the network core, plus sometimes the network has a really bad day, or even a bad few seconds. For every hundred or so TCP packets you send across the modern internet, you lose one. The fewer packets you send per interaction, the less likely the user is to experience this. If 20 names are looked up sequentially and somebody is on 3G cellular with 300 ms latency, the user needs to wait six seconds for this data to load, on top of the actual time moving the data and waiting for the server to get out of its own way. Since this is using jQuery, it's very likely the page has other JavaScript geegaws in it that work OK for the developer who lives in Kansas City, but ordinary folks in Peoria might not have the patience to wait until your page is fully loaded. Batch queries give users performance they can feel, even if they demand more of your server.

In my system I am looking at having a name lookup server that is stupidly simple and looks up precomputed names in a key-value store, everything really stripped down and efficient with no factors of two left on the floor. I'm looking at putting a pretty ordinary servlet that writes HTML in front of it, but a key thing is that the front of the back end runs queries in parallel to fight latency, which is the scourge of our times. (It's the difference between GitHub and Atlassian.)

On Wed, Apr 2, 2014 at 4:36 AM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:

Hey Denny! Awesome tool! It's so awesome, we are already wondering about how to handle the load this may generate. As far as I can see, qLabel uses the wbgetentities API module. This has the advantage of allowing the labels for all relevant entities to be fetched with a single query, but it has the disadvantage of not being cacheable. If qLabel used the .../entity/Q12345.json URLs to get entity data, that would be covered by the web caches (squid/varnish). But it would mean one request per entity, and it would also return the full entity data, not just the labels in one language. So, a lot more traffic. If this becomes big, we should probably offer a dedicated web interface for fetching labels of many entities in a given language, using nice, cacheable URLs. This would mean a new cache entry per language per combination of entities (potentially a large number). However, the combination of entities requested is determined by the page being localized; that is, all visitors of a given page in a given language would hit the same cache entry. That seems workable. Anyway, we are not there quite yet, just something to ponder :) -- daniel

On 01.04.2014 20:14, Denny Vrandečić wrote:

I just published qLabel, an open-source jQuery plugin that allows you to annotate HTML elements with Wikidata Q-IDs (or Freebase IDs, or, technically, any other Semantic Web / Linked Data URI); it then grabs the labels and displays them in the selected language of the user. Put differently, it allows for the easy creation of multilingual structured websites. And it is one more way in which Wikidata data can be used, by anyone. Contributors and users are more than welcome! http://google-opensource.blogspot.com/2014/04/qlabel-multilingual-content-without.html

-- Paul Houle
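[For concreteness, here is what a batched label fetch through the wbgetentities module Daniel mentions looks like from a client; the module and its parameters are real, and the specific entity IDs are just examples:]

    import requests

    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbgetentities",
            "ids": "Q42|Q64|Q90",   # one round trip for many entities
            "props": "labels",
            "languages": "en|de",
            "format": "json",
        },
        headers={"User-Agent": "qlabel-example/0.1"},
    )
    for qid, entity in resp.json()["entities"].items():
        labels = entity.get("labels", {})
        print(qid, labels.get("en", {}).get("value"))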
Re: [Wikidata-l] qLabel
Of course. You can cache the individual entities somewhere inside the server system, where they can be stuck together very quickly, or you can cache them on the client.

On Wed, Apr 2, 2014 at 12:04 PM, Magnus Manske magnusman...@googlemail.com wrote:

Could one of the front-ends (squid?) perform a simple batch service, by just concatenating the /entity/ JSON for the requested items? That could effectively run on the cache and still deliver batches.

On Wed, Apr 2, 2014 at 4:44 PM, Paul Houle ontolo...@gmail.com wrote:

I've been thinking about this kind of problem in my own systems. [...]

On Wed, Apr 2, 2014 at 4:36 AM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:

Hey Denny! Awesome tool! It's so awesome, we are already wondering about how to handle the load this may generate. [...]