Re: [Wikidata-l] discussion on English Wikipedia about getting rid of citation templates...
Obviously I cannot speak for the development team, but my observation has been that the development schedule for Wikidata is rather malleable. I can't be sure because I don't remember having ever seen a formal order of development/deployment priorities, but I believe that things have been bumped up and down in the priorities order based on community request. If the ability to access Wikidata data directly - like this task would require - is something that has broad support, I would certainly encourage you to gather that support in one place and present it to the Wikidata community and the development team. Much like every Wikipedia article exists because someone decided that the subject matter was important enough to warrant them writing an article on it, every feature (and piece of information) on Wikidata is there because someone decided that it was important enough to add. Sven On Mar 17, 2014 4:05 PM, David Cuenca dacu...@gmail.com wrote: Hi all, Some time ago there was an RFC on Wikidata about supporting Wikipedia sources [1]. The outcome was positive; the only thing blocking any further advance is, as Thomas pointed out, that the arbitrary access of items is still not available [2]. About the cite pmid templates, I think it doesn't matter much from the Wikidata POV. There will be an item representing each source and it will contain all associated external identifiers (doi, pmid, etc), so it will not matter which one you use to find it. Cheers, Micru [1] https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Source_items_and_supporting_Wikipedia_sources#Supporting_Wikipedia_sources [2] https://bugzilla.wikimedia.org/show_bug.cgi?id=47930 On Mon, Mar 17, 2014 at 7:48 PM, edgar.hagenbich...@hagenbichler.at wrote: Hello Lane, yes, I am interested in joining a discussion about citation structure and use of it in wikidata, but I am not the right person who can say something about the future of citations and Wikidata (except my wishes on this topic).
This weekend I tried to make a citation/source for the point in time of blindness of Johann Sebastian Bach on Wikidata. The point in time can then be seen on the Reasonator (http://tools.wmflabs.org/reasonator/?q=Q1339); the source is not displayed there but can be found in https://www.wikidata.org/wiki/Q1339 (statement - medical condition - blindness - 1750 - source - stated in The Eyes of Johann Sebastian Bach https://www.wikidata.org/wiki/Q15947415 ). The author has to be a separate Wikidata item: Richard H.C. Zegers (https://www.wikidata.org/wiki/Q15948328). The citation I made in Wikidata also exists in the German Wikipedia article about Starstich https://de.wikipedia.org/wiki/Starstich. So the next logical step would be that this citation could be used in the German Wikipedia by linking it to Wikidata (and in every other Wikipedia, etc. of course). Whether this is already possible, and how, I don't know, but I am interested in it and I would like to join the discussion. Sincerely yours, Edgar Lane Rasberry wrote: Hello. Is there anyone here who would like to join a discussion on English Wikipedia about citation structure? Some of us at WikiProject Medicine would like to meet anyone in the Wikidata community who could say something about the future of citations and Wikidata. At WikiProject Medicine we coordinate translation and reuse a lot of citations, and also do more review of sources than most other Wikimedia communities. Because of this, we are talking about deprecating template:cite PMID, template:cite doi, and by extension Citation bot. While the problems people are experiencing are serious, I had the idea that Wikidata would eventually address a lot of citation problems but I do not know when that would be. If no one has plans then we at WikiProject Medicine will work for a quick solution now, but if there are plans, we would like to hear thoughts from anyone thinking about this.
https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Medicine#Replace_.22cite_pmid.22_with_.22cite_journal.22 Thanks to anyone who can comment, even if just to say that you know nothing about this and know no one working on Wikidata citations. We just wanted to seek feedback before we made great changes. yours, -- Lane Rasberry user:bluerasberry on Wikipedia 206.801.0814 l...@bluerasberry.com ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Etiamsi omnes, ego non
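David Cuenca's remark in this thread, that a source item would carry all of its external identifiers so that it does not matter which one is used to find it, can be illustrated with a small lookup index. This is only a sketch under stated assumptions: the item ID, the identifier values, and the function names are made up for illustration and are not part of any Wikidata API.

```python
def build_identifier_index(source_items):
    """Map every (scheme, identifier) pair to the item that carries it.

    `source_items` is a hypothetical in-memory stand-in for source items:
    {item_id: {scheme: identifier}}.
    """
    index = {}
    for item_id, identifiers in source_items.items():
        for scheme, value in identifiers.items():
            # Identifiers are normalised to lower case before indexing.
            index[(scheme, value.lower())] = item_id
    return index


def find_source(index, scheme, value):
    """Return the item ID for an identifier, or None if it is unknown."""
    return index.get((scheme, value.lower()))


# Made-up example data: one source item with two external identifiers.
sources = {"Q_EXAMPLE": {"doi": "10.1000/EXAMPLE", "pmid": "00000000"}}
index = build_identifier_index(sources)
```

Looking the source up by DOI or by PMID reaches the same item, which is why the choice between cite pmid and cite doi would matter little from the Wikidata side.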
Re: [Wikidata-l] weekly summary #94
Why? I don't see a benefit to that. Sven On Jan 25, 2014 10:38 AM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: Hi Lydia, These updates are a lot like a blog. Can it be a real blog? WordPress should be fairly easy to set up :) -- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore 2014/1/25 Lydia Pintscher lydia.pintsc...@wikimedia.de Hey folks :) Here's what's been going on around Wikidata this week: https://meta.wikimedia.org/wiki/Wikidata/Status_updates/2014_01_24 Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Re: [Wikidata-l] New stuff! (ordering, ranks and a table of content)
I proposed an up arrow, a square diamond, and a down arrow, all from the same Unicode set, in a mockup I sent to Lydia. I still think that those are a better idea, and not just because it was my idea. Sven On Dec 11, 2013 10:49 AM, Andrea Zanni zanni.andre...@gmail.com wrote: On Wed, Dec 11, 2013 at 4:31 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: Ranks: * The three squares on the left are not really self explaining Do you have a better suggestion for the symbol? How about numbers? (eg. 1, 2, 3) or the symbol #? Maybe they are clearer. Aubrey
Re: [Wikidata-l] Embedding voice samples in Wikidata
There seems to have been a misunderstanding on my part, for which I apologize. When I read this the first time I thought that you were stitching together audio clips specifically for voice identification. Audio clips for voice identification, at least in my experience, tend to just be a collection of syllables, as that is what is basically needed to do voice identification. If you are talking about substantive quotes, which your samples seem to indicate you are, then what I was worried about and what you intend to do are very different things. I will retract the concerns that I laid out in the previous email, as they appear to be unfounded. Apologies again, Sven On Nov 15, 2013 5:17 PM, Andrew Gray andrew.g...@dunelm.org.uk wrote: On 15 November 2013 07:54, Sven Manguard svenmangu...@gmail.com wrote: This is certainly an interesting idea, but I'm not sure it has a place in either Wikipedia or Wikidata unless we're talking about the clips being notable quotes. For Wikipedia, if it's just a voice sample - as opposed to a notable quote - the community is going to view it as cruft and remove it from articles, as the majority of users will find a contextless sound clip to be of little encyclopedic value. For Wikidata, why would we link to an audio sample if it's of no value to sister projects and no different from other voice samples (except for the license)? I like the idea, don't get me wrong. I just think that the broader community is not going to see the utility in the samples. I think that audio clips - as supplementary material - do have definite value; undoubtedly they're of less value than a photograph, but they're probably more useful than a signature, which seems to be fairly well accepted (on enwiki at least). Beats me as to why...
Audio clips of major quotes (or whole speeches, etc) are definitely of more value than more mundane ones, in the way that a picture of historic significance is better than a conventional portrait, but I wouldn't agree that they're automatically contextless just because you don't already know what they're saying. Of the three samples given there, we have: * Mary Robinson talking about her upbringing * Mark Carney discussing economic policy * Justin Welby on ethics banking The general approach of the BBC material makes it likely that most of the clips will be people discussing themselves, their work, or their field of expertise, all of which seem contextually appropriate. Thirdly, whether Wikipedia wants it or not, this is definitely useful and appropriate material for Commons, and if Commons has a distinctive class of items attached to subjects then it seems reasonable to note that on Wikidata. Again, signatures are a good example - https://www.wikidata.org/wiki/Property:P109 - but there are also things like https://www.wikidata.org/wiki/Property:P94 (coat of arms image). The fact that we've got external reusers doing something cool (matching Wikidata entities by voice recognition!) is the icing on the cake ;-) -- - Andrew Gray andrew.g...@dunelm.org.uk
Re: [Wikidata-l] next sister project: Wikisource
You bring up a good point. Is there any way to have the interwiki links that show up on the sidebar point to a page with a different Q# ID than they should? If we can do that, we can have every version of a given book point to a disambiguation page that lists all of the versions. I can't think of any other solution. Sven On Nov 8, 2013 12:49 PM, Joe Filceolaire filceola...@gmail.com wrote: Actually the problem isn't that you can only have one link from a wikisource work from a wikidata item. We have separate wikidata items for each edition of a work (because these have different metadata) so multiple editions of the same work on a wikisource link to different wikidata items. This creates a different problem. Each language edition of a work is a different edition so it links to a different wikidata item which has sitelinks only to that translation of the work. This means you can't use sitelinks to link to translations of a work on other wikisources. Does this mean wikidata sitelinks are useless for wikisource? filceolaire
Re: [Wikidata-l] Proposed change to the inclusion syntax
Excuse me in advance, as this isn't an area I am well versed in, but: {{#property:P36}} doesn't have to change if the value changes, but an edit would have to be made to modify {{#property:P36|value=Berlin}} if the value changed. Now capitals don't change, but if we have a value for Alexa rank or annual GDP or population, well, those change often. Part of the utility of Wikidata is that on smaller projects, once you set everything up, you don't need a continuous stream of edits. Sven On Oct 20, 2013 3:06 PM, Vito vituzzu.w...@gmail.com wrote: On 20/10/2013 20:30, David Cuenca wrote: Hi, I would like to know your opinion about having the value in the #property parser function. Right now we have two options: {{#property:P36}} {{#property:capital}} The problem with this model is that editors in Wikipedia cannot see the value unless they render the page (or possibly use the VisualEditor). Having the value in the property parser itself would allow contributors to see and edit the value in Wikipedia, which would in turn update the Wikidata value. Of course updating the value in Wikidata will also update the text field. It could look like {{#property:P36|value=Berlin}} {{#property:capital|value=Berlin}} Or maybe: {{#property:P36=Berlin}} {{#property:capital=Berlin}} What is your opinion about it? Cheers, Micru Then what will be data's scope? :/ Vito
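Sven's objection in this thread can be made concrete with a toy model. The sketch below is not how MediaWiki or Wikibase expands the parser function; it is a minimal simulation, with the statement store, the function names, and the use of P1082 as a population property all being assumptions chosen for illustration. It shows that the plain form re-renders from the current data without any page edit, while the proposed value-carrying form leaves a stale copy in the page text.

```python
import re

# Plain form: {{#property:P36}}
PLAIN = re.compile(r"\{\{#property:(P\d+)\}\}")
# Proposed form with an embedded copy of the value: {{#property:P36|value=Berlin}}
WITH_VALUE = re.compile(r"\{\{#property:(P\d+)\|value=([^}]*)\}\}")


def render(wikitext, statements):
    """Expand plain #property calls from the current statements."""
    return PLAIN.sub(lambda m: statements[m.group(1)], wikitext)


def stale_values(wikitext, statements):
    """Return (property, embedded value, current value) for every
    embedded value that no longer matches the statements."""
    return [
        (prop, embedded, statements[prop])
        for prop, embedded in WITH_VALUE.findall(wikitext)
        if statements.get(prop) != embedded
    ]


# Hypothetical page text and statement store (the value is made up).
page_plain = "Population: {{#property:P1082}}"
page_embedded = "Population: {{#property:P1082|value=3400000}}"
statements = {"P1082": "3400000"}
```

After the stored value changes, `render(page_plain, statements)` picks up the new figure with no change to the page, whereas `stale_values(page_embedded, statements)` reports a mismatch that someone (or some bot) would have to fix with an edit.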
Re: [Wikidata-l] Countries ranked
This is cool, but do we have any statistics on the number of pages that have coordinate locations versus the number of pages that should have coordinate locations? This statistic is more a reflection of where the bots have been running and where they haven't been run yet. I'd be very interested in seeing the top 10 once we've imported all the coords we can. Sven On Oct 4, 2013 5:30 PM, Katie Filbert filbe...@gmail.com wrote: As many folks enjoy country rankings, I have generated a list of countries (Property:P17) ranked by number of coordinates (P625) in Wikidata. Note this data is from the September 22 database dump. There are a total of 737,271 coordinates in Wikidata. Top countries are 1) US 2) Russia 3) UK 4) China 5) France 6) Ukraine 7) Canada 8) Germany 9) Australia 10) Poland See the full list (which also has a few items entered for P17 that are not really countries): https://www.wikidata.org/wiki/User:Aude/countrystats Cheers, Katie (user:aude) -- Katie Filbert filbe...@gmail.com @filbertkm / @wikimediadc / @wikidata
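The kind of ranking Katie describes, countries (P17) ordered by how many items carry a coordinate (P625), can be sketched as a simple count. This is not her actual script: the input shape (one dict per item, mapping property IDs to lists of values) and the sample items are assumptions for illustration, not the format of the database dump.

```python
from collections import Counter


def rank_countries(items):
    """Rank P17 values by the number of items that also have a P625 claim.

    `items` is an iterable of dicts mapping property IDs to lists of values.
    """
    counts = Counter()
    for claims in items:
        if claims.get("P625"):  # item has at least one coordinate
            for country in claims.get("P17", []):
                counts[country] += 1
    return counts.most_common()


# Made-up sample items.
sample_items = [
    {"P17": ["US"], "P625": [(40.7, -74.0)]},
    {"P17": ["US"], "P625": [(34.0, -118.2)]},
    {"P17": ["RU"], "P625": [(55.7, 37.6)]},
    {"P17": ["RU"]},            # country but no coordinate: not counted
    {"P625": [(0.0, 0.0)]},     # coordinate but no country: not counted
]
ranking = rank_countries(sample_items)
```

As Sven notes, a count like this mostly reflects where import bots have already run; answering his question would additionally need, per country, the number of items that should have coordinates, which this sketch does not attempt to estimate.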
[Wikidata-l] Oversight nomination
This is a notice to inform the community that I have nominated myself for Oversight on Wikidata. The request can be found at [1]. Yours, Sven Manguard [1] https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Oversight#Sven_Manguard_2
Re: [Wikidata-l] Oversight nomination
;p is a winking smiley face. I don't think anyone thought he was being serious. S On Sep 16, 2013 10:00 AM, Leon Liesener leon.liese...@wikipedia.de wrote: No, that's just compliance with the global Oversight policy ( https://meta.wikimedia.org/wiki/Oversight_policy#Access - The candidates must request it within the local community and advertise this request to the local community properly (community discussion page, mailing list, etc)). Regards, Leon On 16.09.2013 at 12:33, Vito vituzzu.w...@gmail.com wrote: On 16/09/2013 09:35, strynwiki wrote: Hi all, I've nominated myself as an oversighter at https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Oversight/Stryn_2 Regards, Stryn That's canvassing ;p Vito
Re: [Wikidata-l] Automatic summaries?
This has the potential to work, but we need to be careful that the descriptions don't only partially represent their subjects. This is especially difficult with humans, as they are often known for several things, and occasionally (but in a statistically significant number, I would think), known for things that don't fit cleanly into a [nationality] [career], born [birth year] formula. As it exists now, the Wikidata item on the Ft. Hood shooter, Nidal Hasan [1], gives his military branch and rank, his location and place of birth, his gender, and a Commons category. From that, a bot summary would likely be American Army major, born 1970. There would be no indication of his source of notability, the shooting. What I would recommend is that we start with inanimate objects and get our bearings on bot-generated descriptions there (celestial objects, video games, buildings), then move onto the slightly more complicated to define non-human living things (species of plant, species of animal, species of creepy-crawly) and geographic locations (rivers, villages/towns/cities, mountain ranges), and then finally onto humans. Some things to think about: How do you create a description for a battleship that saw service with several different navies or a river that runs through several different countries? How do you create a description for a country that does not exist anymore or a location that has been destroyed? How do you create a description for a fictional person, item, place, etc., when Wikidata does not currently have an effective way of denoting that something is fictional? It might make sense to use Wikipedia categories to augment the Wikidata statements. I think that we should build a few formulas that are... difficult to screw up. Video games come to mind, because the formula [year of first publication] [genre] video game is really all you need, and other than that some games have multiple genres, there's no way to get the description wrong. 
Once the people with coding knowledge figure out what they want to do implementation-wise, I'll be happy to work with the formulas. [1] http://www.wikidata.org/wiki/Q1400551#sitelinks-wikipedia On Sat, Sep 7, 2013 at 7:12 AM, Luca Martinelli martinellil...@gmail.com wrote: 2013/9/7 Magnus Manske magnusman...@googlemail.com: I believe that, for items that have basic claims/statements, short descriptions can be generated automatically, for supported languages. If we have person, Belgian, painter, and birth/death year, a sentence like Belgian painter (1900-2000) can be constructed. Some awards (Nobel prize, Victoria cross, etc.) could be added. +1 on the idea. Not sure about the birth/death year, though. -- Luca Sannita Martinelli http://it.wikipedia.org/wiki/Utente:Sannita
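The formulas mentioned in this thread, "[year of first publication] [genre] video game" and "Belgian painter (1900-2000)", can be sketched as small template functions. This is only an illustration of the idea, not an existing bot or tool: the function names, the argument shapes, and the choice to return None when data is missing are all assumptions.

```python
def describe_video_game(year, genres):
    """Apply the formula '[year of first publication] [genre] video game'.

    Returns None when the year or genre is missing, so that no partial
    (and possibly misleading) description is produced.
    """
    if year is None or not genres:
        return None
    # Games with several genres get them joined with a slash.
    return f"{year} {'/'.join(genres)} video game"


def describe_person(nationality, occupation, born=None, died=None):
    """Apply '[nationality] [career] (birth-death)' or, for people with
    no death year, '[nationality] [career], born [birth year]'."""
    if not nationality or not occupation:
        return None
    base = f"{nationality} {occupation}"
    if born is not None and died is not None:
        return f"{base} ({born}-{died})"
    if born is not None:
        return f"{base}, born {born}"
    return base
```

For example, `describe_person("Belgian", "painter", 1900, 2000)` yields the sentence Magnus gives. The person formula has exactly the weakness Sven describes: nothing in it can express why someone is notable if that does not follow from nationality and occupation.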
Re: [Wikidata-l] Automatic summaries?
I agree with all four of those points. As for your question, we do not have that type of property yet, and although it might be slightly controversial, I would certainly support it. We would however need monolingual text as a property type before that could happen. Personally I see supporting web addresses as being much more critical on the list of properties for development, as that would dramatically open up our ability to source data. That being said, I really haven't been keeping up with the development schedule, so I have no idea what's in the pipeline and in what order. S On Sep 7, 2013 1:44 PM, Magnus Manske magnusman...@googlemail.com wrote: All valid points, Sven. I would just like to say that * this is not intended as a replacement or auto-fill for descriptions; it is to be shown if the manual description is blank (at least, that was my angle) * unusual items, like your example, will likely have a manual description; the run-of-the-mill military person will not * for many uses, even an imperfect or (through omission) somewhat misleading description is better than none * as in your example, a misrepresentation is first and foremost due to the incompleteness of Wikidata and the properties it offers The last one reminds me: is there a "reason for notability" property? In your example item, the Ft. Hood shootings could be added that way, and then also show up in the description (notable for Ft. Hood shooting). On Sat, Sep 7, 2013 at 6:32 PM, Sven Manguard svenmangu...@gmail.com wrote: This has the potential to work, but we need to be careful that the descriptions don't only partially represent their subjects. This is especially difficult with humans, as they are often known for several things, and occasionally (but in a statistically significant number, I would think), known for things that don't fit cleanly into a [nationality] [career], born [birth year] formula. As it exists now, the Wikidata item on the Ft.
Hood shooter, Nidal Hasan [1], gives his military branch and rank, his location and place of birth, his gender, and a Commons category. From that, a bot summary would likely be American Army major, born 1970. There would be no indication of his source of notability, the shooting. What I would recommend is that we start with inanimate objects and get our bearings on bot-generated descriptions there (celestial objects, video games, buildings), then move onto the slightly more complicated to define non-human living things (species of plant, species of animal, species of creepy-crawly) and geographic locations (rivers, villages/towns/cities, mountain ranges), and then finally onto humans. Some things to think about: How do you create a description for a battleship that saw service with several different navies or a river that runs through several different countries? How do you create a description for a country that does not exist anymore or a location that has been destroyed? How do you create a description for a fictional person, item, place, etc., when Wikidata does not currently have an effective way of denoting that something is fictional? It might make sense to use Wikipedia categories to augment the Wikidata statements. I think that we should build a few formulas that are... difficult to screw up. Video games come to mind, because the formula [year of first publication] [genre] video game is really all you need, and other than that some games have multiple genres, there's no way to get the description wrong. Once the people with coding knowledge figure out what they want to do implementation wise, I'll be happy to work with the formulas. 
[1] http://www.wikidata.org/wiki/Q1400551#sitelinks-wikipedia On Sat, Sep 7, 2013 at 7:12 AM, Luca Martinelli martinellil...@gmail.com wrote: 2013/9/7 Magnus Manske magnusman...@googlemail.com: I believe that, for items that have basic claims/statements, short descriptions can be generated automatically, for supported languages. If we have person, Belgian, painter, and birth/death year, a sentence like Belgian painter (1900-2000) can be constructed. Some awards (Nobel prize, Victoria cross, etc.) could be added. +1 on the idea. Not sure about the birth/death year, though. -- Luca Sannita Martinelli http://it.wikipedia.org/wiki/Utente:Sannita
Re: [Wikidata-l] Oversight nomination
Are there any publicly available statistics about the number of Oversight requests [successful and not] that have happened on Wikidata thus far? S On Sep 1, 2013 11:53 PM, Adrian Raddatz ajradd...@gmail.com wrote: Hi all, just a heads-up that I've nominated myself for oversight rights at https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Oversight#Ajraddatz . Regards, -- Adrian Raddatz
Re: [Wikidata-l] Weekly Summary #70
It would appear that there is more negative feedback than positive on the logo change... On Aug 9, 2013 10:18 AM, adam.shorl...@wikimedia.de adam.shorl...@wikimedia.de wrote: Wikimania Continues! (I hope you like our current Hong Kong logo!) Make sure you come and say hi to us if you are attending! Check out this week's summary! http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_08_09 Have a great weekend! Adam
Re: [Wikidata-l] A solution with finality is needed for P107 - main type (GND)
We really need to keep everything in one forum. Can you two please copy your comments to https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Primary_sorting_property and continue the discussion there? I worry that the person closing the discussion might not be on the mailing list and might not see your points. S On Jul 1, 2013 12:16 PM, Gerard Meijssen gerard.meijs...@gmail.com wrote: Hoi, The reason why it is NOT good enough and what you fail to understand is that this is NOT an attribute that we should morph into something else. Its name makes it clear: main type (GND); this implies that the definition and its values are external to Wikidata; they are the definition as per the GND. For me it means that when a specific value of this main type makes sense... i.e. it is about a person, I use it. I do not use it for any other value. The added value for using it is some of the tools that INSIST on its use. From a theoretical point of view, instance of serves us equally well without relying on external values and systems. The reason why I proposed the removal of P107 is that people give it a value that they do not support by providing arguments and guidance on how to ensure that data entered is valid. So far I have noticed that Wikidata is seen as secondary to whatever Wikipedia. In my opinion Wikidata is a project in its own right and many artefacts of Wikipedia just do not belong in Wikidata. P107 is one such artefact. Thanks, GerardM On 1 July 2013 17:03, Paul A. Houle p...@ontology2.com wrote: I would say that GND is a “good enough” answer. Most named entities are persons, organizations, events, creative works and places and these are all mutually exclusive. There ought to be a system interlock to prevent confusion between them. “Organism Classification” or whatever you call it should also be on the list, because of prevalence.
One thing I’d add to that is fictional character because (1) there are a lot of them, (2) they can be ontologized more-or-less in parallel with people, and (3) you’ll get cleaner people if you keep fictional characters out. (On the other hand, there are fictional events, places, etc. too, though these are not so well documented.) Is it easy to add a new GND type? I think you’re calling the “wastebin” category “term”, which is reasonable (I’d call it a “concept”.) Going much further than this you’ll run into Borges encyclopedia style risks, but aren’t the categories named in GND upwards of 80% of the topics? Can you run a report on this? *From:* Sven Manguard svenmangu...@gmail.com *Sent:* Sunday, June 30, 2013 2:19 PM *To:* Discussion list for the Wikidata project. wikidata-l@lists.wikimedia.org *Subject:* [Wikidata-l] A solution with finality is needed for P107 - main type (GND) I have just closed a second deletion discussion for Property:P107 - main type (GND). As with the first discussion, it is clear that there is a broad sense that main type (GND) is not an ideal solution; however, as it stands now, a large enough portion of the community does not want to get rid of it unless/until a replacement system is found or developed. For this reason, I closed the discussion as no consensus and opened up a request for comment on the matter of finding a replacement for P107. I have gone to the unusual step of emailing the mailing list for three reasons. First, P107 is the most used property on the project, and it or its replacement will (most likely) remain the most used property on the project forever. Second, the GND has evolved into a component of how Wikidata is structured; our lists of properties are sorted by GND type, and that has a real impact on what properties are used on what pages. The third reason is that, as a general statement, participation levels in requests for comment have been downright sad.
Three or four people participating in an RfC is, for a project of this size, unhealthy, and most RfCs don't get more than that many people participating in them. For something this important, we need at least a dozen people, preferably at least twice that. /rant Anyways, the RfC is at https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Primary_sorting_property and I hope that, with broad participation, we can finally resolve this issue. Yours, Sven
[Wikidata-l] A solution with finality is needed for P107 - main type (GND)
I have just closed a second deletion discussion for Property:P107 - main type (GND). As with the first discussion, it is clear that there is a broad sense that main type (GND) is not an ideal solution; however, as it stands now, a large enough portion of the community does not want to get rid of it unless/until a replacement system is found or developed. For this reason, I closed the discussion as no consensus and opened up a request for comment on the matter of finding a replacement for P107. I have gone to the unusual step of emailing the mailing list for three reasons. First, P107 is the most used property on the project, and it or its replacement will (most likely) remain the most used property on the project forever. Second, the GND has evolved into a component of how Wikidata is structured; our lists of properties are sorted by GND type, and that has a real impact on what properties are used on what pages. The third reason is that, as a general statement, participation levels in requests for comment have been downright sad. Three or four people participating in an RfC is, for a project of this size, unhealthy, and most RfCs don't get more than that many people participating in them. For something this important, we need at least a dozen people, preferably at least twice that. /rant Anyways, the RfC is at https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Primary_sorting_property and I hope that, with broad participation, we can finally resolve this issue. Yours, Sven
Re: [Wikidata-l] [Wikidata-tech] WikiVOYAGE deployment plan
Admittedly I have been crazy busy with things unrelated to Wikimedia projects, so I haven't followed this discussion, but I'd like to ask for a clarification on Wikivoyage interwiki links. If they're going to be on the same item page as the Wikipedia interwiki links, is there going to be a dedicated section for WV links separate (and clearly labeled) from the existing Wikipedia interwiki links, or are both sets of links going to be in the same list? I.e., are we going to have
Wikipedia
en: New York City
es: New York City
it: New York City
Wikivoyage
en: New York City
es: New York City
it: New York City
or are we going to have
wp:en: New York City
wv:en: New York City
wp:es: New York City
wv:es: New York City
wp:it: New York City
wv:it: New York City
I really, really hope it's two separate lists. Otherwise things are going to become unnecessarily complicated/difficult. Cheers, S On Fri, Jun 28, 2013 at 5:45 PM, legoktm legoktm.wikipe...@gmail.com wrote: Hi Denny, I'm really excited to see a sister project getting included, however I'm concerned that the community needs a bit more time and notice (I didn't see anything about this on WD:PC). When importing interwiki links for Wikipedias, we had a few months before they were used on client sites. The proposed schedule gives our bots ~3 days to import a majority of links, which I don't think is enough time. A whole week would be much better in my opinion. I've also started a page on-wiki to help coordinate the migration: https://www.wikidata.org/wiki/Wikidata:Wikivoyage_migration -- Legoktm On Fri, Jun 28, 2013 at 2:45 AM, Denny Vrandečić denny.vrande...@wikimedia.de wrote: Sorry, I had a typo in the title of my last Email. It should be Wikivoyage obviously, not Wikileaks or Wikisomethingelse. Cheers, Denny 2013/6/28 Denny Vrandečić denny.vrande...@wikimedia.de Hey all, as discussed yesterday in the call, here is our current plan for deploying interwikilinks to Wikivoyage.
If there are no complaints from your side by Tuesday, we will share this plan with the Wikivoyage communities and the Wikidata community on Wednesday.

Wed, July 17th: Branching Wikibase 1.22-wmf12
Thu, July 18th: Deploying wmf12 to the test systems and setting up configurations for Wikivoyage on test. This means the Test Wikidata can accept links to Wikivoyage sites.
Mon, July 22nd: Deploying wmf12 to wikidata.org and setting the configuration to accept Wikivoyage links as well.
Thu, July 25th: Deploying wmf12 client to all Wikivoyage.org language editions. From this moment on, Wikivoyage can access interwiki links from Wikidata, and does not need to have them locally anymore.

Notes:
* Wikivoyage will only get access to the interwikis for now, not to other data in Wikidata. This is planned for later, but we just want to go step by step (i.e. only phase 1)
* Wikipedia will not automatically and suddenly display links to Wikivoyage. The behavior on Wikipedia actually remains completely unchanged by this deployment.
* Wikivoyage will not automatically get links to Wikipedia and display them (currently called Related sites). This is also left for later.
* Further sister projects are planned for later, depending how smoothly this deployment goes.
* There is no need for an additional item for e.g. New York for Wikivoyage, but rather the links to Wikivoyage can be entered in the same item that also holds the links to Wikipedia.

Cheers, Denny

P.S.: Ken, you might consider joining the Wikidata tech list. This is where we send the agenda for the Thursday calls around.

-- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
___ Wikidata-tech mailing list wikidata-t...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Rate of new changes in WikiData
There are two answers to that question: are you looking for the edit rate for edits done manually only, or are you looking for the edit rate for both edits done manually and edits done by scripts and automated programs? S

On Jun 23, 2013 9:48 PM, Hady elsahar hadyelsa...@gmail.com wrote:

Hello All, I was wondering if someone knows roughly the rate of Wikidata changes per minute or even per day. I tried to watch the feed for a while but it varies a lot. What would be the maximum and minimum rate? Should we expect it also to increase as a result of more contributions? I'm talking about updates posted in the RSS feed here: http://www.wikidata.org/w/index.php?title=Special:RecentChanges&feed=atom

Thanks, Regards - Hady El-Sahar Research Assistant Center of Informatics Sciences | Nile University http://nileuniversity.edu.eg/ email : hadyelsa...@gmail.com Phone : +2-01220887311 http://hadyelsahar.me/ http://www.linkedin.com/in/hadyelsahar
Re: [Wikidata-l] Some Wiktionary data in Wikidata
Did you mean to say "We do *not* need another Wikidata"? Otherwise I am confused by your comment.

On Jun 21, 2013 12:08 PM, Jan Dudík jan.du...@gmail.com wrote:

We do need another wikidata, only separate namespace for items (words) and some separate properties JAnD

2013/6/21 Gerard Meijssen gerard.meijs...@gmail.com:

We do not need another Wikidata for Wiktionary Thanks, GerarM
Re: [Wikidata-l] Geocoordinates are live
I've been doing some manual importing, and have found that in the vast majority of cases, when different languages' Wikipedias have different coordinates for a location, the coordinates that are most accurate are the ones in the language that is spoken where the location is. For example, some of the first locations to have their coordinates brought over to Wikidata were some train stations in the Netherlands. The Dutch and English Wikipedias' coordinates differed, and the Dutch coordinates were right every time. One English Wikipedia coordinate was even for the next station down the line, over an arcminute away. Therefore I think that we should take coordinates for locations in Germany from dewiki, locations in Spain from eswiki, locations in the Netherlands from nlwiki, etc. Sven

On Wed, Jun 12, 2013 at 5:36 AM, Cristian Consonni kikkocrist...@gmail.com wrote:

2013/6/12 Kolossos tim.al...@s2002.tu-chemnitz.de:

Hey, the question is now how we can merge coordinates from all languages to Wikidata. I would propose to use the coordinate from the longest article to have a good chance of using the most accurate one. That's the approach I use in Wikipedia-World[1]. After an update we could also use this database for an import. Worst case would be that everyone uses a bot and we would have a great bot-war.

I think it should be possible to just import them as data with different sources. If a coordinate pair is the same across multiple Wikipedias then you have more sources, see for example the property occupation:politician here[1] Ciao, Cristian [1] http://www.wikidata.org/wiki/Q76
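Sven's rule of thumb above (prefer the Wikipedia whose language is spoken where the location is) could be sketched roughly as follows. This is only an illustration, not any bot's actual logic: the `LOCAL_WIKI` table, the `pick_coordinate` function, the `enwiki` fallback and the sample coordinates are all assumptions made up for the example.

```python
# Hypothetical table: country code -> Wikipedia in the locally spoken language.
# Only the three pairs named in the thread are listed.
LOCAL_WIKI = {"DE": "dewiki", "ES": "eswiki", "NL": "nlwiki"}


def pick_coordinate(country, candidates, fallback_order=("enwiki",)):
    """Choose one (lat, lon) pair from several Wikipedias.

    candidates maps a wiki name to its (lat, lon) pair.
    Returns (wiki, (lat, lon)), or None if there are no candidates.
    """
    preferred = LOCAL_WIKI.get(country)
    if preferred in candidates:
        return preferred, candidates[preferred]
    # Assumed fallback: try a fixed list of wikis, then any wiki at all.
    for wiki in fallback_order:
        if wiki in candidates:
            return wiki, candidates[wiki]
    if candidates:
        wiki = sorted(candidates)[0]  # deterministic last resort
        return wiki, candidates[wiki]
    return None


# Invented coordinates for a Dutch station that differ between wikis.
station = {"enwiki": (52.0, 5.02), "nlwiki": (52.0, 5.0)}
print(pick_coordinate("NL", station))
```

Cristian's alternative, importing every value with its own source, would keep all candidates instead of choosing one; the two approaches can coexist if the chosen value is simply marked as preferred.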
Re: [Wikidata-l] gene templates - wikidata, when?
Sorry to be the contrarian, but I'm not sure we should be talking about pulling data from Wikidata into Wikipedia until the devs announce that they are close to deploying it. It makes no sense to build an infrastructure now if the assumptions about functionality and API that you're basing the infrastructure on aren't officially locked in yet. Sven

On Fri, Mar 1, 2013 at 5:12 AM, Bináris wikipo...@gmail.com wrote:

2013/3/1 Magnus Manske magnusman...@googlemail.com

There will soon be a mechanism where Wikipedia can display data from Wikidata directly, as it currently does with the language links. No need to bot-edit Wikipedia.

Yes, yes, that's what I speak about. :-) As phase 2 is working now in Wikidata repo, this is the good time to speak about this. -- Bináris
[Wikidata-l] Complaint about the partial Phase II deployment
Hello there. I have been an active and vocal supporter of Wikidata since almost the day it went live, and after giving Phase II a legitimate chance, I have to say that in my opinion the decision to deploy Phase II with only a small number of the expected features has been a massive mistake. Yes, I understand that the project was losing momentum and that several people commented that they felt that there was nothing to do on the project before Phase II hit, however the partial release has caused considerable confusion, and worse, has caused people to make decisions *based on what is available now* as opposed to based on *what would be the best choice in the long term*. It would have been one thing if Phase II were released with 80% of its projected features and an official list from the developers of the things that were left out. Instead we got what I have to guess is around 10% of the projected features, and if there's an official list of things that are missing or a timeline of when they're going to appear, I haven't seen it. I also have to question the timing of the release, bringing Phase II live just before Wikidata hits English Wikipedia. Was this done on purpose to try and bring over some of the Wikipedia editors? If not, the timing is awful. Nothing of this scale and level of technical sophistication ever gets deployed to English Wikipedia smoothly, and I think that the near future is going to show that the English Wikipedia deployment is going to be competing with the Phase II rollout for the time of the coders, who will need to fix bugs in both areas. I'm sorry for being so pessimistic, but I really do feel let down by this release. It's like being told that you're going to watch a feature film and then only getting the official trailer. The trailer is good, but it's not what people were expecting and it's not particularly valuable on its own. I look forward to any response that the Wikidata staff or the community might have to this. 
Sven
Re: [Wikidata-l] [Wikimedia Announcements] Wikimedia sites to move to primary data center in Ashburn, Virginia. Disruption expected.
Thank you for this. I realize the absurdity of replying to all of these lists, but I feel that my question is important enough to warrant it. For most of the history of Wikipedia and the other projects, we have been bound by United States and Florida state laws, because of the location of the servers. With our move to a different state, what, if any, legal issues do we need to be aware of? To those of you that are not American, the United States puts a great deal of lawmaking power in the hands of the states, enough so that there is often significant variance between states' legal codes, especially on civil issues (which include privacy issues, copyright, and most legal actions). Thank you again, Sven

On Sat, Jan 19, 2013 at 1:49 PM, Guillaume Paumier guillom@gmail.com wrote:

[Apologies for cross-posting; this concerns all Wikimedia projects] Posted today on the Wikimedia Tech Blog: https://blog.wikimedia.org/2013/01/19/wikimedia-sites-move-to-primary-data-center-in-ashburn-virginia/

Wikimedia sites to move to primary data center in Ashburn, Virginia

Next week, the Wikimedia Foundation will transition its main technical operations to a new data center in Ashburn, Virginia, USA. This is intended to improve the technical performance and reliability of all Wikimedia sites, including Wikipedia. Engineering teams have been preparing for the migration to minimize inconvenience to our users, but major service disruption is still expected during the transition. Our sites will be in read-only mode for some time, and may be intermittently inaccessible. Users are advised to be patient during those interruptions, and share information (https://meta.wikimedia.org/wiki/Wikimedia_maintenance_notice) in case of continued outage or loss of functionality.
The current target windows for the migration are January 22nd, 23rd and 24th, 2013, from 17:00 to 01:00 UTC (see other timezones on timeanddate.com: http://www.timeanddate.com/worldclock/fixedtime.html?msg=Wikimedia+data+center+migration&iso=20130122T17&ah=8). Wikimedia sites have been hosted in our main data center in Tampa, Florida, since 2004; before that, the couple of servers powering Wikipedia were in San Diego, California. Ashburn is the third and newest primary data center to host Wikimedia sites. A major reason for choosing Tampa, Florida as the location of the primary data center in 2004 was its proximity to founder Jimmy Wales’ home, at a time when he was much more involved in the technical operations of the site. In 2009, the Wikimedia Foundation’s Technical Operations team started to look (https://blog.wikimedia.org/2009/04/07/wmf-needs-additional-datacenter-space/) for other locations with better network connectivity and more clement weather. Located in the Washington, D.C. metropolitan area, Ashburn offers faster and more reliable connectivity than Tampa, and usually fewer hurricanes. The Operations team started to plan and prepare for the Virginia data center in Summer 2010. The actual build-out and racking of servers at the colocation facility started in February 2011, and was followed by a long period of hardware, system and software configuration. Traffic started to be served to users from the Ashburn data center in November 2011, in the form of CSS and JavaScript assets (served from “bits.wikimedia.org”). We reached a major milestone in February 2012, when caching servers were set up to handle read-only requests for Wikipedia and Wikimedia content, which represent most of the traffic to Wikipedia and its sister sites. In April 2012, the Ashburn data center also started to serve media files (from “upload.wikimedia.org”).
Cacheable requests represent about 90 percent of our traffic, leaving 10 percent that requires interaction with our web (Apache) and database (MySQL) servers, which are still being hosted in Tampa. Until now, every edit made to a Wikipedia page has been handled by the servers in Tampa. This dependency on our Tampa data center was responsible for the site outage in August 2012 (https://blog.wikimedia.org/2012/08/06/wikimedia-site-outage-6-august-2012/), when a fiber cut severed the connection between our two locations. Starting next week, the new servers in Ashburn will take on that role as well, and all our sites will be able to function fully without relying on the servers in Florida. The legacy data center in Tampa will continue to be maintained, and will serve as a secondary “hot failover” data center: servers will be in standby mode to take over, should the primary site experience an outage. Server configuration and data will be synchronized between the two locations to ensure a transition as smooth as possible in case of technical difficulties in Ashburn. Besides just installing newer hardware, setting up the data center in Ashburn has also been an opportunity for architecture overhauls, like incremental improvements of the text storage
Re: [Wikidata-l] Data values
I really, really hope that this isn't the mindset of the development team as a whole. If so, my confidence in the viability of Wikidata would take a major hit. Yes, collecting the information that goes into infoboxes is going to be important, and yes, centralizing that information so that it can be used by all projects is a worthwhile initial goal. It's not the only thing this project is ever going to be used for though. To say that things that aren't currently in infoboxes aren't worth supporting is, quite frankly, a really awful artificial limit on the usefulness of the project. First off, 'what is and is not a field in a Wikipedia infobox' is a metric that changes over time, and often in large or unpredictable ways. Entire infoboxes have been created and deprecated, to say nothing of individual fields in those infoboxes. Someone might come along tomorrow and say "Yes, in fact, we should include the dimensions of historical buildings in their historical units" or "Yes, in fact, we should list both Old Style and New Style dates [1] where applicable." We're then going to be in the position where Wikidata doesn't have the information that people want to include. Had we allowed those things in from the beginning, and properly supported them, we might have had the information ready when it was asked for. If we only add new fields when people request them, those fields won't be ready until long after they're needed. But more importantly, Wikidata is eventually going to be used for things other than Wikipedia infoboxes. Those uses are going to happen both on Wikipedia and off, and some of those uses are impossible to envision now. We should focus on collecting as much useful data as we can; not just what's in an infobox today, but what might be in an infobox, or the body text of an article, tomorrow. Please don't sell out Wikidata's future utility for today's convenience.
Sven [1] http://en.wikipedia.org/wiki/Old_Style_and_New_Style_dates

On Thu, Dec 20, 2012 at 5:26 AM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:

First off: our target use case is Wikipedia infoboxes. Do you have examples and numbers about the usage of such ancient units in infoboxes on Wikipedia? If they are not in mainstream use there, I don't see why Wikidata would have to support them.
Re: [Wikidata-l] Data values
I don't think we can sensibly support historical units with unknown conversions, because they cannot be compared directly to SI units. So, they couldn't be used to answer queries, can't be converted for display, etc - they aren't units in any sense the software can understand. This is a solvable problem, but would add a tremendous amount of complexity.

I get the feeling that I might be the only person on this thread that doesn't have a maths/sciences/computers background. I'm going to be frank here: We need to snap out of the mindset that all of the data we're collecting is going to be easily expressible using modern scientific units and methodologies. If we try and cram everything into a small number of common units, without giving the users some method of expressing non-standard/uncommon/non-scientific values, we're going to have a massive database that is going to at best be cumbersome and at worst be useless for a great deal of information. Traditional Chinese units of measurement [1] have changed their actual value over time. A li in one century is not as long as it is in another century, and while there is a li to SI conversion, it's artificial, and when we try to use the modern li to measure something, we get a different value for that thing than the historically documented li value states it should be. There is a balance. The more flexible the parameters, the easier it is to put data in, but the harder it is for computers to make useful connections with it. I'm not sure how to handle this, but I am sure that we can't just keep pretending that all of the data we're going to collect falls nicely into the metric system. Reality just doesn't work that way, and for Wikidata to be useful, we can't discount data that doesn't fit in the mold of modern units. Sven [1] http://en.wikipedia.org/wiki/Chinese_units_of_measurement
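One possible middle ground between the two positions above is to always store the value in the unit the source used, and attach an SI value only when a trustworthy conversion exists. The sketch below is a hypothetical illustration, not the Wikidata data model: `Quantity`, `in_metres` and the unit labels are invented names, and the only conversion assumed is the modern standardized li of 500 m.

```python
from dataclasses import dataclass
from typing import Optional

# Conversions we are willing to trust. The modern (PRC) li is defined as
# 500 m; historical li values are deliberately absent rather than guessed.
KNOWN_FACTORS_TO_METRES = {"metre": 1.0, "li (modern)": 500.0}


@dataclass(frozen=True)
class Quantity:
    """A measurement kept exactly as the source states it."""
    amount: float
    unit: str  # free-form unit label, e.g. a historical unit

    def in_metres(self) -> Optional[float]:
        """SI value if a conversion is known, otherwise None."""
        factor = KNOWN_FACTORS_TO_METRES.get(self.unit)
        return None if factor is None else self.amount * factor


modern = Quantity(3, "li (modern)")
historical = Quantity(3, "li (historical, period unspecified)")
print(modern.in_metres())      # convertible, so it can take part in queries
print(historical.in_metres())  # None: stored and displayable, not comparable
```

Under this design the historical value is never lost or silently misconverted; it is simply excluded from comparisons until someone supplies a conversion that is valid for the period in question.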
Re: [Wikidata-l] Data values
My philosophy is this: We should do whatever works best for Wikidata and Wikidata's needs. If people want to reuse our content, and the choices we've made make existing tools unworkable, they can build new tools themselves. We should not be clinging to what's been done already if it gets in the way of what will make Wikidata better. Everything that we make and do is open, including the software we're going to operate the database on. Every WMF project has done things differently from the standards of the time, and people have developed tools to use our content before. Wikidata will be no different in that regard. Sven On Wed, Dec 19, 2012 at 12:27 PM, Martynas Jusevičius marty...@graphity.org wrote: Denny, you're sidestepping the main issue here -- every sensible architecture should build on as much previous standards as possible, and build own custom solution only if a *very* compelling reason is found to do so instead of finding a compromise between the requirements and the standard. Wikidata seems to be constantly doing the opposite -- building a custom solution with whatever reason, or even without it. This drives the compatibility and reuse towards zero. This thread originally discussed datatypes for values such as numbers, dates and their intervals -- semantics for all of those are defined in XML Schema Datatypes: http://www.w3.org/TR/xmlschema-2/ All the XML and RDF tools are compatible with XSD, however I don't think there is even a single mention of it in this thread? What makes Wikidata so special that its datatypes cannot build on XSD? And this is only one of the issues, I've pointed out others earlier. Martynas graphity.org On Wed, Dec 19, 2012 at 5:58 PM, Denny Vrandečić denny.vrande...@wikimedia.de wrote: Martynas, could you please let me know where RDF or any of the W3C standards covers topics like units, uncertainty, and their conversion. I would be very much interested in that. 
Cheers, Denny

2012/12/19 Martynas Jusevičius marty...@graphity.org

Hey wikidatians, occasionally checking threads in this list like the current one, I get a mixed feeling: on one hand, it is sad to see the efforts and resources wasted as Wikidata tries to reinvent RDF, and now also triplestore design as well as XSD datatypes. What's next, WikiQL instead of SPARQL? On the other hand, it feels reassuring as I was right to predict this: http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html Best, Martynas graphity.org

On Wed, Dec 19, 2012 at 4:11 PM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:

On 19.12.2012 14:34, Friedrich Röhrs wrote:

Hi, Sorry for my ignorance, if this is common knowledge: What is the use case for sorting millions of different measures from different objects?

Finding all cities with more than 10 inhabitants requires the database to look through all values for the property population (or even all properties with countable values, depending on implementation and query planning), compare each value with 10 and return those with a greater value. To speed this up, an index sorted by this value would be needed.

For cars there could be entries by the manufacturer, by some car-testing magazine, etc. I don't see how this could be adequately represented/sorted by a database only query.

If this cannot be done adequately on the database level, then it cannot be done efficiently, which means we will not allow it. So our task is to come up with an architecture that does allow this. (One way to allow scripted queries like this to run efficiently is to do this in a massively parallel way, using a map/reduce framework. But that's also not trivial, and would require a whole new server infrastructure).

If however this is necessary, I still don't understand why it must affect the datavalue structure.
If an index is necessary it could be done over a serialized representation of the value.

Serialized can mean a lot of things, but an index on some data blob is only useful for exact matches, it cannot be used for greater/lesser queries. We need to map our values to scalar data types the database can understand directly, and use for indexing.

This needs to be done anyway, since the values are saved at a specific unit (which is just a wikidata item). To compare them on a database level they must all be saved at the same unit, or some sort of procedure must be used to compare them (or am I missing something again?).

If they measure the same dimension, they should be saved using the same unit (probably the SI base unit for that dimension). Saving values using different units would make it impossible to run efficient queries against these values,
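Daniel's argument above, that greater/lesser queries need values normalized to one unit and kept in a sorted index, can be illustrated with a toy in-memory version. This is a sketch under stated assumptions, not how any database or Wikibase implements it: `HeightIndex`, `TO_METRES` and the sample records are invented for the example.

```python
import bisect

# Factors to the SI base unit of length (1 ft = 0.3048 m by definition).
TO_METRES = {"m": 1.0, "km": 1000.0, "ft": 0.3048}


class HeightIndex:
    """Sorted index over lengths, normalized to metres when built."""

    def __init__(self, records):
        # records: iterable of (item, value, unit) in arbitrary units
        pairs = sorted((value * TO_METRES[unit], item)
                       for item, value, unit in records)
        self._keys = [key for key, _ in pairs]
        self._items = [item for _, item in pairs]

    def greater_than(self, threshold_m):
        """Items strictly above the threshold, found by binary search."""
        pos = bisect.bisect_right(self._keys, threshold_m)
        return self._items[pos:]


index = HeightIndex([("A", 2.0, "km"), ("B", 300.0, "ft"), ("C", 150.0, "m")])
print(index.greater_than(100.0))  # B is 91.44 m, so only C and A qualify
```

Without the normalization step the raw numbers 2.0, 300.0 and 150.0 would sort in an order that says nothing about the actual lengths, which is the problem with indexing values stored in mixed units.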
Re: [Wikidata-l] Data values
I think that Tom Morris tragically misunderstood my point, although that was likely helped by the fact that, as I've insinuated already, the many standards and acronyms being thrown about are largely lost on me. My point is not "We can just throw everything out because we're big and awesome and have name brand power." My point was "We're going to reach a point where some of the existing standards and tools just don't work because when they were built things like Wikidata weren't envisioned. We need to have the mindset that developing new pieces that work for us is better than trying to force a square peg into a round hole just because something is already widely used." If what exists doesn't work, we're going to do more harm than good if we have to start cutting corners or cutting features to try and get it to work. We have an infrastructure that would allow third parties to come along later and build tools that allow there to be a bridge between whatever we create and whatever exists already. Sven

On Wed, Dec 19, 2012 at 2:40 PM, Tom Morris tfmor...@gmail.com wrote:

Wow, what a long thread. I was just about to chime in to agree with Sven's point about units when he interjected his comment about blithely ignoring history, so I feel compelled to comment on that first. It's fine to ignore standards *for good reasons*, but doing it out of ignorance or gratuitously is just silly. Thinking that WMF is so special it can create a better solution without even knowing what others have done before is the height of arrogance. Modeling time and units can basically be made arbitrarily complex, so the trick is in achieving the right balance of complexity vs utility. Time is complex enough that I think it deserves its own thread. The first thing I'd do is establish some definitions to cover some basics like durations/intervals, uncertain dates, unknown dates, imprecise dates, etc so that everyone is using the same terminology and concepts.
Much of the time discussion is difficult for me to follow because I have to guess at what people mean. In addition to the ability to handle circa/about dates already mentioned, it's also useful to be able to represent before/after dates, e.g. he died before 1 Dec 1792 when his will was probated. Long term I suspect you'll need support for additional calendars rather than converting everything to a common calendar, but only supporting Gregorian is a good way to limit complexity to start with. Geologic times may (probably?) need to be modeled differently. Although I disagree strongly with Sven's sentiments about the appropriateness of reinventing things, I believe he's right about the need to support more units than just SI units and to know what units were used in the original measurement. It's not just a matter of aesthetics but of being able to preserve the provenance. Perhaps this gets saved for a future iteration, but you may find that you need both display and computable versions of things stored separately. Speaking of computable versions, don't underestimate the issues with using floating point numbers. There are numbers that they just can't represent and their range is not infinite. Historians and genealogists have many interminable discussions about date/time representation which can be found in various list archives, but one recent spec worth reviewing is Extended Date/Time Format (EDTF) http://www.loc.gov/standards/datetime/pre-submission.html Another thing worth looking at is the Freebase schema since it not only represents a bunch of this stuff already, but it's got real world data stored in the schema and user interface implementations for input and rendering (although many of the latter could be improved).
In particular, some of the following might be of interest:

http://www.freebase.com/view/measurement_unit / http://www.freebase.com/schema/measurement_unit
http://www.freebase.com/schema/time
http://www.freebase.com/schema/astronomy/celestial_object_age
http://www.freebase.com/schema/time/geologic_time_period
http://www.freebase.com/schema/time/geologic_time_period_uncertainty

If you rummage around, you can probably find lots of interesting examples and decide for yourself whether or not that's a good way to model things. I'm reasonably familiar with the schema and happy to answer questions. There are probably lots of other example vocabularies that one could review such as the Pleiades project's: http://pleiades.stoa.org/vocabularies You're not going to get it right the first time, so I would just start with a small core that you're reasonably confident in and iterate from there. Tom

On Wed, Dec 19, 2012 at 12:47 PM, Sven Manguard svenmangu...@gmail.com wrote:

My philosophy is this: We should do whatever works best for Wikidata and Wikidata's needs. If people want to reuse our content, and the choices we've made make existing tools unworkable, they can build new tools themselves. We
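Tom's caution in the message above about floating point numbers (some values cannot be represented, and the range is finite) is easy to demonstrate. The snippet below is only an illustration using the Python standard library; the variable names are made up, and `Decimal` is shown as one possible alternative, not as what Wikidata uses.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly, so the sum drifts.
float_sum_matches = (0.1 + 0.2 == 0.3)

# Decimal arithmetic on the same literals is exact.
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Above 2**53, neighbouring integers collapse onto the same float.
big_integers_collide = (float(2**53) == float(2**53 + 1))

# The range is finite: overflowing it yields infinity, not a bigger number.
overflowed = 1e308 * 10

print(float_sum_matches, decimal_sum_matches, big_integers_collide, overflowed)
# prints: False True True inf
```

This is one reason to keep the value as entered (for display and provenance) separately from whatever numeric form is used for computation.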
Re: [Wikidata-l] Data values
Thanks for this Denny. Time: Historians **need** to be able to have date ranges of some sort. They also need to express confidence in non-numerical terms. Take for example, the invention of gunpowder in China. Not only do several major historians have different ranges entirely (which would, of course, be treated as different line items anyways), but the premier authorities are all giving date ranges. Some will say things like between XXX and YYY, which requires date ranges, while others say around ZZZ, which requires us to have some sort of way to represent about. As to the first issue, we could try pairing entries, like we already are likely to be doing for things like reign start date/reign end date but that would be clumsy and very easily broken. As for the latter, I'm really not sure what the proper solution is. I am sure though, that if a historian says about 850 and we put in 850, we're going to be **wrong** and that's going to be **bad data**. Additionally, unless the historian gives a range himself, we can't say 850 +/- 25 or some such thing. That would also be wrong. Geo: I can definitely see how altitude would be good for things like a rest lodge halfway up a mountain or a shipwreck below sea level. I'm not sure if any of the map makers can handle altitude right now; as far as I know things like Open Street Map and Google Maps are two dimensional maps with 'fake' three dimensional protrusions. That being said, I think that we should build the feature in and then trust that the map making companies will eventually figure out what to do with it. Google is probably crazy enough to mount cameras and GPS software on sherpas and send them up mountains if they think that maps accounting for altitude are something that could, erm, sell. Units: Not sure I understand the post, but I might. 
I advocate that we should have the unit translations stored on some page in the (already automatically full protected) MediaWiki namespace, and that conversions should be handled on this project before the data is sent out to client projects. The reason for this is that it makes adoption by (non WMF) end users much, much easier. It's not like the conversions are a subject of debate.

On Tue, Dec 18, 2012 at 9:29 AM, Denny Vrandečić denny.vrande...@wikimedia.de wrote:

Thanks for the input so far. Here are a few explicit questions that I have:

* Time: right now the data model assumes that the precision is given on the level decade / year / month etc., which means you can enter a date of birth like 1435 or May 1918. But is this sufficient? We cannot enter a value like 2nd-5th century AD (we could enter 1st millennium AD, which would be a loss of precision).
* Geo: the model assumes latitude, longitude and altitude, and defines altitude as over mean sea level (simplified). Is altitude at all useful? Should it be removed from Geolocation and be moved instead to a property called height or altitude which is dealt with outside of the geolocation?
* Units are currently planned to be defined on the property page (as it is done in SMW). So you say that the height is measured in Meter which corresponds to 3.28084 feet, etc. Wikidata would allow linear translations to be defined within the wiki, so this can be done by the community. This makes everything a bit more complicated -- one could also imagine defining all dimensions and units in PHP and then having the properties reference the dimensions. Since there are only a few hundred units and dimensions, this could be viable. (Non-linear transformations -- most notoriously temperature -- will get their own implementation anyway)

Opinions?

2012/12/17 Denny Vrandečić denny.vrande...@wikimedia.de

As Phase 2 is progressing, we have to decide on how to represent data values.
I have created a draft for representing numbers and units, points in time, and locations, which can be found here: https://meta.wikimedia.org/wiki/Wikidata/Development/Representing_values (including a first suggestion on the functionality of the UI which we would be aiming at eventually). The draft is unfortunately far from perfect, and I would very much welcome comments and discussion. We probably will implement them in the following order: geolocation, date and time, numbers.

Cheers, Denny

-- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
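The unit question raised in this message (linear translations such as meters to feet, with temperature handled separately) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the table name, the function names, and the choice of the meter as base unit are hypothetical and do not describe how Wikidata stores or converts units. Only the factor of 3.28084 feet per meter comes from the thread.

```python
# Hypothetical sketch: names and table layout are illustrative only,
# not Wikidata's actual design.

# Linear factors: how many of each unit make up one base unit (the meter).
# 3.28084 feet per meter is the figure quoted in the thread.
LENGTH_PER_METER = {
    "meter": 1.0,
    "foot": 3.28084,
}


def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between length units by going through the base unit."""
    in_meters = value / LENGTH_PER_METER[from_unit]
    return in_meters * LENGTH_PER_METER[to_unit]


def celsius_to_fahrenheit(celsius: float) -> float:
    """Temperature needs an offset as well as a factor, so a single
    multiplication table like the one above cannot express it."""
    return celsius * 9.0 / 5.0 + 32.0


if __name__ == "__main__":
    print(round(convert_length(100.0, "meter", "foot"), 3))
    print(celsius_to_fahrenheit(100.0))
```

A table of factors covers every unit that differs from the base unit only by scale, which is why such translations could plausibly live on a wiki page. Temperature is the standard counterexample because the scales do not share a zero point.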
Re: [Wikidata-l] Data values
How about this:

- Values default to a non-range value
- You can click a checkbox that says "range" to turn the input into a range value instead
- An entry can only be represented by either a non-range or a range number, not both

This relieves our issue with query answering:

Query: When was XXX invented?
Non-range answer: June 1988
Range answer: sometime between May 1988 and October 1989

Does that work?

Sven M

On Tue, Dec 18, 2012 at 10:56 AM, Denny Vrandečić denny.vrande...@wikimedia.de wrote:

Thank you for your comments, Friedrich. It would be possible and very flexible, and certainly more powerful than the current system. But we would lose the convenience of having one date, which we need for query answering (or we could default to the lower or upper bound, or the middle, but all of these are a bit arbitrary). The other option would be, as discussed in the answer to Marco, to use one date and an uncertainty, probably an uncertainty with a unit (and probably different lower and upper bounds). This would make it more consistent with the way numbers are treated. I am starting to think that the additional complexity for this solution might be warranted.

2012/12/18 Friedrich Röhrs f.roe...@mis.uni-saarland.de

Hi,

* Time: Would it make sense to use time periods instead of partial datetimes with lower precision levels? Instead of using May 1918 as birth date, it would be something like birth date in the interval 01.05.1918 - 31.05.1918. This does not necessarily need to be reflected in the UI, of course; it could still allow the "leave the field you don't know blank" approach. This would allow the value to be 2nd-5th century AD (01.01.100 - 31.12.400). Going with this idea all the way, datetimes at the highest precision level could be handled as periods too, just as zero-length periods. Another question that popped up is how to represent (or whether to represent) times where, for example, the hour is known but the day isn't.
For example, it might be known that someone was killed at noon, but not on which day.

Friedrich

On Tue, Dec 18, 2012 at 3:29 PM, Denny Vrandečić denny.vrande...@wikimedia.de wrote:

Thanks for the input so far. Here are a few explicit questions that I have:

* Time: right now the data model assumes that the precision is given on the level decade / year / month etc., which means you can enter a date of birth like 1435 or May 1918. But is this sufficient? We cannot enter a value like "2nd-5th century AD" (we could enter "1st millennium AD", which would be a loss of precision).
* Geo: the model assumes latitude, longitude and altitude, and defines altitude as over mean sea level (simplified). Is altitude at all useful? Should it be removed from Geolocation and moved instead to a property called height or altitude, which is dealt with outside of the geolocation?
* Units are currently planned to be defined on the property page (as is done in SMW). So you would say that the height is measured in meters, and that one meter corresponds to 3.28084 feet, etc. Wikidata would allow linear translations to be defined within the wiki, so this could be done by the community. This makes everything a bit more complicated -- one could also imagine defining all dimensions and units in PHP and then having the properties reference the dimensions. Since there are only a few hundred units and dimensions, this could be viable. (Non-linear transformations -- most notoriously temperature -- will get their own implementation anyway.)

Opinions?

2012/12/17 Denny Vrandečić denny.vrande...@wikimedia.de

As Phase 2 is progressing, we have to decide on how to represent data values. I have created a draft for representing numbers and units, points in time, and locations, which can be found here: https://meta.wikimedia.org/wiki/Wikidata/Development/Representing_values (including a first suggestion on the functionality of the UI which we would be aiming at eventually).
The draft is unfortunately far from perfect, and I would very much welcome comments and discussion. We probably will implement them in the following order: geolocation, date and time, numbers.

Cheers, Denny
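Friedrich's suggestion in this thread, storing every date as a period so that "May 1918" becomes 01.05.1918 - 31.05.1918 and an exact date becomes a zero-length period, can be sketched roughly as follows. The `Period` class and the helper names are hypothetical and written only for illustration; the phrasing of the range answer borrows from Sven's proposal. Python's `date` type uses the proleptic Gregorian calendar, which is a simplification for historical dates.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Period:
    """Hypothetical time value: an inclusive start and end date."""
    start: date
    end: date

    @property
    def is_point(self) -> bool:
        # An exactly known date is modelled as a zero-length period.
        return self.start == self.end


def month_period(year: int, month: int) -> Period:
    """Turn a month-precision value such as 'May 1918' into a period."""
    last_day = calendar.monthrange(year, month)[1]
    return Period(date(year, month, 1), date(year, month, last_day))


def answer(period: Period) -> str:
    """Phrase a query answer differently for point and range values."""
    if period.is_point:
        return period.start.isoformat()
    return f"sometime between {period.start.isoformat()} and {period.end.isoformat()}"


if __name__ == "__main__":
    print(answer(month_period(1918, 5)))
    print(answer(Period(date(1988, 6, 1), date(1988, 6, 1))))
```

This sketch does not address Denny's concern about needing a single date for query answering; a real design would still have to pick a representative value or carry an explicit uncertainty.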
Re: [Wikidata-l] Data values
The great thing about MediaWiki is that we don't have to anticipate new features; we can build them in later when we discover that they're possible and that they're wanted. In fact, there's no requirement that the Wikidata developers even be the ones who develop said hypothetical future modules. If you could code, you could build them and offer them up for integration. All that being said, there are already websites that map out astronomical features in a geolocation-like way. It's worthwhile to consider supporting that type of geolocation data on Wikidata.

Sven

On Tue, Dec 18, 2012 at 12:14 PM, Scott MacLeod worlduniversityandsch...@gmail.com wrote:

Denny, Thanks for this. Are there ways to structure this geolocation data now to anticipate more 'fluid' uses of it, say 5 or 10 years from now, or beyond, in representing water, or astronomical processes, in something like interactive, realistic models of the earth or the universe, which would also be useful to Wikipedia's developing goals / mission?

Scott

On Tue, Dec 18, 2012 at 8:57 AM, Gregor Hagedorn g.m.haged...@gmail.com wrote:

Now, I don't think we need or want ranges as a data type at all (better have separate properties for the beginning and end).

I am afraid this will then put a heavy burden on users to enter, proofread, and output values. Data input becomes dispersed, because the value "18-25 cm length" has to be split and entered separately. You then have to write a custom output for each property, and do all the query logic (lower, upper) for each property in each Wikipedia client. I believe this is something that is healthy to do centrally. I believe the concept of intervals exists because of that.
Gregor

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l

-- Scott MacLeod Founder President http://scottmacleod.com -- World University and School (like Wikipedia with MIT Open Course Ware) http://worlduniversityandschool.blogspot.com/ http://worlduniversity.wikia.com/wiki/World_University P.O. Box 442, (86 Ridgecrest Road), Canyon, CA 94516 415 480 4577 worlduniversityandsch...@gmail.com Skype: scottm100 Google + main, WUaS pages: https://plus.google.com/u/0/11589062932577910/posts https://plus.google.com/u/0/b/108179352492243955816/108179352492243955816/posts Please contribute, and invite friends to contribute, tax deductibly, via PayPal and credit card: http://scottmacleod.com/worlduniversityandschool.htm World University and School is a 501 (c) (3) tax-exempt educational organization. World University and School is sending you this because of your interest in free, online, higher education. If you don't want to receive these, please reply with 'remove' in the subject line. Thank you.
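Gregor's argument for handling intervals centrally, rather than as separate "beginning" and "end" properties, can be made concrete with a small sketch. The `QuantityRange` type below is hypothetical and only illustrates the idea that entry, output, and the lower/upper query logic live in one place; it is not a description of any Wikidata data type. The "18-25 cm" value is the example from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityRange:
    """Hypothetical interval value with one shared unit."""
    lower: float
    upper: float
    unit: str

    def __post_init__(self) -> None:
        if self.lower > self.upper:
            raise ValueError("lower bound must not exceed upper bound")

    def contains(self, value: float) -> bool:
        """Query logic written once, instead of once per property."""
        return self.lower <= value <= self.upper

    def overlaps(self, other: "QuantityRange") -> bool:
        if self.unit != other.unit:
            raise ValueError("convert both ranges to a common unit first")
        return self.lower <= other.upper and other.lower <= self.upper

    def __str__(self) -> str:
        # One output format for every property that uses the type.
        return f"{self.lower:g}-{self.upper:g} {self.unit}"


if __name__ == "__main__":
    length = QuantityRange(18, 25, "cm")
    print(length)
    print(length.contains(20))
```

With two separate properties, each client would need to reassemble the pair and repeat the comparison logic itself, which is the burden the message describes.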
Re: [Wikidata-l] Publicity for Wikidata and Wikivoyage
Assuming that the most recent rough timeline I've got is still accurate, Wikidata will have deployed Phase II of its features by, at the latest, early January. I'd prefer that we wait at least a week or two after that for the project to re-stabilize before we make the big announcement, as I do not want to have people coming in right as we're working out the bugs from the Phase II deployment. That would mean, ballpark, that I'd be most comfortable with us doing this in early February. That is within the timeline you gave.

I would also like to stress the importance of the WMF communicating with these two mailing lists and giving us at least a 48-hour warning before the banners go live. I want to make sure that the local admins (and everyone else) know that we're going to potentially get a large wave of new users. Call me a cynic, but considering how drama-free and smooth Wikidata has been, I've kinda been waiting for the other shoe to drop. I'm not saying that the people there now would be hostile to the people that might be joining after the announcement, but I am saying that having people on hand in case there is trouble is going to be important. Okay, so yes, I am truly a cynic of epic proportions.

On a more positive note, it would also give me time to make sure that all of the project's help pages are cleaned up and that we have a nice centralized location that lists said pages where we can point the new people to.

Thanks for the idea, Sven

On Wed, Dec 5, 2012 at 6:03 PM, Erik Moeller e...@wikimedia.org wrote:

Hi all, I'm curious what folks here think about making some noise about the two most recent additions to the Wikimedia family around January or February. Wikivoyage is still finishing up the beta phase (image transfers, logo import etc.) this month, and Wikidata isn't live as a repository yet -- but we could set a target date that would work for both projects.
What I'm imagining is an actual banner on Wikimedia projects announcing the launch of both Wikivoyage and Wikidata, pointing to a landing page explaining what these projects are, how to participate, etc. That page could be drafted on Meta. Then interested folks would visit the projects to learn more and get involved. Wikimedia projects obviously have an enormous reach and I think this could help create awareness and build community -- but it could also be an unwelcome influx in the early stages. This could be regulated by running a banner only for logged in users, or x% of readers. Thoughts? Erik -- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate
[Wikidata-l] Bots and the new API
Hey there. To those of you that are American, happy Thanksgiving! To those of you that are not American, happy "Why can't I reach any of the Americans today, and what's with all this talk of stuffing?" day!

So a while back when Denny gave a talk in Boston about Wikidata, he mentioned that he didn't want bots running until after the new API hit and we had a chance to make sure that nothing was broken. I reposted that on the project chat and we all agreed to stop the bots. That was on November 13. Today I wake up to find that there is a bot running, and another to-be-bot-op is asking if he can run his too. Does anyone know if the new API has hit? Denny had no idea when it would. If it has, and nothing is on fire, does anyone have any other objections to running bots again?

On a final note, I would ask that the bots stop running at Q99900 and resume once we're at Q11, as Denny did make a point to me in private that day that I agree with: it's great for the milestones in projects (and Q10 is a big one) to be reached by humans and not bots. Dunno if that's possible to program in though. Gosh, I remember creating things with four digits after the Q. I'm feeling old. :D

Sven
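On whether the milestone pause is "possible to program in": one minimal way a bot operator might approach it is a guard that checks the next item number before each creation. This is only a sketch under stated assumptions. The function name and both threshold parameters are hypothetical, the example values are placeholders rather than the item numbers from the message, and how a bot would learn the next item number is left out entirely.

```python
def bot_may_create(next_item_number: int, pause_from: int, resume_at: int) -> bool:
    """Return True when a bot is allowed to create the next item.

    Items numbered pause_from up to (but not including) resume_at are
    reserved for human editors, so a milestone inside that window is
    reached by a person rather than a bot. All three values are plain
    integers, i.e. the digits after the "Q".
    """
    if pause_from >= resume_at:
        raise ValueError("pause_from must be smaller than resume_at")
    return not (pause_from <= next_item_number < resume_at)


if __name__ == "__main__":
    # Placeholder window around a made-up milestone of 1000.
    for number in (899, 900, 1000, 1001):
        print(number, bot_may_create(number, pause_from=900, resume_at=1001))
```

A check like this only works if every bot operator adopts it, which matches Lydia's later remark that it needs to be worked out with the people running the bots.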
Re: [Wikidata-l] Bots and the new API
I'm getting contradictory messages from Wikidata staff then. I mean, we already knew that we *could*; the issue is whether or not we *should*.

Sven

On Thu, Nov 22, 2012 at 2:02 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:

On Thu, Nov 22, 2012 at 5:14 PM, Sven Manguard svenmangu...@gmail.com wrote:

Hey there. To those of you that are American, happy Thanksgiving! To those of you that are not American, happy "Why can't I reach any of the Americans today, and what's with all this talk of stuffing?" day! So a while back when Denny gave a talk in Boston about Wikidata, he mentioned that he didn't want bots running until after the new API hit and we had a chance to make sure that nothing was broken. I reposted that on the project chat and we all agreed to stop the bots. That was on November 13. Today I wake up to find that there is a bot running, and another to-be-bot-op is asking if he can run his too. Does anyone know if the new API has hit? Denny had no idea when it would. If it has, and nothing is on fire, does anyone have any other objections to running bots again?

No it has not. Technically you can run bots. Just don't go too crazy with them :)

On a final note, I would ask that the bots stop running at Q99900 and resume once we're at Q11, as Denny did make a point to me in private that day that I agree with: it's great for the milestones in projects (and Q10 is a big one) to be reached by humans and not bots. Dunno if that's possible to program in though. Gosh, I remember creating things with four digits after the Q. I'm feeling old. :D

Heh I think that's something you need to figure out with the people running the bots.

Cheers Lydia

-- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Re: [Wikidata-l] Wikidata license (was Introduction and some questions on Wikidata)
The argument above is about automatically copying over content from other projects. My point is that the license isn't the problem with it, but that there is a problem with it.

Sven

On Thu, Nov 15, 2012 at 7:05 PM, Gregor Hagedorn g.m.haged...@gmail.com wrote:

On 15 November 2012 23:35, Sven svenmangu...@gmail.com wrote:

Automatically copying over infoboxes is something I don't advise. Unlike current infoboxes, which are rarely sourced, every point of data on Wikidata should be DIRECTLY and INDIVIDUALLY sourced. We can use the same source 37 times, but each bit of information that would ordinarily have a field on an infobox needs to have its own source; we can't just say everything on this page is from . If we do automatic importing, it's going to be an uphill battle from day one to source things.

(The argument above is independent of licensing, so this should perhaps be discussed in a separate thread?)

Gregor
[Wikidata-l] Script that makes Wikidata so much easier
Hey there. Most of you will hopefully have already seen this, but there's a script out that makes importing interwiki links so much easier. See http://www.wikidata.org/wiki/Wikidata:Project_chat#SlurpInterwiki_script for details. Hope everyone is settling in as well as I am. Sven