Re: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
On Friday, 09 August 2013 at 08:25 -0700, Jiang BIAN wrote:
> @Mathieu,
>
> sorry for my ignorance,

Sorry for my esoteric jargon. ;) (I see that Federico Leva (Nemo) already answered your question.)

> when you say """meta""" would be the most obvious channel, what does "meta" mean? a page or a site? could you share the link with me? I'm interested in the discussion on this.
>
> Thanks
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
[Wikidata-l] Make Commons a wikidata client
Hi everyone,

At Wikimania we had several discussions about the future of Wikidata and Commons. Some broader feedback would be nice.

We currently have a property "Commons category" (https://www.wikidata.org/wiki/Property:P373). This is a string and an intermediate solution. In the long run Commons should probably be a Wikibase instance in its own right (structured metadata stored at Commons) integrated with Wikidata.org; see https://www.wikidata.org/wiki/Wikidata:Wikimedia_Commons for more info. In the meantime we should make Commons a Wikidata client like Wikipedia and Wikivoyage.

How would that work? We have an item https://www.wikidata.org/wiki/Q9920 for the city Haarlem. It links to the Wikipedia article "Haarlem" and the Wikivoyage article "Haarlem". It should also link to the Commons gallery "Haarlem" (https://commons.wikimedia.org/wiki/Haarlem). We have an item https://www.wikidata.org/wiki/Q7427769 for the category Haarlem. It links to the Wikipedia category "Haarlem". It should also link to the Commons category "Haarlem" (https://commons.wikimedia.org/wiki/Category:Haarlem). The category item (Q7427769) links to the article item (Q9920) using the property "category's main topic" (https://www.wikidata.org/wiki/Property:P301). We would need an inverse property of P301 to make the backlink.

Some reasons why this is helpful:
* Wikidata takes care of a lot of things like page moves, deletions, etc. With P373 (Commons category) it is all manual right now.
* Having Wikidata on Commons means that you can automatically get backlinks to Wikipedia, have intros for categories, etc.
* It's a step in the right direction and makes the next steps easier.

Small change, lots of benefits!

Maarten

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
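To illustrate the string-based workaround Maarten describes, here is a rough Python sketch of how a client currently has to resolve the Commons category for an item via P373 (the function name and lack of error handling are mine; this is not part of any existing tool). With Commons as a proper Wikidata client, this lookup would simply be a sitelink instead of a string claim.

import json
import urllib.parse
import urllib.request

def get_commons_category(item_id):
    """Return the P373 (Commons category) string for a Wikidata item, or None."""
    params = urllib.parse.urlencode({
        "action": "wbgetclaims",
        "entity": item_id,
        "property": "P373",
        "format": "json",
    })
    url = "https://www.wikidata.org/w/api.php?" + params
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    for claim in data.get("claims", {}).get("P373", []):
        snak = claim["mainsnak"]
        if snak["snaktype"] == "value":
            return snak["datavalue"]["value"]
    return None

print(get_commons_category("Q9920"))  # expected: "Haarlem"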
Re: [Wikidata-l] Weekly Summary #70
It would appear that there is more negative feedback than positive on the logo change...

On Aug 9, 2013 10:18 AM, "adam.shorl...@wikimedia.de" <adam.shorl...@wikimedia.de> wrote:
> Wikimania continues! (I hope you like our current Hong Kong logo!)
>
> Make sure you come and say hi to us if you are attending!
>
> Check out this week's summary!
> http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_08_09
>
> Have a great weekend!
> Adam
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata RDF export available
Hi Sebastian,

On 09/08/13 15:44, Sebastian Hellmann wrote:
> Hi Markus, we just had a look at your python code and created a dump. We are still getting a syntax error for the turtle dump.

You mean "just" as in "at around 15:30 today" ;-)? The code is under heavy development, so changes are quite frequent. Please expect things to be broken in some cases (this is just a little community project, not part of the official Wikidata development). I have just uploaded a new statements export (20130808) to http://semanticweb.org/RDF/Wikidata/ which you might want to try.

> I saw that you did not use a mature framework for serializing the turtle. Let me explain the problem: over the last 4 years, I have seen about two dozen people (undergraduate and PhD students, as well as post-docs) implement "simple" serializers for RDF. They all failed. This was normally not due to a lack of skill, but due to a lack of time. They wanted to do it quickly, but they didn't have the time to implement it correctly in the long run. There are some really nasty problems ahead, like encoding or special characters in URIs. I would strongly advise you to:
> 1. use a Python RDF framework
> 2. do some syntax tests on the output, e.g. with "rapper"
> 3. use a line-by-line format, e.g. Turtle without prefixes and just one triple per line (it's like N-Triples, but with Unicode)

Yes, URI encoding could be difficult if we were doing it manually. Note, however, that we are already using a standard library for URI encoding in all non-trivial cases, so this does not seem to be a very likely cause of the problem (though some non-zero probability remains). In general, it is not unlikely that there are bugs in the RDF somewhere; please consider this export as an early prototype that is meant for experimentation purposes. If you want an official RDF dump, you will have to wait for the Wikidata project team to get around to doing it (this will surely be based on an RDF library). Personally, I already found the dump useful (I successfully imported some 109 million triples into an RDF store using a custom script), but I know that it can require some tweaking.

> We are having a problem currently, because we tried to convert the dump to N-Triples (which would be handled by a framework as well) with rapper. We assume that the error is an extra "<" somewhere (not confirmed) and we are still searching for it, since the dump is so big,

Ok, looking forward to hearing about the results of your search. A good tip for checking such things is to use grep. I did a quick grep on my current local statements export to count the numbers of < and > (this takes less than a minute on my laptop, including on-the-fly decompression). Both numbers were equal, making it unlikely that there is any unmatched < in the current dumps. Then I used grep to check that < and > only occur in the statements files in lines with "commons" URLs. These are created using urllib, so there should never be any < or > in them.

> so we cannot provide a detailed bug report. If we had one triple per line, this would also be easier, plus there are advantages for stream reading. bzip2 compression is very good as well, no need for prefix optimization.

Not sure what you mean here. Turtle prefixes in general seem to be a Good Thing, not just for reducing the file size. The code has no easy way to get rid of prefixes, but if you want a line-by-line export you could subclass my exporter and overwrite the methods for incremental triple writing so that they remember the last subject (or property) and create full triples instead. This would give you a line-by-line export in (almost) no time (some uses of [...] blocks in object positions would remain, but maybe you could live with that).

Best wishes,

Markus

> All the best,
> Sebastian
>
> On 03.08.2013 23:22, Markus Krötzsch wrote:
>> Update: the first bugs in the export have already been discovered -- and fixed in the script on github. The files I uploaded will be updated on Monday when I have a better upload again (the links file should be fine, the statements file requires a rather tolerant Turtle string literal parser, and the labels file has a malformed line that will hardly work anywhere).
>> Markus
>>
>> On 03/08/13 14:48, Markus Krötzsch wrote:
>>> Hi, I am happy to report that an initial, yet fully functional RDF export for Wikidata is now available. The exports can be created using the wda-export-data.py script of the wda toolkit [1]. This script downloads recent Wikidata database dumps and processes them to create RDF/Turtle files. Various options are available to customize the output (e.g., to export statements but not references, or to export only texts in English and Wolof). The file creation takes a few (about three) hours on my machine, depending on what exactly is exported. For your convenience, I have created some example exports based on yesterday's dumps. These can be found at [2]. There are three Turtle files: site links only, labels/descriptions/aliases only, statements only.
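For anyone who wants to reproduce the sanity check Markus describes without grep, here is a small Python equivalent (the dump file name is an assumption; this is not a script from the wda toolkit). It streams the bzip2-compressed dump and counts the angle brackets on the fly:

import bz2

lt = gt = 0
# Stream the compressed statements dump and count angle brackets line by line.
with bz2.open("wikidata-statements.ttl.bz2", "rt", encoding="utf-8") as dump:
    for line in dump:
        lt += line.count("<")
        gt += line.count(">")

print("'<' count:", lt, "  '>' count:", gt)
if lt != gt:
    print("Unbalanced angle brackets -- a likely source of the Turtle syntax error.")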
Re: [Wikidata-l] Wikidata RDF export available
Over time people have gotten the message that you shouldn't write XML like System.out.println("<tag>" + someString + "</tag>"), because it is something that usually ends in tears. Although (most) RDF toolkits are like XML toolkits in that they choke on invalid data, people who write RDF seem to have little concern for whether or not it is valid. This cultural problem is one of the reasons why RDF has seemed to catch on so slowly. If you tell somebody their XML is invalid, they'll feel like they have to do something about it, but people don't seem to take any action when they hear that the 20 GB file they published is trash.

As a general practice you should use real RDF tools to write RDF files. This adds some overhead, but it's generally not hard and it gives you a pretty good chance you'll get valid output. ;-)

Lately I've been working on this system https://github.com/paulhoule/infovore/wiki which is intended to deal with exactly this situation on a large scale. The "Parallel Super Eyeball 3" (3 means triple; PSE 4 is a hypothetical tool that does the same for quads) tool physically separates valid and invalid triples, so you can use the valid triples while being aware of what invalid data tried to sneak in.

Early next week I'm planning on rolling out ":BaseKB Now", which will be filtered Freebase data, processed automatically on a weekly basis. I've got a project in the pipeline that is going to require Wikipedia categories (I better get them fast before they go away) and another large 4D metamemomic data set for which Wikidata Phase I will be a Rosetta Stone, so support for those data sets is on my critical path.

-Original Message-
From: Sebastian Hellmann
Sent: Friday, August 9, 2013 10:44 AM
To: Discussion list for the Wikidata project.
Cc: Dimitris Kontokostas ; Jona Christopher Sahnwaldt
Subject: Re: [Wikidata-l] Wikidata RDF export available

Hi Markus, we just had a look at your python code and created a dump. We are still getting a syntax error for the turtle dump. I saw that you did not use a mature framework for serializing the turtle. Let me explain the problem: over the last 4 years, I have seen about two dozen people (undergraduate and PhD students, as well as post-docs) implement "simple" serializers for RDF. They all failed.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
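To make the "separate valid from invalid triples" idea concrete, here is a minimal Python sketch using rdflib. This is an illustration only, not how PSE 3 itself works (PSE 3 is a separate, large-scale tool), and the file names are assumptions. Each N-Triples line is parsed on its own, and lines that fail to parse are routed to a reject file so the clean data stays usable:

import bz2
from rdflib import Graph

valid = open("valid.nt", "w", encoding="utf-8")
invalid = open("invalid.nt", "w", encoding="utf-8")

# Check each N-Triples line independently; keep the good ones, quarantine the rest.
# (Parsing one line at a time is slow for huge dumps, but it keeps the idea simple.)
with bz2.open("wikidata-statements.nt.bz2", "rt", encoding="utf-8") as dump:
    for line in dump:
        if not line.strip():
            continue
        try:
            Graph().parse(data=line, format="nt")
            valid.write(line)
        except Exception:
            invalid.write(line)

valid.close()
invalid.close()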
Re: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
Jiang BIAN, 09/08/2013 17:25:
> when you say """meta""" would be the most obvious channel, what does "meta" mean? a page or a site? could you share the link with me? I'm interested in the discussion on this.

Meta = Meta-Wiki = http://meta.wikimedia.org/
https://meta.wikimedia.org/wiki/Meta:About

Specifically https://meta.wikimedia.org/wiki/Wiktionary_future

Nemo

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
@Mathieu,

Sorry for my ignorance: when you say """meta""" would be the most obvious channel, what does "meta" mean? A page or a site? Could you share the link with me? I'm interested in the discussion on this.

Thanks

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata RDF export available
Hi Markus,

we just had a look at your python code and created a dump. We are still getting a syntax error for the turtle dump.

I saw that you did not use a mature framework for serializing the turtle. Let me explain the problem: over the last 4 years, I have seen about two dozen people (undergraduate and PhD students, as well as post-docs) implement "simple" serializers for RDF. They all failed. This was normally not due to a lack of skill, but due to a lack of time. They wanted to do it quickly, but they didn't have the time to implement it correctly in the long run. There are some really nasty problems ahead, like encoding or special characters in URIs. I would strongly advise you to:

1. use a Python RDF framework
2. do some syntax tests on the output, e.g. with "rapper"
3. use a line-by-line format, e.g. Turtle without prefixes and just one triple per line (it's like N-Triples, but with Unicode)

We are having a problem currently, because we tried to convert the dump to N-Triples (which would be handled by a framework as well) with rapper. We assume that the error is an extra "<" somewhere (not confirmed) and we are still searching for it, since the dump is so big, so we cannot provide a detailed bug report. If we had one triple per line, this would also be easier, plus there are advantages for stream reading. bzip2 compression is very good as well, no need for prefix optimization.

All the best,
Sebastian

On 03.08.2013 23:22, Markus Krötzsch wrote:
> Update: the first bugs in the export have already been discovered -- and fixed in the script on github. The files I uploaded will be updated on Monday when I have a better upload again (the links file should be fine, the statements file requires a rather tolerant Turtle string literal parser, and the labels file has a malformed line that will hardly work anywhere).
> Markus
>
> On 03/08/13 14:48, Markus Krötzsch wrote:
>> Hi,
>>
>> I am happy to report that an initial, yet fully functional RDF export for Wikidata is now available. The exports can be created using the wda-export-data.py script of the wda toolkit [1]. This script downloads recent Wikidata database dumps and processes them to create RDF/Turtle files. Various options are available to customize the output (e.g., to export statements but not references, or to export only texts in English and Wolof). The file creation takes a few (about three) hours on my machine, depending on what exactly is exported.
>>
>> For your convenience, I have created some example exports based on yesterday's dumps. These can be found at [2]. There are three Turtle files: site links only, labels/descriptions/aliases only, statements only. The fourth file is a preliminary version of the Wikibase ontology that is used in the exports.
>>
>> The export format is based on our earlier proposal [3], but it adds a lot of details that had not been specified there yet (namespaces, references, ID generation, compound datavalue encoding, etc.). Details might still change, of course. We might provide regular dumps at another location once the format is stable.
>>
>> As a side effect of these activities, the wda toolkit [1] is also getting more convenient to use. Creating code for exporting the data into other formats is quite easy.
>>
>> Features and known limitations of the wda RDF export:
>>
>> (1) All current Wikidata datatypes are supported. Commons-media data is correctly exported as URLs (not as strings).
>>
>> (2) One-pass processing. Dumps are processed only once, even though this means that we may not know the types of all properties when we first need them: the script queries wikidata.org to find missing information. This is only relevant when exporting statements.
>>
>> (3) Limited language support. The script uses Wikidata's internal language codes for string literals in RDF. In some cases, this might not be correct. It would be great if somebody could create a mapping from Wikidata language codes to BCP47 language codes (let me know if you think you can do this, and I'll tell you where to put it).
>>
>> (4) Limited site language support. To specify the language of linked wiki sites, the script extracts a language code from the URL of the site. Again, this might not be correct in all cases, and it would be great if somebody had a proper mapping from Wikipedias/Wikivoyages to language codes.
>>
>> (5) Some data excluded. Data that cannot currently be edited is not exported, even if it is found in the dumps. Examples include statement ranks and timezones for time datavalues. I also currently exclude labels and descriptions for simple English, formal German, and informal Dutch, since these would pollute the label space for English, German, and Dutch without adding much benefit (other than possibly for simple English descriptions, I cannot see any case where these languages should ever have different Wikidata texts at all).
>>
>> Feedback is welcome.
>>
>> Cheers,
>> Markus
>>
>> [1] https://github.com/mkroetzsch/wda
>> Run "python wda-export-data.py -
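As a companion to Sebastian's recommendations (1) and (3) above, here is a minimal sketch of what serializing statements through a Python RDF framework could look like, using rdflib. The namespaces and property choices are illustrative only and do not reproduce the actual wda export format; the point is that the framework handles escaping, URI encoding, and prefixes, and can emit one triple per line:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

# Illustrative namespace only -- not necessarily the namespace used by the wda export.
WD = Namespace("http://www.wikidata.org/entity/")

g = Graph()
g.bind("wd", WD)
g.add((WD["Q9920"], RDFS.label, Literal("Haarlem", lang="en")))
g.add((WD["Q9920"], WD["P373"], Literal("Haarlem")))

# Prefixed Turtle output; the library takes care of escaping and prefix handling.
print(g.serialize(format="turtle"))

# One triple per line (N-Triples), convenient for grep, streaming, and rapper checks,
# as suggested in recommendation (3).
print(g.serialize(format="nt"))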
[Wikidata-l] Weekly Summary #70
Wikimania continues! (I hope you like our current Hong Kong logo!)

Make sure you come and say hi to us if you are attending!

Check out this week's summary!
http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_08_09

Have a great weekend!
Adam

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
On 2013-08-09 13:04, Romaine Wiki wrote:
> Are there many users from Wiktionary in Hong Kong? I do not think any of the Dutch users is; I can't say for others. I think it would be essential that this subject is discussed inside the wider Wiktionary community. To me the group of users participating is too narrow. Also, a mailing list is not handy, as most of the users from Wiktionary do not read it. I think a Wiktionary-community-wide discussion is needed.

I agree, and I think meta would be the most obvious channel for such a discussion. As said in the previous email, there's already [[Wiktionary future]], which is waiting for contributions and discussion on meta. Anyway, whatever the channel, it would be really important to make as many contributors as possible aware of this initiative, so they can provide relevant feedback specific to their needs.

> Romaine
>
> On Fri, 8/9/13, David Cuenca wrote:
>
> Subject: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
> To: wiktionar...@lists.wikimedia.org, "Wikimania general list (open subscription)" , "Discussion list for the Wikidata project." , "Wikimedia Mailing List"
> Date: Friday, August 9, 2013, 4:43 AM
>
> Hi,
>
> If there is someone at Wikimania interested in participating in the talks about the future support of Wiktionary in Wikidata, we will be having a discussion about the several proposals.
> http://wikimania2013.wikimedia.org/wiki/Support_of_Wiktionary_in_Wikidata
>
> Date: Saturday, 10 Aug, 11:30 am - 1:00 pm
> Place: Y520 (block Y, 5th floor)
>
> See you there,
> Micru
>
> -Inline Attachment Follows-
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l

--
Association Culture-Libre
http://www.culture-libre.org/

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata tutorials on SMWCon Fall 2013?
Hi Adam!

We're preparing the second announcement and we want to include the tutorials there as well. Can you share the details of your tutorial by Monday?

- Yury Katkov, WikiVote

On Fri, Jul 26, 2013 at 8:56 PM, Lydia Pintscher <lydia.pintsc...@wikimedia.de> wrote:
> Hey Yury :)
>
> On Tue, Jul 23, 2013 at 11:14 AM, Yury Katkov wrote:
>> Greetings to the Wikidata team and community from the Semantic MediaWiki team and community!
>>
>> It seems that there are already a lot of things possible to do with Wikidata. What about including some Wikidata tutorials in the tutorial day of the SMWCon conference? I can already think of the following exciting topics:
>> Basic tutorials:
>> * adding information and querying Wikidata
>> * using Wikidata extensions in enterprise
>> Advanced topics:
>> * using the Wikidata API
>>
>> Surely, there can be a lot more interesting topics than that! Of course all the tutorials will be video recorded and can then be used as learning materials.
>>
>> If you're interested in giving a tutorial, please read our Call for Tutorials [1], write a short proposal, and contact me.
>
> Adam has been using the API of Wikidata a lot over the last months and has now also fixed a lot of bugs in it. He'd like to give a tutorial on that. I'll let you two figure out the details. Please let me know if you need anything else.
>
> Looking forward to SMWCon!
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Community Communications for Technical Projects
>
> Wikimedia Deutschland e.V.
> Obentrautstr. 72
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
Are there many users from Wiktionary in Hong Kong? I do not think any of the Dutch users is; I can't say for others.

I think it would be essential that this subject is discussed inside the wider Wiktionary community. To me the group of users participating is too narrow. Also, a mailing list is not handy, as most of the users from Wiktionary do not read it. I think a Wiktionary-community-wide discussion is needed.

Romaine

On Fri, 8/9/13, David Cuenca wrote:

Subject: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
To: wiktionar...@lists.wikimedia.org, "Wikimania general list (open subscription)" , "Discussion list for the Wikidata project." , "Wikimedia Mailing List"
Date: Friday, August 9, 2013, 4:43 AM

Hi,

If there is someone at Wikimania interested in participating in the talks about the future support of Wiktionary in Wikidata, we will be having a discussion about the several proposals.
http://wikimania2013.wikimedia.org/wiki/Support_of_Wiktionary_in_Wikidata

Date: Saturday, 10 Aug, 11:30 am - 1:00 pm
Place: Y520 (block Y, 5th floor)

See you there,
Micru

-Inline Attachment Follows-

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] [Wiktionary-l] Meeting about the support of Wiktionary in Wikidata
Hoi,

Would be interested, but I am not "there" ... as are many other people ...

Thanks,
GerardM

On 9 August 2013 06:43, David Cuenca wrote:
> Hi,
>
> If there is someone at Wikimania interested in participating in the talks about the future support of Wiktionary in Wikidata, we will be having a discussion about the several proposals.
> http://wikimania2013.wikimedia.org/wiki/Support_of_Wiktionary_in_Wikidata
>
> Date: Saturday, 10 Aug, 11:30 am - 1:00 pm
> Place: Y520 (block Y, 5th floor)
>
> See you there,
> Micru
>
> ___
> Wiktionary-l mailing list
> wiktionar...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiktionary-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l