Re: [Wikidata-l] [Wiktionary-l] Meeting about the support of Wiktionary in Wikidata

2013-08-09 Thread Gerard Meijssen
Hoi,
I would be interested, but I am not there ... and neither are many other people ...
Thanks,
GerardM


On 9 August 2013 06:43, David Cuenca dacu...@gmail.com wrote:

 Hi,

 If there is someone at Wikimania interested in participating in the talks
 about the future support of Wiktionary in Wikidata, we will be having a
 discussion about the various proposals.
 http://wikimania2013.wikimedia.org/wiki/Support_of_Wiktionary_in_Wikidata

 Date : Saturday, 10 Aug, 11:30 am - 1:00 pm
 Place: Y520 (block Y, 5th floor)

 See you there,
 Micru
 ___
 Wiktionary-l mailing list
 wiktionar...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiktionary-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata tutorials on SMWCon Fall 2013?

2013-08-09 Thread Yury Katkov
Hi Adam!

We're preparing the second announcement and we want to include the
tutorials there as well. Can you share the details of your tutorial by
Monday?
-
Yury Katkov, WikiVote

On Fri, Jul 26, 2013 at 8:56 PM, Lydia Pintscher 
lydia.pintsc...@wikimedia.de wrote:
 Hey Yury :)

 On Tue, Jul 23, 2013 at 11:14 AM, Yury Katkov katkov.ju...@gmail.com
wrote:
 Greetings to Wikidata team and community from Semantic MediaWiki team
 and community!

 It seems that there are already a lot of things possible to do with
 Wikidata. What about including some Wikidata tutorials in the tutorial
 day of the SMWCon conference?
 I can already think of the following exciting topics:
 Basic tutorials:
 * adding information and querying Wikidata
 * using Wikidata extensions in the enterprise
 Advanced topics:
 * using the Wikidata API

 Surely, there can be a lot more interesting topics than that!
 Of course all the tutorials will be video recorded and can then be
 used as learning materials.

 If you're interested in giving a tutorial, please read our Call for
 Tutorials [1], write a short proposal, and contact me.

 Adam has been using the API of Wikidata a lot over the last months and
 has now also fixed a lot of bugs in it. He'd like to give a tutorial
 on that. I'll let you two figure out the details.
 Please let me know if you need anything else.

 Looking forward to SMWCon!


 Cheers
 Lydia

 --
 Lydia Pintscher - http://about.me/lydia.pintscher
 Community Communications for Technical Projects

 Wikimedia Deutschland e.V.
 Obentrautstr. 72
 10963 Berlin
 www.wikimedia.de

 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

 Registered in the register of associations of the Amtsgericht
 Berlin-Charlottenburg under number 23855 Nz. Recognised as a charitable
 organisation by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata

2013-08-09 Thread Mathieu Stumpf

On 2013-08-09 13:04, Romaine Wiki wrote:

Are there many users from Wiktionary in Hong Kong? I do not think any
of the Dutch users are; I can't say for others.

I think it is essential that this subject is discussed within the wider
Wiktionary community. To me, the group of users participating is too
narrow. A mailing list is also not handy, as most of the users from
Wiktionary do not read it. I think a Wiktionary-community-wide discussion
is needed.


I agree, and I think Meta would be the most obvious channel for such
a discussion.

As said in the previous email, there's already [[Wiktionary future]],
which is waiting for contributions and discussion on Meta. Anyway,
whatever the channel, it would be really important to make as many
contributors as possible aware of this initiative, so they can
provide relevant feedback specific to their needs.



Romaine



On Fri, 8/9/13, David Cuenca dacu...@gmail.com wrote:

 Subject: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
 To: wiktionar...@lists.wikimedia.org, Wikimania general list (open subscription) wikimani...@lists.wikimedia.org, Discussion list for the Wikidata project wikidata-l@lists.wikimedia.org, Wikimedia Mailing List wikimedi...@lists.wikimedia.org
 Date: Friday, August 9, 2013, 4:43 AM

 Hi,


 If there is someone at Wikimania interested in participating
 in the talks about the future support of Wiktionary in
 Wikidata, we will be having a discussion about the various
 proposals.



 
http://wikimania2013.wikimedia.org/wiki/Support_of_Wiktionary_in_Wikidata


 Date : Saturday, 10 Aug, 11:30 am - 1:00 pm


 Place: Y520 (block Y, 5th floor)

 See you there,
 Micru



 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


--
Association Culture-Libre
http://www.culture-libre.org/

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Weekly Summary #70

2013-08-09 Thread adam.shorl...@wikimedia.de
Wikimania Continues! (I hope you like our current Hong Kong logo!)

Make sure you come and say hi to us if you are attending!

Check out this week's summary!
http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_08_09

Have a great weekend!
Adam
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata RDF export available

2013-08-09 Thread Sebastian Hellmann

Hi Markus,
we just had a look at your Python code and created a dump. We are still
getting a syntax error for the Turtle dump.


I saw that you did not use a mature framework for serializing the
Turtle. Let me explain the problem:


Over the last 4 years, I have seen about two dozen people (undergraduate 
and PhD students, as well as Post-Docs) implement simple serializers 
for RDF.


They all failed.

This was normally not due to a lack of skill, but due to a lack of
time. They wanted to do it quickly, but they didn't have the time
to implement it correctly in the long run.
There are some really nasty problems ahead, like encoding or special
characters in URIs. I would strongly advise you to:


1. use a Python RDF framework
2. do some syntax tests on the output, e.g. with rapper
3. use a line-by-line format, e.g. Turtle without prefixes and just
one triple per line (it's like N-Triples, but with Unicode)
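
A minimal sketch of points 1 and 3, assuming the rdflib package is
installed (the example triple and output file name are made up):

  # Sketch only: let an RDF library take care of escaping and URI syntax.
  from rdflib import Graph, Literal, URIRef
  from rdflib.namespace import RDFS

  g = Graph()
  g.add((URIRef("http://www.wikidata.org/entity/Q9920"),
         RDFS.label,
         Literal("Haarlem", lang="en")))

  # N-Triples output: no prefixes, exactly one triple per line, which is
  # easy to validate (e.g. with rapper) and to read as a stream.
  g.serialize(destination="sample.nt", format="nt")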


We are currently having a problem because we tried to convert the dump
to N-Triples (which a framework would handle as well) with rapper.
We assume that the error is an extra "<" or ">" somewhere (not confirmed),
and we are still searching for it; since the dump is so big,
we cannot provide a detailed bug report. If we had one triple per
line, this would also be easier, plus there are advantages for stream
reading. bzip2 compression is very good as well, so there is no need
for prefix optimization.


All the best,
Sebastian

On 03.08.2013 23:22, Markus Krötzsch wrote:
Update: the first bugs in the export have already been discovered -- 
and fixed in the script on github. The files I uploaded will be 
updated on Monday when I have a better upload again (the links file 
should be fine, the statements file requires a rather tolerant Turtle 
string literal parser, and the labels file has a malformed line that 
will hardly work anywhere).


Markus

On 03/08/13 14:48, Markus Krötzsch wrote:

Hi,

I am happy to report that an initial, yet fully functional RDF export
for Wikidata is now available. The exports can be created using the
wda-export-data.py script of the wda toolkit [1]. This script downloads
recent Wikidata database dumps and processes them to create RDF/Turtle
files. Various options are available to customize the output (e.g., to
export statements but not references, or to export only texts in English
and Wolof). The file creation takes a few (about three) hours on my
machine depending on what exactly is exported.

For your convenience, I have created some example exports based on
yesterday's dumps. These can be found at [2]. There are three Turtle
files: site links only, labels/descriptions/aliases only, statements
only. The fourth file is a preliminary version of the Wikibase ontology
that is used in the exports.

The export format is based on our earlier proposal [3], but it adds a
lot of details that had not been specified there yet (namespaces,
references, ID generation, compound datavalue encoding, etc.). Details
might still change, of course. We might provide regular dumps at another
location once the format is stable.

As a side effect of these activities, the wda toolkit [1] is also
getting more convenient to use. Creating code for exporting the data
into other formats is quite easy.

Features and known limitations of the wda RDF export:

(1) All current Wikidata datatypes are supported. Commons-media data is
correctly exported as URLs (not as strings).

(2) One-pass processing. Dumps are processed only once, even though this
means that we may not know the types of all properties when we first
need them: the script queries wikidata.org to find missing information.
This is only relevant when exporting statements.

(3) Limited language support. The script uses Wikidata's internal
language codes for string literals in RDF. In some cases, this might not
be correct. It would be great if somebody could create a mapping from
Wikidata language codes to BCP47 language codes (let me know if you
think you can do this, and I'll tell you where to put it).

(4) Limited site language support. To specify the language of linked
wiki sites, the script extracts a language code from the URL of the
site. Again, this might not be correct in all cases, and it would be
great if somebody had a proper mapping from Wikipedias/Wikivoyages to
language codes.

(5) Some data excluded. Data that cannot currently be edited is not
exported, even if it is found in the dumps. Examples include statement
ranks and timezones for time datavalues. I also currently exclude labels
and descriptions for simple English, formal German, and informal Dutch,
since these would pollute the label space for English, German, and Dutch
without adding much benefit (other than possibly for simple English
descriptions, I cannot see any case where these languages should ever
have different Wikidata texts at all).

Feedback is welcome.

Cheers,

Markus

[1] https://github.com/mkroetzsch/wda
 Run python wda-export-data.py --help 

Re: [Wikidata-l] Wikidata RDF export available

2013-08-09 Thread Markus Krötzsch

Hi Sebastian,

On 09/08/13 15:44, Sebastian Hellmann wrote:

Hi Markus,
we just had a look at your Python code and created a dump. We are still
getting a syntax error for the Turtle dump.


You mean "just" as in "at around 15:30 today" ;-)? The code is under
heavy development, so changes are quite frequent. Please expect things
to be broken in some cases (this is just a little community project, not
part of the official Wikidata development).


I have just uploaded a new statements export (20130808) to 
http://semanticweb.org/RDF/Wikidata/ which you might want to try.




I saw that you did not use a mature framework for serializing the
Turtle. Let me explain the problem:

Over the last 4 years, I have seen about two dozen people (undergraduate
and PhD students, as well as Post-Docs) implement simple serializers
for RDF.

They all failed.

This was normally not due to a lack of skill, but due to a lack of
time. They wanted to do it quickly, but they didn't have the time
to implement it correctly in the long run.
There are some really nasty problems ahead, like encoding or special
characters in URIs. I would strongly advise you to:

1. use a Python RDF framework
2. do some syntax tests on the output, e.g. with rapper
3. use a line-by-line format, e.g. Turtle without prefixes and just
one triple per line (it's like N-Triples, but with Unicode)


Yes, URI encoding could be difficult if we were doing it manually. Note,
however, that we are already using a standard library for URI encoding
in all non-trivial cases, so this does not seem to be a very likely
cause of the problem (though some non-zero probability remains). In
general, it is not unlikely that there are bugs in the RDF somewhere;
please consider this export an early prototype that is meant for
experimentation purposes. If you want an official RDF dump, you will
have to wait for the Wikidata project team to get around to doing it (this
will surely be based on an RDF library). Personally, I have already found
the dump useful (I successfully imported some 109 million triples into an
RDF store with a custom script), but I know that it can require some
tweaking.
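
As a rough illustration of that kind of standard-library encoding (not
the actual wda code; the file title and URL pattern are made up):

  # Illustration only: percent-encode a title before it goes inside the
  # angle brackets of a Turtle IRI; unsafe characters become %XX escapes.
  from urllib.parse import quote

  title = "Grote Kerk (Haarlem) 1900.jpg"   # made-up Commons file title
  iri = "https://commons.wikimedia.org/wiki/File:" + quote(title)
  print(iri)   # spaces, parentheses etc. are escaped away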




We are currently having a problem because we tried to convert the dump
to N-Triples (which a framework would handle as well) with rapper.
We assume that the error is an extra "<" or ">" somewhere (not confirmed), and
we are still searching for it since the dump is so big


Ok, looking forward to hearing about the results of your search. A good tip
for checking such things is to use grep. I did a quick grep on my
current local statements export to count the numbers of "<" and ">" (this
takes less than a minute on my laptop, including on-the-fly
decompression). Both numbers were equal, making it unlikely that there
is any unmatched "<" in the current dumps. Then I used grep to check that
"<" and ">" only occur in the statements files in lines with Commons URLs.
These are created using urllib, so there should never be any "<" or ">" in them.
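
For reference, a rough Python equivalent of that grep check (the dump
file name below is just an assumption):

  # Count "<" and ">" while decompressing on the fly; equal totals make
  # an unmatched bracket in the Turtle unlikely, as with the grep check.
  import bz2

  lt = gt = 0
  with bz2.open("wikidata-statements.ttl.bz2", mode="rt", encoding="utf-8") as f:
      for line in f:
          lt += line.count("<")
          gt += line.count(">")
  print(lt, gt, "balanced" if lt == gt else "unbalanced")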



so we cannot provide a detailed bug report. If we had one triple per
line, this would also be easier, plus there are advantages for stream
reading. bzip2 compression is very good as well, so there is no need for
prefix optimization.


Not sure what you mean here. Turtle prefixes in general seem to be a
Good Thing, not just for reducing the file size. The code has no easy
way to get rid of prefixes, but if you want a line-by-line export you
could subclass my exporter and override the methods for incremental
triple writing so that they remember the last subject (or property) and
create full triples instead. This would give you a line-by-line export
in (almost) no time (some uses of [...] blocks in object positions would
remain, but maybe you could live with that).
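
Roughly what such a subclass could look like; the base class and the
method names below are only placeholders, not the real wda API:

  import sys

  class TurtleExporter:            # placeholder for the real wda exporter base class
      def __init__(self, out=sys.stdout):
          self.out = out

  class OneTriplePerLine(TurtleExporter):
      # Remember the current subject and property instead of emitting
      # grouped "s p o ; ..." blocks, and write one full triple per line.
      def write_subject(self, s):
          self._s = s

      def write_property(self, p):
          self._p = p

      def write_object(self, o):
          self.out.write("{} {} {} .\n".format(self._s, self._p, o))

  # usage sketch
  e = OneTriplePerLine()
  e.write_subject("<http://www.wikidata.org/entity/Q9920>")
  e.write_property("<http://www.w3.org/2000/01/rdf-schema#label>")
  e.write_object('"Haarlem"@en')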


Best wishes,

Markus



All the best,
Sebastian

On 03.08.2013 23:22, Markus Krötzsch wrote:

Update: the first bugs in the export have already been discovered --
and fixed in the script on github. The files I uploaded will be
updated on Monday when I have a better upload again (the links file
should be fine, the statements file requires a rather tolerant Turtle
string literal parser, and the labels file has a malformed line that
will hardly work anywhere).

Markus

On 03/08/13 14:48, Markus Krötzsch wrote:

Hi,

I am happy to report that an initial, yet fully functional RDF export
for Wikidata is now available. The exports can be created using the
wda-export-data.py script of the wda toolkit [1]. This script downloads
recent Wikidata database dumps and processes them to create RDF/Turtle
files. Various options are available to customize the output (e.g., to
export statements but not references, or to export only texts in English
and Wolof). The file creation takes a few (about three) hours on my
machine depending on what exactly is exported.

For your convenience, I have created some example exports based on
yesterday's dumps. These can be found at [2]. There are three Turtle
files: site links 

Re: [Wikidata-l] Weekly Summary #70

2013-08-09 Thread Sven Manguard
It would appear that there is more negative feedback than positive on the
logo change...
On Aug 9, 2013 10:18 AM, adam.shorl...@wikimedia.de wrote:

 Wikimania Continues! (I hope you like our current Hong Kong logo!)

 Make sure you come and say hi to us if you are attending!

 Check out this week's summary!
 http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_08_09

 Have a great weekend!
 Adam

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Make Commons a wikidata client

2013-08-09 Thread Maarten Dammers

Hi everyone,

At Wikimania we had several discussions about the future of Wikidata and
Commons. Some broader feedback would be nice.
Right now we have the property "Commons category"
(https://www.wikidata.org/wiki/Property:P373). This is a string property
and an intermediate solution.
In the long run Commons should probably be a Wikibase instance in its
own right (structured metadata stored at Commons) integrated with
Wikidata.org; see
https://www.wikidata.org/wiki/Wikidata:Wikimedia_Commons for more info.
In the meantime we should make Commons a Wikidata client like Wikipedia
and Wikivoyage. How would that work?


We have an item, https://www.wikidata.org/wiki/Q9920, for the city
Haarlem. It links to the Wikipedia article Haarlem and the Wikivoyage
article Haarlem. It should also link to the Commons gallery Haarlem
(https://commons.wikimedia.org/wiki/Haarlem).


We have an item, https://www.wikidata.org/wiki/Q7427769, for the category
Haarlem. It links to the Wikipedia category Haarlem. It should also link
to the Commons category Haarlem
(https://commons.wikimedia.org/wiki/Category:Haarlem).


The category item (Q7427769) links to the article item (Q9920) using the
property "main category topic"
(https://www.wikidata.org/wiki/Property:P301).

We would need to create an inverse property of P301 to make the backlink.
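
To make the current situation concrete, here is a small sketch against
the public API (illustration only; the printed values depend on the
item's current data):

  # Fetch Q9920 and show the sitelinks Wikidata already manages next to
  # the hand-maintained P373 string pointing to the Commons category.
  import json
  from urllib.request import urlopen

  url = ("https://www.wikidata.org/w/api.php?action=wbgetentities"
         "&ids=Q9920&props=sitelinks%7Cclaims&format=json")   # %7C = "|"
  entity = json.loads(urlopen(url).read().decode("utf-8"))["entities"]["Q9920"]

  for link in entity["sitelinks"].values():
      print(link["site"], "->", link["title"])        # e.g. nlwiki -> Haarlem

  for claim in entity["claims"].get("P373", []):
      print("P373 (Commons category):",
            claim["mainsnak"]["datavalue"]["value"])  # plain string, no sitelink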

Some reasons why this is helpful:
* Wikidata takes care of a lot of things like page moves, deletions,
etc. With P373 (Commons category) it is currently all manual.
* Having Wikidata on Commons means that you can automatically get
backlinks to Wikipedia, have intros for categories, etc.
* It's a step in the right direction. It makes it easier to take the next steps.

Small change, lots of benefits!

Maarten

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l