Re: [Wikidata] extracting type hierarchy of Wikidata

2017-07-06 Thread Stas Malyshev
Hi!

> We're trying to extract the full type hierarchy of Wikidata, starting from
> all occurrences of P31 and P279. While we have some custom code for
> this, we're thinking there may be a smarter or more efficient way of doing
> it using SPARQL or a tool that we are probably unaware of. Any hint
> would be appreciated. :)

Well, Blazegraph implements breadth-first search (BFS) through its GAS API:
https://wiki.blazegraph.com/wiki/index.php/RDF_GAS_API#GAS_Examples
which may be useful in this case, though I am not sure it is possible to
map the whole hierarchy in one query without running into timeouts.
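If a single server-side query times out, the same breadth-first traversal can be run client-side once the P279 statements have been exported. A minimal sketch, assuming the edges are available as (subclass, superclass) QID pairs; the function name `subclass_depths` and the toy IDs are hypothetical. The `depth` map doubles as the visited set, so a loop in P279 cannot make the traversal run forever.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def subclass_depths(edges: Iterable[Tuple[str, str]], root: str) -> Dict[str, int]:
    """Breadth-first search from `root` down the subclass-of (P279) graph.

    `edges` holds (subclass, superclass) pairs, i.e. "child P279 parent".
    Returns the minimum number of P279 hops from each reachable class to `root`.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)

    depth = {root: 0}  # also serves as the visited set
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:  # already-seen classes are skipped, so loops terminate
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


# Toy data with hypothetical IDs, including a deliberate Q5 <-> Q6 loop.
edges = [("Q2", "Q1"), ("Q3", "Q2"), ("Q4", "Q1"), ("Q5", "Q6"), ("Q6", "Q5"), ("Q5", "Q3")]
print(subclass_depths(edges, "Q1"))
# {'Q1': 0, 'Q2': 1, 'Q4': 1, 'Q3': 2, 'Q5': 3, 'Q6': 4}
```

The design choice here is to keep only the shortest distance to the root; a class with several superclasses still appears once, at its smallest depth.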

Also, I'm not sure P31 and P279 currently represent a hierarchy as such:
loops have been known to exist in them (maybe already fixed, but I'm
not 100% sure). So one needs to be aware of that too.
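Because P279 may contain loops, it can help to check for them before treating the data as a tree. A minimal sketch, assuming the P279 statements have been exported as (subclass, superclass) QID pairs; the function name is hypothetical. It uses the peeling idea behind Kahn's topological sort rather than an explicit cycle search, so it reports the affected classes, not the loop paths themselves.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def classes_on_or_above_loops(edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Kahn-style peeling of the P279 graph (edges are (subclass, superclass)).

    Repeatedly removes classes that have no remaining subclasses. Whatever
    cannot be removed either lies on a loop or is a superclass reachable
    from one; an empty result means the hierarchy is acyclic.
    """
    edge_list = list(edges)
    nodes = {n for edge in edge_list for n in edge}
    parents: Dict[str, List[str]] = defaultdict(list)
    remaining_children: Dict[str, int] = {n: 0 for n in nodes}
    for child, parent in edge_list:
        parents[child].append(parent)
        remaining_children[parent] += 1

    queue = deque(n for n in nodes if remaining_children[n] == 0)
    removed: Set[str] = set()
    while queue:
        node = queue.popleft()
        removed.add(node)
        for parent in parents[node]:
            remaining_children[parent] -= 1
            if remaining_children[parent] == 0:
                queue.append(parent)
    return nodes - removed


# Hypothetical IDs: Q5 <-> Q6 is a loop, and Q7 sits above it.
print(classes_on_or_above_loops([("Q2", "Q1"), ("Q5", "Q6"), ("Q6", "Q5"), ("Q6", "Q7")]))
```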

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] extracting type hierarchy of Wikidata

2017-07-06 Thread Leila Zia
Hi all,

We're trying to extract the full type hierarchy of Wikidata, starting from
all occurrences of P31 (instance of) and P279 (subclass of). While we have
some custom code for this, we're thinking there may be a smarter or more
efficient way of doing it using SPARQL, or a tool we are probably unaware
of. Any hint would be appreciated. :)
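On the SPARQL side, the standard property-path operator `wdt:P279*` (zero or more subclass-of hops) can enumerate everything below a chosen root class. A minimal sketch, assuming the public endpoint at https://query.wikidata.org/sparql, where the `wd:` and `wdt:` prefixes are predefined; the function names and the User-Agent string are hypothetical. For a very general root the query may well hit the service timeout, so this is most useful on sub-trees.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def subclass_closure_query(root_qid: str) -> str:
    """SPARQL listing every direct P279 edge below `root_qid`.

    `wdt:P279*` matches zero or more subclass-of hops, so ?cls ranges over
    root_qid itself and all of its transitive subclasses.
    """
    return f"""SELECT ?sub ?cls WHERE {{
  ?cls wdt:P279* wd:{root_qid} .
  ?sub wdt:P279 ?cls .
}}"""


def run_query(query: str) -> list:
    """Send `query` to WDQS and return the JSON result bindings (needs network)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical User-Agent; WDQS asks clients to identify themselves.
        "User-Agent": "type-hierarchy-sketch/0.1 (example)",
    }
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["results"]["bindings"]


print(subclass_closure_query("Q5"))
```

Replacing the second `wdt:P279` with `wdt:P31` would list instances of the classes in the closure instead of their direct subclasses.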

Thanks,
Leila

In case you wonder why we ended up with this question and who "we" is ;):

The research is being documented at
https://meta.wikimedia.org/wiki/Research_talk:Expanding_Wikipedia_stubs_across_languages
(the documentation is not fully up to date, but it will give you the gist
of what we are doing).

We are interested in building systems that can help editors and editathon
organizers identify the most common structures for different article types
given the already existing articles in each type/category in Wikipedia (in
a fixed language or across languages) and the information available in
those articles.

The challenge we have run into, and we're not the first to run into it, is
that the categories in Wikipedia do not, as a whole, form is-a relationships.
This is a big problem for information extraction based on the category
system, and we're trying to find a way to clean it up before using it for
this research. (We've looked at the body of research that attempts to clean
up the Wikipedia category system for knowledge extraction, and none of what
we've found addresses the problem we have. More on that once we complete
the documentation.)


Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Stas Malyshev
Hi!

> I just tried opening this link in Firefox and it complained that the XML
> file is not well-formed, at the first wdt:Wikidata:P4 element. As
> far as I can tell, that’s correct according to Namespaces in XML 1.1 [1]:

Such URLs may also be problematic if we want to store them in WDQS: right
now such a URL would not be inlined properly, which means it would carry
significant storage and performance costs.

Also, wdt:Wikidata:P4 and Wikidata's (or the test Wikidata instance's)
wdt:P4 are different URLs, so one cannot identify them as the same
thing (even though they are in fact the same thing, which, as I understand
it, is the point of federation). I think P4 should be
http://federated-wikidata.wmflabs.org/prop/direct/P4 in this
case (or, for real Wikidata federation, the actual Wikidata URL, of
course).
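The point is that an RDF store compares properties by their full IRIs, not by what they mean. A minimal sketch of prefix expansion makes this concrete; both namespace mappings below, and the `fwdt` prefix in particular, are hypothetical and chosen only for illustration, since the prefixes the test wikis actually emit may differ.

```python
from typing import Dict

# Hypothetical namespace table for illustration only.
PREFIXES: Dict[str, str] = {
    "wdt": "http://structured-commons.wmflabs.org/prop/direct/",
    "fwdt": "http://federated-wikidata.wmflabs.org/prop/direct/",
}


def expand(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand `prefix:local` to a full IRI.

    Splits at the first colon only (Turtle-style), so any further colons
    stay in the local part; the IRI is namespace + local part.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


a = expand("wdt:Wikidata:P4")
b = expand("fwdt:P4")
print(a)       # http://structured-commons.wmflabs.org/prop/direct/Wikidata:P4
print(b)       # http://federated-wikidata.wmflabs.org/prop/direct/P4
print(a == b)  # False: to an RDF store these are two unrelated properties
```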




Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Lucas Werkmeister
I just tried opening this link in Firefox and it complained that the XML
file is not well-formed, at the first wdt:Wikidata:P4 element. As
far as I can tell, that’s correct according to Namespaces in XML 1.1 [1]:

[8]  PrefixedName    ::= Prefix ':' LocalPart
[10] Prefix          ::= NCName
[11] LocalPart       ::= NCName
[4]  NCName          ::= NCNameStartChar NCNameChar*  /* An XML Name, minus the ":" */
[5]  NCNameChar      ::= NameChar - ':'
[6]  NCNameStartChar ::= NameStartChar - ':'

Neither the prefix nor the local part may contain a colon, so
wdt:Wikidata:P4 is not a valid element name (unless you follow pure XML
without XML namespaces, where colons have no semantic meaning – but RDF
syntax [2] requires XML namespaces). (On the other hand, Turtle [3]
explicitly allows “non leading colons”, so this isn’t a problem for that
format.)
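The rule above boils down to "at most one colon, and non-empty NCNames on either side of it". A minimal sketch of that check; the regex is an ASCII-only approximation of NCName (the real NameStartChar and NameChar productions admit many more characters), and the function name is hypothetical.

```python
import re

# ASCII-only approximation of NCName from Namespaces in XML: the real
# NameStartChar/NameChar productions also admit many non-ASCII characters.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_qname(name: str) -> bool:
    """True if `name` is NCName or NCName:NCName, i.e. at most one colon."""
    parts = name.split(":")
    if len(parts) > 2:
        return False  # a prefix or local part would have to contain ':'
    return all(NCNAME.fullmatch(part) for part in parts)


for name in ["wdt:P4", "P4", "wdt:Wikidata:P4"]:
    print(name, is_valid_qname(name))
```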

Is there already a Phabricator task for this issue? I couldn’t find one.

Cheers,
Lucas

[1]: https://www.w3.org/TR/2006/REC-xml-names11-20060816/
[2]: https://www.w3.org/TR/1999/PR-rdf-syntax-19990105/
[3]: https://www.w3.org/TR/2014/REC-turtle-20140225/


On 06.07.2017 23:02, Amir Tafreshi wrote:
> Hey,
> The /entity/ path is just a redirect to Special:EntityData. You can
> easily access it this way:
> http://structured-commons.wmflabs.org/wiki/Special:EntityData/M13.rdf
>
> I don't know how good the RDF output is, but given that it's mediainfo
> and doesn't have any additional parts (as lexemes do, for example), I
> think it's fine. Feel free to file a bug if you see any problems.
>
> Best
>

Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Amir Tafreshi
Hey,
The /entity/ path is just a redirect to Special:EntityData. You can easily
access it this way:
http://structured-commons.wmflabs.org/wiki/Special:EntityData/M13.rdf

I don't know how good the RDF output is, but given that it's mediainfo and
doesn't have any additional parts (as lexemes do, for example), I think it's
fine. Feel free to file a bug if you see any problems.
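Since /entity/<id> only redirects to Special:EntityData, a script can request a specific serialization directly by putting a format extension on the Special:EntityData URL, as in the M13.rdf link above. A minimal sketch; the function name is hypothetical, and the set of accepted extensions is an assumption (json, rdf, ttl and nt are the common Wikibase ones).

```python
from urllib.parse import quote

# Assumed set of export extensions; check the wiki if you need another one.
FORMATS = {"json", "rdf", "ttl", "nt"}


def entity_data_url(wiki_base: str, entity_id: str, fmt: str = "rdf") -> str:
    """Build the Special:EntityData URL (the target of the /entity/ redirect)
    with an explicit format extension."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{wiki_base.rstrip('/')}/wiki/Special:EntityData/{quote(entity_id)}.{fmt}"


print(entity_data_url("http://structured-commons.wmflabs.org", "M13"))
# http://structured-commons.wmflabs.org/wiki/Special:EntityData/M13.rdf
```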

Best

On 6 July 2017 at 23:19, André Costa  wrote:

> Nice!
>
> Will the connection back to the image be included in the RDF? The /entity/
> path was not available, so I couldn't check what is there now.
>
> Cheers,
> André
>


-- 
Amir Sarabadani Tafreshi
Software Engineer (contractor)
-
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.


Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread André Costa
Nice!

Will the connection back to the image be included in the RDF? The /entity/
path was not available, so I couldn't check what is there now.

Cheers,
André





Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Antonin Delpeuch (lists)
On 06/07/2017 16:41, Lydia Pintscher wrote:
> I am not sure I understand exactly what you mean. Do you mean that
> when you are on the file page
> (http://structured-commons.wmflabs.org/wiki/File:LighthouseinDublin.jpg)
> you would like to see the data from the data page
> (http://structured-commons.wmflabs.org/wiki/MediaInfo:M13)? If so, then
> yes. The two will be merged once we have a feature we call Multi-Content
> Revisions, which allows us to have structured data and wikitext on the
> same page.

Fantastic! Yes, this is exactly what I meant.

Antonin




Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Lydia Pintscher
On Thu, Jul 6, 2017 at 4:36 PM, Antonin Delpeuch (lists)
 wrote:
> Awesome!
>
> I wonder if there are any plans to display Wikibase's statements on
> Commons' side? Currently all I can see is a "MediaInfo:M13" link which
> does not really showcase all the awesome data that is hidden behind it! :)

I am not sure I understand exactly what you mean. Do you mean that
when you are on the file page
(http://structured-commons.wmflabs.org/wiki/File:LighthouseinDublin.jpg)
you would like to see the data from the data page
(http://structured-commons.wmflabs.org/wiki/MediaInfo:M13)? If so, then
yes. The two will be merged once we have a feature we call Multi-Content
Revisions, which allows us to have structured data and wikitext on the
same page.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.



Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Antonin Delpeuch (lists)
Awesome!

I wonder if there are any plans to display Wikibase's statements on
Commons' side? Currently all I can see is a "MediaInfo:M13" link which
does not really showcase all the awesome data that is hidden behind it! :)

Antonin



Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Magnus Manske
Fantastic news!

Now if you could set up a SPARQL instance for those two...

(the reward for doing good work is more work!)




Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Lydia Pintscher
On Thu, Jul 6, 2017 at 3:51 PM, Thad Guidry  wrote:
>
> Hi Lea... a few questions,
>
> 1. Are there plans to have the MIME types auto-extracted and linked?
>
> (233 × 500 pixels, file size: 22 KB, MIME type: image/jpeg)

It is a bit too early for that. Potentially, yes, but there is a lot
more work to do before we get there.

> 2. Will the metadata also be searchable via SPARQL?
>
> JPEG file commentcmp3.10.3.1Lq4 0xacc6f59a

Yes.


Cheers
Lydia



Re: [Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Thad Guidry
Hi Lea... a few questions,

1. Are there plans to have the MIME types auto-extracted and linked?

(233 × 500 pixels, file size: 22 KB, MIME type: *image/jpeg*)

2. Will the metadata also be searchable via SPARQL?

JPEG file comment *cmp3.10.3.1Lq4 0xacc6f59a*

-Thad
+ThadGuidry 


[Wikidata] New step towards structured data for Commons is now available: federation

2017-07-06 Thread Léa Lacroix
Hello all,

As you may know, WMF, WMDE and volunteers are working together on the
structured data for Commons project. We’re currently working on a lot of
technical groundwork for this project. One big part of that is allowing
the use of Wikidata’s items and properties to describe media files on
Commons. We call this feature federation. We have now developed the
necessary code for it, and you can try it out on a test system and give
feedback.

We have one test wiki that represents Commons
(http://structured-commons.wmflabs.org) and another one simulating Wikidata
(http://federated-wikidata.wmflabs.org). You can see an example where the
statements use items and properties from the simulated Wikidata. Feel free
to try it by adding statements to some of the files on the test system.
(You might need to create some items on
http://federated-wikidata.wmflabs.org if they don’t exist yet. We have
created a few for testing.)

If you have any questions or concerns, please let us know.
Thanks,

-- 
Léa Lacroix
Project Manager Community Communication for Wikidata
