Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?

2016-11-26 Thread Daniel Kinzler
Hi gnosygnu!

The JSON in the XML dumps is the raw contents of the storage backend. It can't
be changed retroactively, and re-encoding everything on the fly would be too
expensive. Also, the JSON embedded in the XML files is not officially supported
as a stable interface of Wikibase. The JSON format in the XML files can change
without notice, and you may encounter different representations even within the
same dump.

I recommend using the JSON dumps; they contain our data in canonical form. To
avoid downloading redundant information, you can use one of the
wikidatawiki-20161120-stub-* dumps instead of the full page dumps. These don't
contain the actual page content, just meta-data.
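
For example, a minimal Python sketch for streaming the JSON dump (this
assumes the usual layout of one entity object per line inside a single JSON
array, with trailing commas; verify against the dump you download):

  import bz2
  import collections
  import json

  # Stream the entities dump line by line instead of loading it whole.
  counts = collections.Counter()
  with bz2.open("wikidata-20161114-all.json.bz2", "rt", encoding="utf-8") as f:
      for line in f:
          line = line.strip().rstrip(",")
          if line in ("[", "]", ""):
              continue  # skip the array wrapper and blank lines
          entity = json.loads(line)
          # The canonical form carries mainsnak.datatype, e.g. "commonsMedia".
          for statements in entity.get("claims", {}).values():
              for statement in statements:
                  counts[statement["mainsnak"].get("datatype")] += 1
  print(counts.most_common(10))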

Caveat: there is currently no dump that contains the JSON of old revisions of
entities in canonical form. You can only get them individually from
Special:EntityData, e.g.
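something like this (the revision ID below is only a placeholder, and the
exact parameter name should be double-checked against the Special:EntityData
documentation):

  https://www.wikidata.org/wiki/Special:EntityData/Q38.json?revision=123456789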


HTH
-- daniel

On 26.11.2016 at 02:13, gnosygnu wrote:
> Hi everyone. I have a question about the Wikidata XML dump, but I'm
> posting it here because it seems more related to Wikidata.
> 
> In short, it seems that the "pages-articles.xml" dump does not include the
> datatype property for snaks. For example, the XML dump does not list a
> datatype for the P41 (flag image) statement on Q38 (Italy). In contrast,
> the JSON dump lists a datatype of "commonsMedia".
> 
> Can this datatype property be included in future xml dumps? The
> alternative would be to download two large and redundant dumps (xml
> and json) in order to reconstruct a Wikidata instance.
> 
> More information is provided below the break. Let me know if you need
> anything else.
> 
> Thanks.
> 
> 
> 
> Here's an excerpt from the XML data dump for Q38 (Italy) and P41 (flag
> image). Notice that there is no "datatype" property:
>   // https://dumps.wikimedia.org/wikidatawiki/20161120/wikidatawiki-20161120-pages-articles.xml.bz2
>   "mainsnak": {
> "snaktype": "value",
> "property": "P41",
> "hash": "a3bd1e026c51f5e0bdf30b2323a7a1fb913c9863",
> "datavalue": {
>   "value": "Flag of Italy.svg",
>   "type": "string"
> }
>   },
> 
> Meanwhile, the API and the JSON dump list a datatype property of
> "commonsMedia":
>   // https://www.wikidata.org/w/api.php?action=wbgetentities&ids=q38
>   // https://dumps.wikimedia.org/wikidatawiki/entities/20161114/wikidata-20161114-all.json.bz2
>   "P41": [{
> "mainsnak": {
>   "snaktype": "value",
>   "property": "P41",
>   "datavalue": {
> "value": "Flag of Italy.svg",
> "type": "string"
>   },
>   "datatype": "commonsMedia"
> },
> 
> As far as I can tell, the Turtle (ttl) dump does not list a datatype
> property either, but this may be because I don't understand its
> format.
>   wd:Q38 p:P41 wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D .
>   wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D a wikibase:Statement,
>   wikibase:BestRank ;
> wikibase:rank wikibase:NormalRank ;
> ps:P41 <http://commons.wikimedia.org/wiki/Special:FilePath/Flag%20of%20Italy.svg> ;
> pq:P580 "1946-06-19T00:00:00Z"^^xsd:dateTime ;
> pqv:P580 wdv:204e90b1bce9f96d6d4ff632a8da0ecc .
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Why is it impossible to connect interwikis?

2016-11-26 Thread Ilario Valdelli

I would like to connect this article:

https://it.wikipedia.org/wiki/Lista_nera_(economia)

with this item:

https://www.wikidata.org/wiki/Q607466

but I receive an error from the Italian Wikipedia when I try to add the
interwiki link there, and a message that I don't have the rights to do it
when I try to update the item directly on Wikidata.


This seems strange to me, because I have usually been able to do this for
other items.

The message itself is not very clear.

Is there something that I can do to update this interwiki?

--
Ilario Valdelli
Wikimedia CH
Verein zur Förderung Freien Wissens
Association pour l’avancement des connaissances libre
Associazione per il sostegno alla conoscenza libera
Switzerland - 8008 Zürich
Tel: +41764821371
http://www.wikimedia.ch




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Why is it impossible to connect interwikis?

2016-11-26 Thread Mbch331
The article was already connected to another item. I just merged the two 
items; the interwiki link now points to Q607466.


Mbch331

On 26-11-2016 at 20:02, Ilario Valdelli wrote:

> I would like to connect this article:
>
> https://it.wikipedia.org/wiki/Lista_nera_(economia)
>
> with this item:
>
> https://www.wikidata.org/wiki/Q607466
>
> but I receive an error from the Italian Wikipedia when I try to add the
> interwiki link there, and a message that I don't have the rights to do it
> when I try to update the item directly on Wikidata.
>
> This seems strange to me, because I have usually been able to do this for
> other items.
>
> The message itself is not very clear.
>
> Is there something that I can do to update this interwiki?





___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Determining Wikidata Usage in Wikipedia Pages

2016-11-26 Thread Andrew Hall
Hi,

Thanks very much for the responses. They were very insightful. 

Daniel, I have a follow-up question about the “Wikidata entities used in this 
page” section found at, for example: 
https://en.wikipedia.org/w/index.php?title=South_Pole_Telescope&action=info

In the “Wikidata entities used in this page” section, are the entities listed 
dependent on, for example, the logic of the templates through which they are 
referenced? If entities are listed in this section, is it guaranteed that they 
are actually coming from Wikidata?

Sometimes “other (statements)” is specified in the “Wikidata entities used in 
this page” section. Is it possible to determine what those statements are?
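
(In case it's useful: I believe the Wikibase client also exposes this list
through the client wiki's API as a page prop called wbentityusage. The
parameter names and response shape in the sketch below are from memory, so
please verify them against the API documentation.)

  import requests  # third-party HTTP library

  # Sketch: ask the English Wikipedia API which Wikidata entities a page
  # uses, with per-entity "aspect" codes describing how they are used.
  resp = requests.get("https://en.wikipedia.org/w/api.php", params={
      "action": "query",
      "prop": "wbentityusage",
      "titles": "South Pole Telescope",
      "format": "json",
  })
  for page in resp.json()["query"]["pages"].values():
      # Assumed shape: {"Q123": {"aspects": ["S", "O", ...]}, ...}
      for entity_id, usage in page.get("wbentityusage", {}).items():
          print(entity_id, usage.get("aspects"))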

Thanks,
Andrew


> On Nov 23, 2016, at 2:33 PM, Andrew Hall wrote:
> 
> Hi,
> 
> I’m a PhD student/researcher at the University of Minnesota who (along with 
> Max Klein and another grad student/researcher) has been interested in 
> understanding the extent to which Wikidata is used in (English, for now) 
> Wikipedia. 
> 
> There seems to be no easy way to determine Wikidata usage in Wikipedia pages 
> so I’ll describe two approaches we’ve considered as our best attempts at 
> solving this problem. I’ll also describe shortcomings of each approach. 
> 
> The first approach involves analyzing Wikipedia templates to look for 
> explicit references (i.e. “#property:P”) across all templates. For a given 
> template containing a certain property reference, we then assume that the 
> statement corresponding to the Wikidata property is used in all Wikipedia 
> pages that transclude that template. However, there are two clear 
> limitations to this approach:
> 1) Assuming that the statement corresponding to the Wikidata property is 
> used in all Wikipedia pages that transclude the template yields only a sort 
> of upper bound on the number of actual property usages in Wikipedia. We have 
> no sense of what the actual usage looks like, since each template has its 
> own set of logic, and whether or not a given property gets rendered in 
> Wikipedia depends on that (sometimes quite complicated) logic. A possible 
> way to get a sense of usage would be to sample a small set of random pages 
> (that use templates using Wikidata) and manually check whether the Wikidata 
> statement for the given Wikidata item is exactly the same as that rendered 
> in the corresponding Wikipedia page. If it is, then we might assume the 
> property is being used. Of course, this is not a perfect approach, since 
> it's possible that a Wikidata statement is used in Wikipedia but formatted 
> differently in Wikidata than in Wikipedia (e.g. a date rendered using a 
> different format).
> 2) This approach does not account for Lua modules, which can be referenced 
> from within templates. The modules can (and sometimes do) contain code that 
> supplies Wikidata to Wikipedia pages that transclude the templates 
> containing the module references. Without understanding and accounting for 
> the logic in all Lua modules that use Wikidata, it does not seem possible to 
> know which Wikidata properties are being introduced to Wikipedia pages 
> through this method.
> 
> The second approach involves expanding (using the MediaWiki API, see 
> https://www.mediawiki.org/wiki/API:Expandtemplates) already transcluded 
> templates into HTML tables in two ways: 1) in the context of the appropriate 
> Wikipedia page, and 2) out of the context of the appropriate Wikipedia page 
> (e.g. in my own sandbox). It’s my understanding that if the Wikipedia page 
> uses Wikidata, then that Wikidata should show up in the expansion when the 
> template is expanded in the context of its page, and not when it is expanded 
> elsewhere (e.g. in my sandbox). We would then check whether there is a 
> difference between the two expansions by HTML diff-ing. The difference 
> between the two expanded templates would presumably be due to Wikidata. Of 
> course, there are limitations to this approach as well:
> 1) It's possible that a Wikipedia contributor manually entered data (into a 
> transcluded template) that exactly matches the data in Wikidata, and thus 
> the two expansions would be identical; Wikidata would not be recognizable in 
> this case.
> 2) Once we identify (through diff-ing) where Wikidata is being used in 
> expanded templates, it's not obvious which specific Wikidata 
> properties/statements were used. In other words, "linking" Wikidata to the 
> corresponding HTML (table) rows in an expanded template seems challenging.
> 
> Any insight about how we can approach this problem would be greatly 
> appreciated!
> 
> Thanks,
> Andrew Hall

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?

2016-11-26 Thread gnosygnu
Hi Daniel,

Thanks for the quick and helpful reply. I was hoping that the XML
dumps could be changed, but I understand now that the JSON dumps are
the recommended format.
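
One workaround that seems viable: since a snak's datatype is determined
entirely by its property, the datatype can be fetched once per property (via
wbgetentities with props=datatype) and then patched into every snak of that
property parsed from the XML dump. A rough sketch:

  import json
  import urllib.request

  # Look up a property's datatype once, then reuse it for every snak of
  # that property encountered in the XML dump.
  def property_datatype(pid):
      url = ("https://www.wikidata.org/w/api.php?action=wbgetentities"
             "&ids=" + pid + "&props=datatype&format=json")
      with urllib.request.urlopen(url) as resp:
          data = json.loads(resp.read().decode("utf-8"))
      return data["entities"][pid]["datatype"]

  print(property_datatype("P41"))  # expected: "commonsMedia"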

> To avoid downloading redundant information, you can use one of the
> wikidatawiki-20161120-stub-* dumps instead of the full page dumps

This is useful, but unfortunately it won't suffice. Wikidata also has
pages that are plain wikitext (for example,
https://www.wikidata.org/wiki/Wikidata:WikiProject_Names). These
wikitext pages are in the XML dumps, but not in the stub dumps or the
JSON dumps. I actually do use these Wikidata wikitext pages to try to
reproduce Wikidata in its entirety. So for now, it looks like
both XML dumps and JSON dumps will be required.
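
For reference, a rough sketch of how the wikitext pages could be filtered
out of the pages-articles XML dump (this assumes the 0.10 export schema and
its per-revision <model> element; adjust the namespace URI to whatever the
dump's header declares):

  import bz2
  import xml.etree.ElementTree as ET

  NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # check the dump header

  # Stream the dump and keep only wikitext pages, skipping the entity pages
  # (content models wikibase-item and wikibase-property).
  with bz2.open("wikidatawiki-20161120-pages-articles.xml.bz2", "rb") as f:
      for _event, elem in ET.iterparse(f):
          if elem.tag == NS + "page":
              model = elem.findtext("{0}revision/{0}model".format(NS))
              if model == "wikitext":
                  print(elem.findtext(NS + "title"))
              elem.clear()  # free memory as we go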

At any rate, thanks again for the excellent reply.


On Sat, Nov 26, 2016 at 12:25 PM, Daniel Kinzler wrote:
> Hi gnosygnu!
>
> The JSON in the XML dumps is the raw contents of the storage backend. It can't
> be changed retroactively, and re-encoding everything on the fly would be too
> expensive. Also, the JSON embedded in the XML files is not officially supported
> as a stable interface of Wikibase. The JSON format in the XML files can change
> without notice, and you may encounter different representations even within the
> same dump.
>
> I recommend using the JSON dumps; they contain our data in canonical form. To
> avoid downloading redundant information, you can use one of the
> wikidatawiki-20161120-stub-* dumps instead of the full page dumps. These don't
> contain the actual page content, just meta-data.
>
> Caveat: there is currently no dump that contains the JSON of old revisions of
> entities in canonical form. You can only get them individually from
> Special:EntityData, e.g.
> 
>
> HTH
> -- daniel
>
> On 26.11.2016 at 02:13, gnosygnu wrote:
>> Hi everyone. I have a question about the Wikidata XML dump, but I'm
>> posting it here because it seems more related to Wikidata.
>>
>> In short, it seems that the "pages-articles.xml" dump does not include the
>> datatype property for snaks. For example, the XML dump does not list a
>> datatype for the P41 (flag image) statement on Q38 (Italy). In contrast,
>> the JSON dump lists a datatype of "commonsMedia".
>>
>> Can this datatype property be included in future xml dumps? The
>> alternative would be to download two large and redundant dumps (xml
>> and json) in order to reconstruct a Wikidata instance.
>>
>> More information is provided below the break. Let me know if you need
>> anything else.
>>
>> Thanks.
>>
>> 
>>
>> Here's an excerpt from the XML data dump for Q38 (Italy) and P41 (flag
>> image). Notice that there is no "datatype" property:
>>   // https://dumps.wikimedia.org/wikidatawiki/20161120/wikidatawiki-20161120-pages-articles.xml.bz2
>>   "mainsnak": {
>> "snaktype": "value",
>> "property": "P41",
>> "hash": "a3bd1e026c51f5e0bdf30b2323a7a1fb913c9863",
>> "datavalue": {
>>   "value": "Flag of Italy.svg",
>>   "type": "string"
>> }
>>   },
>>
>> Meanwhile, the API and the JSON dump list a datatype property of
>> "commonsMedia":
>>   // https://www.wikidata.org/w/api.php?action=wbgetentities&ids=q38
>>   // https://dumps.wikimedia.org/wikidatawiki/entities/20161114/wikidata-20161114-all.json.bz2
>>   "P41": [{
>> "mainsnak": {
>>   "snaktype": "value",
>>   "property": "P41",
>>   "datavalue": {
>> "value": "Flag of Italy.svg",
>> "type": "string"
>>   },
>>   "datatype": "commonsMedia"
>> },
>>
>> As far as I can tell, the Turtle (ttl) dump does not list a datatype
>> property either, but this may be because I don't understand its
>> format.
>>   wd:Q38 p:P41 wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D .
>>   wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D a wikibase:Statement,
>>   wikibase:BestRank ;
>> wikibase:rank wikibase:NormalRank ;
>> ps:P41 <http://commons.wikimedia.org/wiki/Special:FilePath/Flag%20of%20Italy.svg> ;
>> pq:P580 "1946-06-19T00:00:00Z"^^xsd:dateTime ;
>> pqv:P580 wdv:204e90b1bce9f96d6d4ff632a8da0ecc .
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata