[Wikidata] Re: Inconsistencies on WDQS data - data reload on WDQS

2023-02-23 Thread Peter F. Patel-Schneider

On 2/23/23 12:19, James Heald wrote:

On Wed, 22 Feb 2023 at 00:03, Kingsley Idehen via Wikidata  wrote:


On 2/21/23 4:05 PM, Guillaume Lederrey wrote:




The exposed SPARQL endpoint is at the moment a direct exposition of the 
Blazegraph endpoint, so it does expose all the Blazegraph specific features 
and quirks.



Is there a Query Service that's separated from the Blazegraph endpoint? The 
crux of the matter here is that WDQS benefits more by being loosely-bound 
to endpoints rather than tightly-bound to the Blazegraph endpoint.





What we would like to do at some point (this is not more than a rough idea 
at this point) is to add a proxy in front of the SPARQL endpoint, that 
would filter specific SPARQL features, so that we limit what is available 
to a standard set of features available across most potential backends. 
This would help reduce the coupling of queries with the backend. Of course, 
this would have the drawback of limiting the feature set.





I have to say I am a bit concerned by this talk, since some of Blazegraph's 
"features and quirks" can be exceedingly useful.


I agree that some of Blazegraph's extensions to SPARQL are useful, 
particularly for me the ability to easily access Wikidata labels in my language.
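For example, a minimal sketch using the Blazegraph-specific label service (Q42 
and the language fallback are chosen purely for illustration):

SELECT ?item ?itemLabel WHERE {
  VALUES ?item { wd:Q42 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}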


But Blazegraph appears to be unmaintained. The team that developed Blazegraph 
does not appear to be in a position to help fix problems in Blazegraph, and no 
one else appears to be interested in fixing them. 
Errors and other issues with Blazegraph are negatively affecting the WDQS. 
That's not a good state of affairs.


In my opinion the WDQS should be trying to get off Blazegraph.

peter
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/MFBUD4RL6LEHNJWSOHYRRMIA7YJC2BZJ/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Wikidata class hierarchy Re: Re: Challenge of the day: sports season without sports

2022-12-23 Thread Peter F. Patel-Schneider
It may be that Wikidata has a lot of general classes, but this is unavoidable 
I think if Wikidata is going to store a lot of different kinds of 
information.  (This is not to say that there are not problems in the Wikidata 
class hierarchy.)


For example, one of the objects that the mayor of Madison is related to is the 
United States of America.  There are 82 different classes that can be reached 
by following instance of and then zero or more subclass links, but only 8 
classes that the United States of America is a direct instance of.  If there 
are any classes among the 82 that do not belong, it is not because they are 
general classes (object, entity); it is because there are suspect 
generalization links.  For example, how does the United States of America get 
to be a set?  Or a geographical feature?
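For anyone who wants to reproduce those counts, a sketch of the two queries 
involved (the exact numbers will drift as Wikidata changes):

# Classes reachable from the United States of America (Q30) by following
# instance of and then zero or more subclass of links
SELECT (COUNT(DISTINCT ?class) AS ?classes) WHERE {
  wd:Q30 wdt:P31/wdt:P279* ?class .
}

# Classes the United States of America is a direct instance of
SELECT (COUNT(DISTINCT ?class) AS ?directClasses) WHERE {
  wd:Q30 wdt:P31 ?class .
}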



There are a few classes that are suspect in this list, particularly three SOMA 
classes.  I don't see why Wikidata should have SOMA classes that just 
mirror regular Wikidata classes.



peter




On 12/23/22 18:47, Erik Paulson wrote:
The Wikidata subclass-of ontology/taxonomy unfortunately yields a lot of 
true-but-unhelpful info if you do some inferencing.


Subclassing in particular is not very useful. As an example, let's take the 
Mayor of Madison, WI - for any property we assert about her, what are the 
classes and superclasses of the target of that property? E.g., we say that 
she's a Mayor - what classes is 'Mayor' a member of?


SELECT DISTINCT ?p ?pLabel ?item ?itemLabel ?itemClass ?itemClassLabel
WHERE
{
  wd:Q63039729 ?p ?item . # Q63039729 is the current mayor of Madison
  OPTIONAL { ?item wdt:P31/wdt:P279* ?itemClass . }
  # Gets the label in your language, falling back to English
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY (?item)

This query comes back with 562 classes, most of which are upper-ontology 
classes that are not very useful at best and in some cases feel so irrelevant 
as to make the whole exercise pointless - for example,


  * via some subclass path in wikidata, we learn that 'United States of
America' is a member of the classes 'agent', 'matter', 'set', and
'astronomical body part'.
  * The Mayor's first name is 'Satya', which is a member of the classes
'data', 'information', 'series', 'non-physical entity', 'multiset', and
a 'mathematical object', among others.
  * She was educated at 'Smith College', which (I'm happy to learn from
wikidata) is a member of the class '3 dimensional objects'.
  * The mayor is the first gay mayor of Madison, and it turns out according
to wikidata 'lesbianism' is a member of the class 'occurrence',
'spatio-temporal entity', and more.

So while I guess an auxiliary database with inferences from wikidata would 
be neat, I think it'd be a lot of noise and I'm not sure all that useful in 
practice.


I do wish that there was some support in wikibase to know more about classes 
and instances - so if you add a P31 instanceOf property to an item or a P279 
subClassOf to a class, Wikibase tells you "BTW, you're also saying that your 
thing is also an instance of all of these classes/your class is now a 
subclass of all these other classes too"


-Erik

___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/F24AYPI52RL2RURT4NMULY3P5KR6NMXI/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: Challenge of the day: sports season without sports

2022-12-23 Thread Peter F. Patel-Schneider
Is Wikidata again being held back by the SPARQL endpoint?   I thought that 
this had been resolved some time ago.



But are these triples redundant?  They might be redundant in the sense that 
someone who knows how sports teams, leagues, and seasons are put together 
would know where to look for the sport.  But this places a heavy burden on 
consumers of Wikidata, which in my opinion is not what Wikidata should be 
doing.  Instead Wikidata should be easy to use.  And it is entirely possible 
that these triples would not be redundant in all cases.
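For example, one plausible indirect route (a sketch only; it assumes the 
season is linked to its league via sports season of league or competition 
(P3450) and that the league itself carries sport (P641)):

SELECT ?season ?sport WHERE {
  ?season wdt:P31 wd:Q27020041 ;   # instance of sports season
          wdt:P3450 ?league .      # season of this league or competition
  ?league wdt:P641 ?sport .        # the league's sport
}
LIMIT 10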



peter





On 12/23/22 16:45, Dan Brickley wrote:



On Fri, 23 Dec 2022 at 21:32, Peter F. Patel-Schneider 
 wrote:


Why should this not be done?   It seems reasonable to me.   Is there
some official statement that this should not be done?


The Blazegraph Wikidata SPARQL endpoint (a sadly abandoned codebase) is 
already creaking at the seams, and struggling to keep up with the happily 
thriving growth of Wikidata.


In this situation it seems that adding redundant factoids into the database 
might not be the best use of constrained resources, for now at least.


More generally, where is any notion of inference in Wikidata defined?


This is a good question, I’d love to know too. Maybe mapping the equivalent 
P:Whatevers to rdfs:subClassOf and rdf:type would be a start?


If they were in a form from which we could generate even just SPARQL 
CONSTRUCT queries, we could perhaps populate an auxiliary dataset/database 
with the additional implied information.
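A minimal sketch of such a CONSTRUCT, taking the mapping of wdt:P31/wdt:P279 
to rdf:type/rdfs:subClassOf as the informal convention being discussed here 
rather than anything Wikidata has defined:

CONSTRUCT {
  ?item rdf:type ?class .
  ?class rdfs:subClassOf ?superclass .
}
WHERE {
  ?item wdt:P31 ?class .
  OPTIONAL { ?class wdt:P279 ?superclass . }
}
LIMIT 1000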


Dan

There appear to be more problems with sports season.   For example,
https://www.wikidata.org/wiki/Q1487136 doesn't appear to be linked to
any league or cup.




peter


On 12/23/22 15:15, Thad Guidry wrote:


Please do not do this.
What you are likely wanting to accomplish is relating a sports season
to a category of sports, and this is already done, so the relationships
are inferred.

On Fri, Dec 23, 2022 at 11:01 PM Romaine Wiki 
wrote:

Hi all,

Too many items on Wikidata still miss the basic statements. Perhaps
we can focus together for a short period of time on a single
subject to get this fixed.

For example: all items with instance of (P31) sports season should
also have sport (P641) as a statement.

When I just ran a query I saw 24000 sports season items that are
still missing sport (P641).

Query:

https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ27020041%20.%0A%20%20MINUS%20%7B%0A%20%20%20%20%3Fitem%20wdt%3AP641%20%3Fmissing%20.%0A%20%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%22.%20%7D%0A%7D%0A
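Decoded, the linked query is:

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q27020041 .
  MINUS {
    ?item wdt:P641 ?missing .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}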

I already did a few myself but for the largest part help is needed.
Who has ideas and can help getting this statement added to all the
sports season items.

Thanks!

Romaine

___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/GST2SPL43JUGN74242BWRS7CYDLFMDCL/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org

-- 
Thad

https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/

___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/ERKXD6D3673T6GA5NGBAYTAW44DDHTPY/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org

___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/OD5VGSCX2JF6QXXLBDMJCEZMN6OPOFY4/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/VEE75U26Y7UR2WA5XQHYXLMB7CWEMMOJ/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/25YIBQY5S4I6CZUNBPD5OXCRI6B2EQ7Z/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: Challenge of the day: sports season without sports

2022-12-23 Thread Peter F. Patel-Schneider
Why should this not be done?   It seems reasonable to me.   Is there some 
official statement that this should not be done?


More generally, where is any notion of inference in Wikidata defined?


There appear to be more problems with sports season.   For example, 
https://www.wikidata.org/wiki/Q1487136 doesn't appear to be linked to any 
league or cup.





peter


On 12/23/22 15:15, Thad Guidry wrote:


Please do not do this.
What you are likely wanting to accomplish is relating a sports season to a 
category of sports, and this is already done, so the relationships are inferred.


On Fri, Dec 23, 2022 at 11:01 PM Romaine Wiki  wrote:

Hi all,

Too many items on Wikidata still miss the basic statements. Perhaps we
can focus together for a short period of time on a single subject to get
this fixed.

For example: all items with instance of (P31) sports season should also
have sport (P641) as a statement.

When I just ran a query I saw 24000 sports season items that are still
missing sport (P641).

Query:

https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ27020041%20.%0A%20%20MINUS%20%7B%0A%20%20%20%20%3Fitem%20wdt%3AP641%20%3Fmissing%20.%0A%20%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%22.%20%7D%0A%7D%0A

I already did a few myself but for the largest part help is needed. Who
has ideas and can help getting this statement added to all the sports
season items.

Thanks!

Romaine

___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/GST2SPL43JUGN74242BWRS7CYDLFMDCL/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org

--
Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/

___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/ERKXD6D3673T6GA5NGBAYTAW44DDHTPY/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/OD5VGSCX2JF6QXXLBDMJCEZMN6OPOFY4/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: State of the (Wiki)data

2022-11-01 Thread Peter F. Patel-Schneider
I agree with all these criticisms of the information in Wikidata. There are 
quite a few important classes in Wikidata with missing, questionable, or 
incorrect structural data.  Look at colors (instances of 
Q1075), where some colors are both instances and subclasses of color; or ships 
(instances of Q11446), where some ships are subclasses of ship; or the 
superclasses of geographic region (Q82794), which include set; or the 
instances of woman (Q467), of which there are only 28.


I believe that these structural problems in Wikidata are a major, probably the 
major, reason that Wikidata does not have considerably more uptake than it 
currently does.  Certainly every time I think of using Wikidata I have to 
think hard about what I need to do to ensure that the structural problems in 
Wikidata will not pose too much of a problem for my use.  (In most cases I 
come to the reluctant conclusion that they will.)



It's not so much that there are examples of bad structural data, it is that 
examples are so easy to find.  And it's not so much that the problems arise 
from bad policies, it is that there are no enforced policies.  And it's not 
even so much that these are unknown problems, as most of them have been 
previously reported.


It is for the above reasons that I believe that lack of tool support is not 
the major driver of the problems, and certainly tools that can only point out 
problems are not going to be a significant help in solving them.  
Instead I believe that what is driving the structural problems with Wikidata 
is that there is insufficient effort paid by the Wikidata community to 
identify and implement fixes for the structural problems.  Tool support is 
important, I agree, but without people in the Wikidata community putting a 
higher priority on fixing data in Wikidata than on adding more data to 
Wikidata, the structural problems will continue.


I also feel that it does very little good to ask people who are adding new 
data to Wikidata to only create data with good structure when there are so 
many existing problems.  Instead the existing problems first need to be fixed 
up.  This will both show that the Wikidata community cares about good 
structure and show people who are adding new data how new data should be 
added, instead of the current situation, which in too many cases provides 
examples of how not to structure data.  Consider a tool that retrieves items 
that are similar to an item being added.  If this comparison item has bad 
structuring nearby, it is very likely that the new item will either be given 
similar structuring or be linked to the existing bad structuring.




As far as labels, descriptions, and aliases go, I agree that the current 
situation is poor.  But what I believe is missing most is enough description 
that the intent of an item, particularly a class, can be correctly 
determined.  I often end up with only a poor idea of what items should be an 
instance of a class, particularly when considering several classes at once.  
The various geographic classes are a prime example here for me.  In my view 
much of the natural-language information associated with Wikidata items 
should be tagged with the English Wikipedia multiple issues template.




Queries that show the above problems:

# Colors that are both instances and (transitive) subclasses of color (Q1075)
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q1075.
  ?item wdt:P279* wd:Q1075.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

# Ships that are both instances and (transitive) subclasses of ship (Q11446)
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q11446.
  ?item wdt:P279* wd:Q11446.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

# Superclasses of geographic region (Q82794), which include set
SELECT ?item ?itemLabel WHERE {
  wd:Q82794 wdt:P279* ?item .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

# Direct and indirect instances of woman (Q467)
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/wdt:P279* wd:Q467.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}


Peter F. Patel-Schneider

___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/GERAOWK3O56Z2YY4KHGZO4IGCXXXZK32/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


Re: [Wikidata-tech] Blank node deprecation in WDQS & Wikibase RDF model

2020-04-16 Thread Peter F. Patel-Schneider
I am taking the liberty of replying to the list because of the problems with
the supplied justification for this change that are part of the original message.

I believe that https://phabricator.wikimedia.org/T244341#5889997 is inadequate
for determining that blank nodes are problematic.  First, the fact that
determining isomorphism in RDF graphs with blank nodes is non-polynomial is a
red herring.  If the blank nodes participate in only one triple then
isomorphism remains easy.  Second, the query given to remove a some-value SNAK
is incorrect in general - it will remove all triples with the blank node as
object.  (Yes, if the blank nodes found are leaves then no extra triples are
removed.)  A simpler DELETE WHERE will have the seemingly-desired result.
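For illustration, a sketch of that simpler form, reusing the statement node 
and property from the example quoted further down in this message (so the 
specific IDs are just that example, not a general recipe):

DELETE WHERE {
  s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576 ?value .
}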

This is not to say that blank nodes do not cause problems.  According to the
semantics of both RDF and SPARQL, blank nodes are anonymous, so to repeatedly
access the same blank node in a graph one has to access the stored graph using
an interface that exposes the retained identity of blank nodes.  It looks as
if the WDQS is built on a system that has such an interface.  As the WDQS
already uses user-visible features that are not part of SPARQL, adding (or
maybe even only utilizing) a non-standard interface that is only used
internally would not be a problem.

One problem when using generated URLs to replace blank nodes is that these
generated URLs have to be guaranteed stable and unique (not just stable) for
the lifetime of the query service.  Another problem is that yet another
non-standard function is being introduced, pulling the RDF dump of Wikidata
yet further from RDF.

So this is a significant change as far as users are concerned that also has
potential implementation issues.   Why not just use an internal interface that
exposes a retained identity for blank nodes?

Peter F. Patel-Schneider



On 4/16/20 8:34 AM, David Causse wrote:
> Hi,
>
> This message is relevant for people writing SPARQL queries and using the
> Wikidata Query Service:
>
> As part of the work of redesigning the WDQS updater[0] we identified that
> blank nodes[1] are problematic[2] and we plan to deprecate their usage in
> the wikibase RDF model[3]. To ease the deprecation process we are
> introducing the new function wikibase:isSomeValue() that can be used in
> place of isBlank() when it was used to filter SomeValue[4].
>
> What does this mean for you: nothing will change for now, we are only
> interested to know if you encounter any issues with the
> wikibase:isSomeValue() function when used as a replacement of the isBlank()
> function. More importantly, if you used the isBlank() function for other
> purposes than identifying SomeValue (unknown values in the UI), please let
> us know as soon as possible.
>
> The current plan is as follow:
>
> 1. Introduce a new wikibase:isSomeValue() function
> We are at this step. You can already use wikibase:isSomeValue() in the Query
> Service. Here’s an example query (Humans whose gender we know we don't know):
> SELECT ?human WHERE {
> ?human wdt:P21 ?gender
> FILTER wikibase:isSomeValue(?gender) .
> }
> You can also search the wikis[8] to find all the pages where the function
> isBlank is referenced in a SPARQL query.
>
> 2. Generate stable labels for blank nodes in the wikibase RDF output
> Instead of "autogenerated" blank node labels wikidata will now provide a
> stable label for blank nodes. In other words the wikibase triples using
> blank nodes such as:
> s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576 _:genid2 ;
> will become
> s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576
> _:1668ace9a6860f7b32569c45fe5a5c0d ;
> This is not a breaking change.
>
> 3. [BREAKING CHANGE] Convert blank nodes to IRIs in the WDQS updater
> At this point some WDQS servers will start returning IRIs such
> as http://www.wikidata.org/somevalue/1668ace9a6860f7b32569c45fe5a5c0d (the
> exact form of the IRI is still under discussion) instead of blank node
> literals like t1514691780 auto-generated by blazegraph. Queries still using
> isBlank() will stop functioning. Tools explicitly relying on the presence of
> blank nodes (t1514691780) in the query results will also be affected.
> We don’t have a defined date for this change yet, but we will follow the
> Wikidata breaking change process (announcing the change 4 weeks in advance).
>
> 4. [BREAKING CHANGE] Change the RDF model and remove blank nodes completely
> from the RDF dumps
> Instead of doing the conversion and blank node removal in the WDQS updater
> we will do it at RDF generation.
> This is a breaking change of the somevalue section of the RDF model[5] and
> the no value owl constraint for properties[6].
> We don’t have a defined date for this change yet, but we will follow the
> Wikidata breaking change

Re: [Wikidata-tech] how to load a dump of wikidata into a local wikibase

2019-11-19 Thread Peter F. Patel-Schneider
Is this the recommended way to set up a local copy of Wikidata?  (If not, what
is the recommended way?)

peter


On 11/19/19 10:37 AM, Addshore wrote:
> Hi all
>
> We resolved this on the Wikibase telegram chat.
>
> For anyone finding this email thread, here is a rough log of the chat
>
> ---
>
> Miquel Farre
> I just wrote to the wikidata-tech mail list asking for the question
> regarding the load of dumps https://t.me/c/1478172663/1845
> if somebody here is professionally consulting around wikidata/wikibase and
> would be able to help writing fixes to those problems, we could study how to
> sponsor it. Please let me know.
>
> Addshore
> I just scrolled up and it looks like it is a namespace issue
> Wikidata has namespace 0 (the main namespace) being wikidata items.
> On a default install of wikibase the item namespace is 120, and 0 is a
> wikitext namespace
> Either you'll have to configure your wikibase to be the same as Wikidata, or
> run something through the dump to change the namespace.
> I'm not sure there is anything built into the import process for XML for
> converting namespaces, I'll have a quick look, could be a good feature though
>
> Miquel Farre
> regarding the namespace issue, can I just add this config:
> https://www.mediawiki.org/wiki/Wikibase/Installation/Advanced_configuration#Items_in_a_dedicated_item_namespace
> changing $baseNs = 0 ?
>
> Addshore
> Yes, so if you look in the example settings you can find the rather verbose
> example of how to do this
> https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/config/Wikibase.example.php#L28-L44
> defining $baseNs in local settings will not work, but defining those other
> things after you load the default settings will work
> changing this on an already existing wikibase might have unexpected
> consequences, and is not advised, but doing this on an empty one should be
> fine :)
>
> Miquel Farre
> Adam, this is working, I can load a dump 
>
> On Tue, 19 Nov 2019 at 09:39, Miquel Àngel Farré wrote:
>
> Hello,
>
> We are having issues launching a local copy of wikidata, when we use the
> 'importDump.php' tool, below the issues that we are facing.
> If somebody has an idea of how we could solve this, please let me know.
> We are also considering professional services to get fixes for this
> being released in case somebody is professionally consulting around
> wikibase.
>
> Thanks,
> Miquel
>
> Here the issues:
> if I try to load the full dump, the error I get is:
> root@4fc8cc9b76b3:/var/www/html/maintenance# php importDump.php  --conf
> ../LocalSettings.php
>  ../images/wikidatawiki-20191101-pages-articles-multistream.xml.bz2
> Warning: XMLReader::read():
> uploadsource://d0cd78c216b067ffdd60946c258db6a7:45: parser error : Extra
> content at the end of the document in
> /var/www/html/includes/import/WikiImporter.php on line 646
> Warning: XMLReader::read():    in
> /var/www/html/includes/import/WikiImporter.php on line 646
> Warning: XMLReader::read():              ^ in
> /var/www/html/includes/import/WikiImporter.php on line 646
> Done!
> You might want to run rebuildrecentchanges.php to regenerate 
> RecentChanges,
>
> If I try to load a partial dump, the warnings that I get (which I think
> those mean nothing is loading) are:
> root@4fc8cc9b76b3:/var/www/html/maintenance# php importDump.php  --conf
> ../LocalSettings.php
>  ../images/wikidatawiki-20191020-pages-meta-current1.xml-p1p235321.bz2
> Revision 1033865598 using content model wikibase-item cannot be stored
> on "Q15" on this wiki, since that model is not supported on that page.
> Revision 1034542603 using content model wikibase-item cannot be stored
> on "Q17" on this wiki, since that model is not supported on that page.
> Revision 1032554298 using content model wikibase-item cannot be stored
> on "Q18" on this wiki, since that model is not supported on that page.
> Revision 1032534215 using content model wikibase-item cannot be stored
> on "Q20" on this wiki, since that model is not supported on that page.
> Revision 1026713626 using content model wikibase-item cannot be stored
> on "Q21" on this wiki, since that model is not supported on that page.
> Revision 1023703278 using content model wikibase-item cannot be stored
> on "Q22" on this wiki, since that model is not supported on that page.
> Revision 1032815802 using content model wikibase-item cannot be stored
> on "Q25" on this wiki, since that model is not supported on that page.
> Revision 1032910600 using content model wikibase-item cannot be stored
> on "Q26" on this wiki, since that model is not supported on that page.
> ___
> Wikidata-tech mailing list
> Wikidata-tech@lists.wikimedia.org 
> 

Re: [Wikidata] Shape Expressions arrive on Wikidata on May 28th

2019-05-30 Thread Peter F. Patel-Schneider
The history of ShEx is quite complex.

I don't think that one can say that there were complete and conforming
implementations of ShEx in 2017 because the main ShEx specification,
http://shex.io/shex-semantics-20170713/, was ill-founded.  I pointed this out
in https://lists.w3.org/Archives/Public/public-shex/2018Mar/0008.html

There were several quite different semantics proposed for ShEx somewhat
earlier, all with significant problems.

peter





On 5/30/19 12:34 AM, Andra Waagmeester wrote:
> I really don't see the issue here. SHACL, like ShEx is a language to express
> data shapes. I adopted using ShEx in a wikidata context 2016 when ShEx was
> demonstrated at a tutorial at the SWAT4HCLS conference [1] in Amsterdam, where
> it was discussed in both a tutorial and a hackathon topic. At that conference,
> I was convinced that ShEx is helpful in maintaining quality in Wikidata. ShEx
> offers not only the means to validate data shapes in Wikidata, but it also
> provides a way to document how primary data is expressed in Wikidata.  In 2016
> I joined the ShEx community group [2]. Since then I have been actively using ShEx
> in defining shapes in various projects on Wikidata (e.g. Gene Wiki and
> Wikicite).  It is not that this happened in secrecy. On the contrary, it was
> discussed at both Wikimedia [3,4] and non-Wikimedia events [5,6,7].
> 
> It is also not the case that SHACL has not been discussed in this context, on
> the contrary, I have very good memories of a workshop where both were debated
> (see page 24 ;) )  [8]
> 
> IMHO  the statement that we all should adhere to one standard, simply because
> it is a standard, is not a valid argument. Imagine having to dictate that we
> all should speak English because it is the standard language.  In every single
> talk that I have given since 2016, proponents of SHACL have been very vocal in
> asking the same question over and over again "why not SHACL?", where the
> discussion never went beyond, "You should because it is a standard". It is
> also a bit disingenuous to suggest we all should adhere to SHACL because it is
> the standard, while in the same sentence calling it a "Recommendation". 
> 
> Although initially, I was open to SHACL as well (I use both Mac and Linux, so
> why not open up to different alternatives in data shapes), (Some) Arguments
> for me to prefer ShEx over SHACL are:
> 1. Already in 2017 there were different (open) implementations. At the time
> SHACL didn't have much tooling to choose from, other than one javascript
> implementation and a proprietary software package. 
> 2. ShEx has a more intuitive way of describing Shapes, which is the compact
> syntax (ShExC). SHACL seems to have adopted the compact syntax as well, but
> only yesterday [9].
> 3. The culture in the Shape Expression community group aligns well with the
> culture in Wikidata. 
> 4. I don't want to be shackled to one standard (pun intended). I assume the
> name was chosen with a shackle in mind, which puts constraints at the core of
> the language. Wikidata already has different methods in place to deal with
> constraints and constraint violations. In the context of Wikidata, ShEx should
> specifically not be intended to impose constraints, on the contrary, it allows
> expressing disagreement or variants of different shapes, whether they conflict
> or not, which fits well with the NPOV concept. Symbols do matter. 
> 
> For a less personal comparison, I refer to the "Validating RDF data" book
> which describes both ShEx and SHACL, and has a specific chapter on how they
> compare and differ [10]
> 
> Up until now, I have been using ShEx in repositories outside the Wikidata
> ecosystem (e.g. Github), but I am really excited about the release of this
> extension. I am curious about how the wiki extension will influence the
> maintenance of schemas. Schemas are currently often expressed as static
> images, while in practice the schemas are as fluid as the underlying data
> itself. Being able to document these changes dynamically (the wiki way), can
> be very interesting. One specific expectation I have is that it might make it
> easier to write federated SPARQL queries. Currently, when writing these
> federated queries we often have to rely on either a set of example queries or
> a one-time schema description, which makes it hard to write those queries,
> because of schemas changing constantly. Writing federated SPARQL queries now really is
> a process of "slot machine" querying, where one has to explore the underlying
> schema, query by query. With a wiki in place and a  community maintaining
> these ever-changing schema's, I expect better documentation.
> 
> The data shape community, instead of adhering to one language, should really
> be proud to have produced two very helpful languages. ShEx and SHACL are
> similar but do have differences so both have merit to exist and I wish we
> could steer away from this ShEx vs SHACL feud. It really isn't helping the
> cause, i.e. being able to 

Re: [Wikidata] Shape Expressions arrive on Wikidata on May 28th

2019-05-29 Thread Peter F. Patel-Schneider
It is not really possible to determine what a reasonable shape is before
determining which Wikidata items are considered to be instances of human.  For
example, bog body (Q199414) is a subclass of human (Q5), but its instances are
quite different from other instances of human.

In any case, shouldn't some proponent of this addition to Wikidata be
producing examples of reasonable shapes?  I could propose reasonable
constraints for instances of human, but I would do so in a formalism that I
much prefer.  Someone could, of course, translate these into ShEx, assuming
that ShEx could represent the constraints (which I'm not sure of at all).

To see what the differences (and difficulties) are, consider a very reasonable
constraint - all the relatives of humans are humans (in my preferred syntax
human <= all relative human).  This *should* put a requirement on fathers,
mothers, children, etc. of humans, as these are all sub-properties of relative.
Is this going to work in ShEx?  I think that the answer is that it depends on
what RDF graph ShEx is going to run over.
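As a concrete illustration of that graph dependence, here is a sketch of a 
SPARQL check for the constraint; it only sees sub-properties of relative 
(P1038) to the extent that subproperty of (P1647) statements are present and 
traversed, which is exactly the question of which graph the check runs over 
(and it will likely time out at full Wikidata scale):

SELECT ?human ?p ?value WHERE {
  ?prop wdt:P1647* wd:P1038 ;      # relative (P1038) or a declared subproperty
        wikibase:directClaim ?p .  # the corresponding wdt: predicate
  ?human wdt:P31 wd:Q5 ;           # instance of human
         ?p ?value .
  FILTER NOT EXISTS { ?value wdt:P31/wdt:P279* wd:Q5 . }
}
LIMIT 10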

peter


On 5/28/19 4:47 PM, Andra Waagmeester wrote:
> The schemas can strike a practical balance between capturing current practice
> and describing a todo list of things to fix on current practice. It's possible
> we will want to separate those roles. In the meantime, can you survey existing
> instances and propose a shape which is not too far from the deployed 
> instances?
> 
> On Tue, May 28, 2019 at 10:13 PM Peter F. Patel-Schneider
> <pfpschnei...@gmail.com> wrote:
> 
> I sure hope that E10 is *not* the shape for human.  It certainly isn't a
> correct shape for humans that belong to subclasses of human (such as Old
> Croghan Man (Q166790) or Delina Filkins (Q1408186)).  E10 is also currently
> silent on what information should be present for humans, which I take it to
> be the point of having ShEx in Wikidata.
> 
> It is also unclear what it means to be the shape for human.  The shape E10
> does not have any information on which items are to be considered against
> the shape.  Are all items in Wikidata to be considered (as in the definition
> of ShEx)?  That doesn't seem right.  Are all direct instances of human?  That
> seems too limiting.  Are all indirect instances of human?  This seems the
> most natural, but where is this behaviour given?
> natural, but where is this behaviour given?
> 
> Peter F. Patel-Schneider
> Samsung Research America
> 
> 
> 
> On 5/28/19 12:04 PM, Léa Lacroix wrote:
> > Hello all,
> >
> > As previously announced, we just released shape expressions on 
> Wikidata. You
> > can for example have a look at E10, the shape for human
> > <https://www.wikidata.org/wiki/EntitySchema:E10>, or create a new
> EntitySchema
> > <https://www.wikidata.org/wiki/Special:NewEntitySchema>.
> >
> > A few useful links:
> >
> >   * WikiProject ShEx
> <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx>
> >   * introduction to ShEx <http://shex.io/shex-primer/>
> >   * more details about the language <http://shex.io/shex-semantics/>
> >   * More information about how to create a Schema
> >   
>  
> <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx/How_to_get_started%3F>
> >   * Phabricator tag: shape-expressions
> >     <https://phabricator.wikimedia.org/tag/shape_expressions/>
> >   * User script
> >   
>  <https://www.wikidata.org/wiki/User:Zvpunry/EntitySchemaHighlighter.js> 
> to
> >     highlight items and properties in the schema code and turn the IDs
> into links
> >
> > If you have any question or encounter issues, feel free to ping me. 
> Cheers,
> >
> > Léa
> >
> >
> On Sun, 19 May 2019 at 15:32, Léa Lacroix <lea.lacr...@wikimedia.de> wrote:
> >
> >     Hello all,
> >
> >     After several months of development and testing together with the
> >     WikiProject ShEx
> >     <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx>, Shape
> >     Expressions are about to be enabled on Wikidata.
> >
> >     *First of all, what are Shape Expressions?*
> >
> >     ShEx (Q29377880) <https://www.wikidata.org/wiki/Q29377880> is a 
> concise,
> >     formal modeling and validation language for RDF structures. Shape
> >     Expressions can be used to define shapes within the 

Re: [Wikidata] Shape Expressions arrive on Wikidata on May 28th

2019-05-28 Thread Peter F. Patel-Schneider
I sure hope that E10 is *not* the shape for human.  It certainly isn't a
correct shape for humans that belong to subclasses of human (such as Old
Croghan Man (Q166790) or Delina Filkins (Q1408186)).  E10 is also currently
silent on what information should be present for humans, which I take it to be
the point of having ShEx in Wikidata.

It is also unclear what it means to be the shape for human.  The shape E10
does not have any information on which items are to be considered against the
shape.  Are all items in Wikidata to be considered (as in the definition of
ShEx)?  That doesn't seem right.  Are all direct instances of human?  That
seems too limiting.  Are all indirect instances of human?  This seems the most
natural, but where is this behaviour given?

Peter F. Patel-Schneider
Samsung Research America



On 5/28/19 12:04 PM, Léa Lacroix wrote:
> Hello all,
> 
> As previously announced, we just released shape expressions on Wikidata. You
> can for example have a look at E10, the shape for human
> <https://www.wikidata.org/wiki/EntitySchema:E10>, or create a new EntitySchema
> <https://www.wikidata.org/wiki/Special:NewEntitySchema>.
> 
> A few useful links:
> 
>   * WikiProject ShEx <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx>
>   * introduction to ShEx <http://shex.io/shex-primer/>
>   * more details about the language <http://shex.io/shex-semantics/>
>   * More information about how to create a Schema
> 
> <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx/How_to_get_started%3F>
>   * Phabricator tag: shape-expressions
> <https://phabricator.wikimedia.org/tag/shape_expressions/>
>   * User script
> <https://www.wikidata.org/wiki/User:Zvpunry/EntitySchemaHighlighter.js> to
> highlight items and properties in the schema code and turn the IDs into 
> links
> 
> If you have any question or encounter issues, feel free to ping me. Cheers,
> 
> Léa
> 
> 
> On Sun, 19 May 2019 at 15:32, Léa Lacroix <lea.lacr...@wikimedia.de> wrote:
> 
> Hello all,
> 
> After several months of development and testing together with the
> WikiProject ShEx
> <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx>, Shape
> Expressions are about to be enabled on Wikidata.
> 
> *First of all, what are Shape Expressions?*
> 
> ShEx (Q29377880) <https://www.wikidata.org/wiki/Q29377880> is a concise,
> formal modeling and validation language for RDF structures. Shape
> Expressions can be used to define shapes within the RDF graph. In the case
> of Wikidata, this would be sets of properties, qualifiers and references
> that describe the domain being modeled.
> 
> See also:
> 
>   * a short video about ShEx <https://www.youtube.com/watch?v=AR75KhEoRKg>
> made by community members during the Wikimedia hackathon 2019
>   * introduction to ShEx <http://shex.io/shex-primer/>
>   * more details about the language <http://shex.io/shex-semantics/>
> 
> *What can it be used for?*
> 
> On Wikidata, the main goal of Shape Expressions would be to describe what
> the basic structure of an item would be. For example, for a human, we
> probably want to have a date of birth, a place of birth, and many other
> important statements. But we would also like to make sure that if a
> statement with the property “children” exists, the value(s) of this
> property should be humans as well. Schemas will describe in detail what is
> expected in the structure of items, statements and values of these
> statements.
> 
> Once Schemas are created for various types of items, it is possible to
> test some existing items against the Schema, and highlight possible errors
> or lack of information. Subsets of the Wikidata graph can be tested to see
> whether or not they conform to a specific shape through the use of
> validation tools. Therefore, Schemas will be very useful to help the
> editors improving the data quality. We imagine this to be especially
> useful for wiki projects to more easily discuss and ensure the modeling of
> items in their domain. In the spirit of Wikidata not restricting the
> world, Shape Expressions are a tool to highlight, not prevent, errors.
> 
> On top of this, one could imagine other uses of Schemas in the future, for
> example building a tool that would suggest, when creating a new item, what
> would be the basic structure for this item, and helping adding statements
> or values. A bit like this existing tool, Cradle
> <https://tools.wmflabs.org/wikidata-todo/cradle/#/>, that is currently not
> based on ShEx

Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Peter F. Patel-Schneider
On 10/20/18 11:57 AM, Ettore RIZZA wrote:

> From Peter F. Patel-Schneider
> Hi,
> 
> I see no reason that this [adding subclass relationships sanctioned by 
> corresponding Wikipedia pages]
>  should not be done for other groups of living
> organisms where subclass relationships are missing.  
> 
> 
> It seems very simple to me. Maybe too simple. Perhaps I am intimidated by the
> kilometers of discussions I'm reading about the taxon-centric aspect of
> Wikidata, when I'm not a biologist. So, there is no problem if we add
> that Cetacea  <https://www.wikidata.org/wiki/Q160>is a subclass of aquatic
> mammals <https://www.wikidata.org/wiki/Q3039055>, as indicated by
> its Wikipedia page <https://en.wikipedia.org/wiki/Cetacea>?
> 
> Cheers,
> 
> Ettore

How can there be any effective counter to adding these relationships?  Many
Wikidata items correspond to Wikipedia pages.   If the true information about
the Wikidata item in the corresponding pages cannot be added to the Wikidata
items, then the correspondence is not correct and should be removed.

peter

PS:  Of course, determining truth may be contentious in some cases, but these
will be a small minority.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Peter F. Patel-Schneider
On 10/20/18 6:29 AM, Ettore RIZZA wrote:
> For most people, ants are insects, not instances of taxon.

Sure, but Wikidata doesn't have ants being instances of taxon.  Instead,
Formicidae (aka ant) is an instance of taxon, which seems right to me.

Here are some extracts from Wikidata as of a few minutes ago, also showing
the English Wikipedia page for the Wikidata item.

https://www.wikidata.org/wiki/Q7386 Formicidae  ant
https://en.wikipedia.org/wiki/Ant
instance of taxon
no subclass of statement

https://www.wikidata.org/wiki/Q1390 insect
https://en.wikipedia.org/wiki/Insect
subclass of animal
instance of taxon

What is missing is that Q7386 is a subclass of Q1390, which is sanctioned by
the "Ants are eusocial insects" phrase at the start of
https://en.wikipedia.org/wiki/Ant.  I added that statement and put as source
English Wikipedia.  (By the way, how can I source a statement to a particular
Wikipedia page?)


I see no reason that this should not be done for other groups of living
organisms where subclass relationships are missing.
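A sketch of a query for finding such gaps, checking only for a completely 
missing subclass of statement (it may well time out without further 
restriction):

SELECT ?taxon ?taxonLabel WHERE {
  ?taxon wdt:P31 wd:Q16521 .                       # instance of taxon
  FILTER NOT EXISTS { ?taxon wdt:P279 ?super . }   # no subclass of statement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100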

peter

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-18 Thread Peter F. Patel-Schneider
On 10/17/18 7:04 AM, Daniel Kinzler wrote:
> My (very belated) thoughts on this issue:
> 
[...]
> I say: let it produce bad results, tell people why the results are bad, and
> what they can do about it!
[...]
> 
> -- daniel
My view is that there is a big problem with this for industrial use of Wikidata.

I would very much like to use Wikidata more in my company.  However, I view it
as my duty in my company to point out problems with the use of any technology.
  So whenever I talk about Wikidata I also have to talk about the problems I
see in the Wikidata ontology and how they will affect use of Wikidata in my
company.

If Wikidata is going to have significant use in my company there needs to be
at least some indication that the problems in Wikidata are being addressed.  I
don't see that happening at the moment.


What is the biggest problem I see in Wikidata?  It is the poor organization of
the Wikidata ontology.  To fix the ontology, beyond doing point fixes, is
going to require some commitment from the Wikidata community.


Peter F. Patel-Schneider
Nuance Communications

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mapping Wikidata to other ontologies

2018-09-22 Thread Peter F. Patel-Schneider
Hi:

Why did you use exact match (P2888) instead of equivalent class (P1709) and
equivalent property (P1628)?

peter


On 9/22/18 5:07 AM, Andra Waagmeester wrote:
> Hi Maarten,
> 
>     We are actively mapping to other ontologies using the exact match P2888
> property. The disease ontology is one example which is actively
> synchronized in Wikidata using the exact match property (P2888). This property
> is inspired by the SKOS:exact match property. SKOS itself had more mapping
> properties and I think it is a good idea to introduce some of the other SKOS
> mapping properties in Wikidata, such as broad match and narrow match. 
> 
> Andra
> 
> On Sat, Sep 22, 2018 at 7:30 AM Maarten Dammers  > wrote:
> 
> Hi everyone,
> 
> Last week I presented Wikidata at the Semantics conference in Vienna (
> https://2018.semantics.cc/ ). One question I asked people was: What is
> keeping you from using Wikidata? One of the common responses is that
> it's quite hard to combine Wikidata with the rest of the semantic web.
> We have our own private ontology that's a bit on an island. Most of our
> triples are in our own private format and not available in a more
> generic, more widely used ontology.
> 
> Let's pick an example: Claude Lussan. No clue who he is, but my bot
> seems to have added some links and the item isn't too big. Our URI is
> http://www.wikidata.org/entity/Q2977729 and this is equivalent of
> http://viaf.org/viaf/29578396 and
> http://data.bibliotheken.nl/id/thes/p173983111 . If you look at
> http://www.wikidata.org/entity/Q2977729.rdf this equivalence is
> represented as:
> http://viaf.org/viaf/29578396"/>
>  rdf:resource="http://data.bibliotheken.nl/id/thes/p173983111"/>
> 
> Also outputting it in a more generic way would probably make using it
> easier than it is right now. Last discussion about this was at
> https://www.wikidata.org/wiki/Property_talk:P1921 , but no response
> since June.
> 
> That's one way of linking up, but another way is using equivalent
> property ( https://www.wikidata.org/wiki/Property:P1628 ) and equivalent
> class ( https://www.wikidata.org/wiki/Property:P1709 ). See for example
> sex or gender ( https://www.wikidata.org/wiki/Property:P21) how it's
> mapped to other ontologies. This won't produce easier RDF, but some
> smart downstream users have figured out some SPARQL queries. So linking
> up our properties and classes to other ontologies will make using our
> data easier. This is a first step. Maybe it will be used in the future
> to generate more RDF, maybe not and we'll just document the SPARQL
> approach properly.
> 
> The equivalent property and equivalent class are used, but not that
> much. Did anyone already try a structured approach with reporting? I'm
> considering parsing popular ontology descriptions and producing reports
> of what is linked to what so it's easy to make missing links, but I
> don't want to do double work here.
> 
> What ontologies are important because these are used a lot? Some of the
> ones I came across:
> * https://www.w3.org/2009/08/skos-reference/skos.html
> * http://xmlns.com/foaf/spec/
> * http://schema.org/
> * https://creativecommons.org/ns
> * http://dbpedia.org/ontology/
> * http://vocab.org/open/
> Any suggestions?
> 
> Maarten
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org 
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mapping Wikidata to other ontologies

2018-09-22 Thread Peter F. Patel-Schneider
It is indeed helpful to link the Wikidata ontologies to other ontologies,
particularly ones like the DBpedia ontology and the schema.org ontology.
There are already quite a few links from the Wikidata ontology to several
other ontologies, using the Wikidata equivalent class and property properties.
 Going through and ensuring that every class and property, for example, in the
DBpedia ontology or the schema.org ontology is the target of a correct (!)
link would be useful.   Then, as you indicate, it is not so hard to query
Wikidata using the external ontology or map Wikidata information into
information in the other ontology.
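For example, a sketch of querying Wikidata through such a link (whether the 
stored equivalent-class IRI for schema.org uses http or https is an assumption 
to check, and the query is only illustrative):

SELECT ?item ?itemLabel WHERE {
  ?class wdt:P1709 <http://schema.org/Person> .  # equivalent class link
  ?item wdt:P31 ?class .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20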


The Wikidata ontology is much larger (almost two million classes) and much
more fine grained than most (or maybe even all) other general-purpose
ontologies.  This is appealing as one can be much more precise in Wikidata
than in other ontologies.  It does make Wikidata harder to use (correctly)
because to represent an entity in Wikidata one has to select among many more
alternatives.

This selection is harder than it should be.  The Wikidata ontology is not well
organized.  The Wikidata ontology has errors in it.  There is not yet a good
tool for visualizing or exploring the ontology (although there are some useful
tools such as https://tools.wmflabs.org/bambots/WikidataClasses.php and
http://tools.wmflabs.org/wikidata-todo/tree.html).

So it is not trivial to set up good mappings from the Wikidata ontology to
other ontologies.   When setting up equivalences one has to be careful to
select the Wikidata class or property that is actually equivalent to the
external class or property as opposed to a Wikidata class or property that
just happens to have a similar or the same label.  One also has to be
similarly careful when setting up other relationships between the Wikidata
ontology and other ontologies.   As well, one has to be careful to select good
relationships that have well-defined meanings.  (Some SKOS relationships are
particularly suspect.)  I suggest using only strict generalization and
specialization relationships.


So I think that an effort to completely and correctly map several external
general-purpose ontologies into the Wikidata ontology would be something for
the Wikidata community to support.  Pick a few good external ontologies and
put the needed effort into adding any missing mappings and checking the
mappings that already exist.   Get someone or some group to commit to keeping
the mapping up to date.  Announce the results and show how they are useful.


Peter F. Patel-Schneider
Nuance Communications


On 9/22/18 4:28 AM, Maarten Dammers wrote:
> Hi everyone,
> 
> Last week I presented Wikidata at the Semantics conference in Vienna (
> https://2018.semantics.cc/ ). One question I asked people was: What is keeping
> you from using Wikidata? One of the common responses is that it's quite hard
> to combine Wikidata with the rest of the semantic web. We have our own private
> ontology that's a bit on an island. Most of our triples are in our own private
> format and not available in a more generic, more widely use ontology.
> 
> Let's pick an example: Claude Lussan. No clue who he is, but my bot seems to
> have added some links and the item isn't too big. Our URI is
> http://www.wikidata.org/entity/Q2977729 and this is equivalent of
> http://viaf.org/viaf/29578396 and
> http://data.bibliotheken.nl/id/thes/p173983111 . If you look at
> http://www.wikidata.org/entity/Q2977729.rdf this equivalence is represented 
> as:
> http://viaf.org/viaf/29578396"/>
> http://data.bibliotheken.nl/id/thes/p173983111"/>
> 
> Also outputting it in a more generic way would probably make using it easier
> than it is right now. Last discussion about this was at
> https://www.wikidata.org/wiki/Property_talk:P1921 , but no response since 
> June.
> 
> That's one way of linking up, but another way is using equivalent property (
> https://www.wikidata.org/wiki/Property:P1628 ) and equivalent class (
> https://www.wikidata.org/wiki/Property:P1709 ). See for example sex or gender
> ( https://www.wikidata.org/wiki/Property:P21) how it's mapped to other
> ontologies. This won't produce easier RDF, but some smart downstream users
> have figured out some SPARQL queries. So linking up our properties and classes
> to other ontologies will make using our data easier. This is a first step.
> Maybe it will be used in the future to generate more RDF, maybe not and we'll
> just document the SPARQL approach properly.
> 
> The equivalent property and equivalent class are used, but not that much. Did
> anyone already try a structured approach with reporting? I'm considering
> parsing popular ontology descriptions and producing reports of what is linked
> to what so it's easy to make missing links, but I don't want to do double work
> here.
> 
> What ontologies are important b

Re: [Wikidata] frequency of qualifier predicates

2018-07-19 Thread Peter F. Patel-Schneider
As there is no formal representation theory for qualifiers, any categorization
of them is unlikely to totally describe their potential usage.  I tried to
come up with a categorization that reflects property descriptions and a small
sample of usage.

So I categorized stated as (P1932) as no information because I see it being
used only to provide surface syntax from the source use.  The Wikidata value
of the statement then provides the totality of the Wikidata meaning of the
statement.  It is certainly possible that the source provides more meaning
than is captured in Wikidata, but that's outside of Wikidata.

I see determination method (P459) as mostly providing information to Wikidata
as to how much to trust the Wikidata value, which to me makes it a certainty
qualifier.

peter


On 07/19/2018 06:06 AM, Pavlovic, Michal wrote:
> Thanks, Peter,
> 
> I agree with You, I was wrong before. But also the stated as (P1932)
> has certain info value 
> 
> and both the stated as (P1932) and the determination method (P459) could
> be additive ones.
> 
> 
> Michal
> 
> 
> --
> Date: Wed, 18 Jul 2018 08:30:49 -0700
> From: "Peter F. Patel-Schneider" 
> To: Discussion list for the Wikidata project
>     , "Pavlovic, Michal"
>     
> Subject: Re: [Wikidata] frequency of qualifier predicates
> Message-ID: <5d35b0ac-006f-a42a-c3bf-d4d050a64...@gmail.com>
> Content-Type: text/plain; charset=utf-8
> 
> Here is my breakdown of the top-10-by-usage qualifier predicates:
> 
> From https://tools.wmflabs.org/sqid/#/browse?type=properties
> as of 13 July 2018
> 
> category              label (ID)                        statements  qualifiers
> 
> temporal qualifier    point in time (P585)                  336561     3147744
> temporal qualifier    start time (P580)                      73298     1912311
> temporal qualifier    end time (P582)                        54482     1236121
> temporal qualifier    valid in period (P1264)                   67      656749
> spatial qualifier     chromosome (P1057)                    128429     1397383
> certainty qualifier?  determination method (P459)               30     3407191
> no information        stated as (P1932)                          3      667358
> additive              number of points scored (P1351)         1079      644273
> additive              number of matches played (P1350)        1507      628272
> additive              taxon author (P405)                        0      463363
> 
> On 07/18/2018 07:39 AM, Pavlovic, Michal wrote:
>> Hi Peter,
>> 
>> of the top ten qualifier properties by usage I find maybe just the stated as
>> (P1932) as contextual one.
>> 
>> Please, can You send examples of them of contextual and additive?
>> 
>>  
>> 
>> regards
>> 
>> Michal Pavlovic
>> 
>> BaaN/Infor-Administrator
>> 
>> 
>> --
>> Date: Sat, 14 Jul 2018 08:32:43 -0700
>> From: "Peter F. Patel-Schneider" 
>> To: wikidata@lists.wikimedia.org
>> Subject: Re: [Wikidata] frequency of qualifier predicates
>> Message-ID: <6ba70a52-1f8f-eb3a-8425-7760c56d3...@gmail.com>
>> Content-Type: text/plain; charset=utf-8
>> 
>> That does the trick, thanks.
>> 
>> I was trying to see how many uses of Wikidata qualifiers are contextual 
>> (i.e.,
>> give information about in which context the statement is valid) and which 
>> were
>> additive (i.e., do not limit where the statement is valid).  
>> 
>> Of the top ten qualifier properties by usage, five are contextual, three are
>> additive, and one does not carry world information.  The last can be
>> considered to be contextual, but also might be considered to not carry world
>> information.
>> 
>> peter
>> 
>> 
>> On 07/14/2018 01:19 AM, Lydia Pintscher wrote:
>>> On Sat, Jul 14, 2018 at 1:42 AM Peter F. Patel-Schneider
>>>  wrote:
>>>> I'm trying to get a good estimate of how often which qualifier predicate 
>>>> is used.
>>>>
>>>>
>>>> The obvious query times out, as expected, so I was trying to find a list of
>>>> predicates that are used as qualifiers so that I can craft a query for 
>>>> each of
>>>> them.  There is
>>> https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_qualifier

Re: [Wikidata] frequency of qualifier predicates

2018-07-18 Thread Peter F. Patel-Schneider
Here is my breakdown of the top-10-by-usage qualifier predicates:


From https://tools.wmflabs.org/sqid/#/browse?type=properties as of 13 July 2018

category              Label (ID)                       statements  qualifiers

temporal qualifier    point in time (P585)                 336561     3147744
temporal qualifier    start time (P580)                     73298     1912311
temporal qualifier    end time (P582)                       54482     1236121
temporal qualifier    valid in period (P1264)                  67      656749
spatial qualifier     chromosome (P1057)                   128429     1397383
certainty qualifier?  determination method (P459)              30     3407191
no information        stated as (P1932)                         3      667358
additive              number of points scored (P1351)        1079      644273
additive              number of matches played (P1350)       1507      628272
additive              taxon author (P405)                       0      463363


On 07/18/2018 07:39 AM, Pavlovic, Michal wrote:
> Hi Peter,
> 
> of the top ten qualifier properties by usage I find maybe just the stated as
> (P1932) as contextual one.
> 
> Please, can You send examples of them of contextual and additive?
> 
>  
> 
> regards
> 
> Michal Pavlovic
> 
> BaaN/Infor-Administrator
> 
> 
> --
> Date: Sat, 14 Jul 2018 08:32:43 -0700
> From: "Peter F. Patel-Schneider" 
> To: wikidata@lists.wikimedia.org
> Subject: Re: [Wikidata] frequency of qualifier predicates
> Message-ID: <6ba70a52-1f8f-eb3a-8425-7760c56d3...@gmail.com>
> Content-Type: text/plain; charset=utf-8
> 
> That does the trick, thanks.
> 
> I was trying to see how many uses of Wikidata qualifiers are contextual (i.e.,
> give information about in which context the statement is valid) and which were
> additive (i.e., do not limit where the statement is valid).  
> 
> Of the top ten qualifier properties by usage, five are contextual, three are
> additive, and one does not carry world information.  The last can be
> considered to be contextual, but also might be considered to not carry world
> information.
> 
> peter
> 
> 
> On 07/14/2018 01:19 AM, Lydia Pintscher wrote:
>> On Sat, Jul 14, 2018 at 1:42 AM Peter F. Patel-Schneider
>>  wrote:
>>> I'm trying to get a good estimate of how often which qualifier predicate is 
>>> used.
>>>
>>>
>>> The obvious query times out, as expected, so I was trying to find a list of
>>> predicates that are used as qualifiers so that I can craft a query for each 
>>> of
>>> them.  There is
>>> https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_qualifier
>>> but that can't be trusted as it doesn't include start time (P580) or end 
>>> time
>>> (P582) which I expect to be the most common qualifier predicates.
>>>
>>>
>>> So, I'm stumped.   Any suggestions?
>> There is https://tools.wmflabs.org/sqid/#/browse?type=properties which
>> might help you.
>>
>>
>> Cheers
>> Lydia
>>
> 
> 
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] frequency of qualifier predicates

2018-07-16 Thread Peter F. Patel-Schneider
To get counts of usage you have to do something like

select ?pq (count(*) as ?count) {
  ?p wikibase:qualifier ?pq .
  ?x ?pq ?y .
} group by ?pq

(well, you also want the labels).

Fortunately the counts are available at the page Lydia mentioned.
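
If one does want a per-property number straight from the query service, here is
a minimal sketch for a single qualifier property (start time, P580); the
endpoint at https://query.wikidata.org/sparql and the Python requests library
are assumptions for illustration, and the all-properties aggregate above is the
query that tends to time out.

# Minimal sketch: count uses of one property as a qualifier (pq: namespace),
# here start time (P580). Endpoint and library are assumptions, not part of
# the original message.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = "SELECT (COUNT(*) AS ?uses) WHERE { ?statement pq:P580 ?value . }"

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json",
             "User-Agent": "qualifier-count-sketch/0.1 (example)"},
    timeout=300,
)
response.raise_for_status()
uses = response.json()["results"]["bindings"][0]["uses"]["value"]
print("pq:P580 (start time) used as a qualifier", uses, "times")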

peter


On 07/16/2018 03:45 AM, Vladimir Alexiev wrote:
> The list of props (4.9k) is returned quickly enough. Unfortunately it
> includes all props: each one has a wikibase:qualifier "just in case"
>
> select * {
>   ?p wikibase:qualifier ?pq
> }
>
> It is a pity that this one times out, since the filter merely needs to
> look for 1 statement instance, 4.9k times:
>
> select * {
>   ?p wikibase:qualifier ?pq
>   filter exists {?x ?pq ?y}
> } limit 100
>
> What query did you try?
>
> On Sat, Jul 14, 2018 at 2:40 AM, Peter F. Patel-Schneider
>  wrote:
>> I'm trying to get a good estimate of how often which qualifier predicate is 
>> used.
>>
>>
>> The obvious query times out, as expected, so I was trying to find a list of
>> predicates that are used as qualifiers so that I can craft a query for each 
>> of
>> them.  There is
>> https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_qualifier
>> but that can't be trusted as it doesn't include start time (P580) or end time
>> (P582) which I expect to be the most common qualifier predicates.
>>
>>
>> So, I'm stumped.   Any suggestions?
>>
>>
>> peter
>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] frequency of qualifier predicates

2018-07-14 Thread Peter F. Patel-Schneider
That can't be the case as some of the properties there have non-trivial usage
as statement properties. 


For example, https://tools.wmflabs.org/sqid/#/browse?type=properties says that
applies to part is used 121 times as a statement property.


peter



On 07/14/2018 01:16 AM, Nicolas VIGNERON wrote:
> 2018-07-14 1:40 GMT+02:00 Peter F. Patel-Schneider <pfpschnei...@gmail.com>:
>
> I'm trying to get a good estimate of how often which qualifier predicate
> is used.
>
>
> The obvious query times out, as expected, so I was trying to find a list 
> of
> predicates that are used as qualifiers so that I can craft a query for
> each of
> them.  There is
> 
> https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_qualifier
> 
> <https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_qualifier>
> but that can't be trusted as it doesn't include start time (P580) or end
> time
> (P582) which I expect to be the most common qualifier predicates.
>
> IIRC this is the complete list of properties that are *only* qualifiers
> (that's why P580 and P582 are not in this list).
>
> Cdlt, ~nicolas
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] frequency of qualifier predicates

2018-07-14 Thread Peter F. Patel-Schneider
That does the trick, thanks.


I was trying to see how many uses of Wikidata qualifiers are contextual (i.e.,
give information about in which context the statement is valid) and which were
additive (i.e., do not limit where the statement is valid).  


Of the top ten qualifier properties by usage, five are contextual, three are
additive, and one does not carry world information.  The last can be
considered to be contextual, but also might be considered to not carry world
information.


peter



On 07/14/2018 01:19 AM, Lydia Pintscher wrote:
> On Sat, Jul 14, 2018 at 1:42 AM Peter F. Patel-Schneider
>  wrote:
>> I'm trying to get a good estimate of how often which qualifier predicate is 
>> used.
>>
>>
>> The obvious query times out, as expected, so I was trying to find a list of
>> predicates that are used as qualifiers so that I can craft a query for each 
>> of
>> them.  There is
>> https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_qualifier
>> but that can't be trusted as it doesn't include start time (P580) or end time
>> (P582) which I expect to be the most common qualifier predicates.
>>
>>
>> So, I'm stumped.   Any suggestions?
> There is https://tools.wmflabs.org/sqid/#/browse?type=properties which
> might help you.
>
>
> Cheers
> Lydia
>


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] frequency of qualifier predicates

2018-07-13 Thread Peter F. Patel-Schneider
I'm trying to get a good estimate of how often which qualifier predicate is 
used.


The obvious query times out, as expected, so I was trying to find a list of
predicates that are used as qualifiers so that I can craft a query for each of
them.  There is
https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_qualifier
but that can't be trusted as it doesn't include start time (P580) or end time
(P582) which I expect to be the most common qualifier predicates.


So, I'm stumped.   Any suggestions?


peter



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata and the LOD cloud

2018-05-04 Thread Peter F. Patel-Schneider
Yeah, that would be nice.


You can zoom in on the image, and search for the labels in it.  Unfortunately
many of the labels are truncated, e.g., WordNe


Clicking on a node gets the raw data backing up the image, but I don't see how
to get the processed data.  The data for some of the nodes either doesn't have
enough information to determine whether the source actually satisfies the
requirements to be in the LOD Cloud (Wordnet,
universal-dependencies-treebank-hebrew) or something about the source doesn't
work anymore (Freebase).


peter



On 05/04/2018 09:52 AM, Bruce Whealton wrote:
> Is there an easy way to navigate this?  I was wondering if there was a way
> to zoom-in on a certain area and then see connections from that image.  When
> I clicked on something I got a JSON view.  I don't know how much coding it
> would take to have something like the Visual Thesaurus where clicking on
> links brings that circle into focus with its first degree connections. 
> Maybe I need a magnifier on my 4k monitor.
>
> Bruce 
>
> On Mon, Apr 30, 2018 at 3:17 PM, Ettore RIZZA wrote:
>
> Hi all,
>
> The new version of the "Linked Open data Cloud" graph is out ... and still
> no Wikidata in it. According to this Twitter discussion, this would be due
> to a lack of metadata on Wikidata. No way to fix that easily? The LOD cloud
> is cited in many scientific papers, it is not a simple gadget.
>
> Cheers,
>
> Ettore Rizza
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org 
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 
>
>
>
>
> -- 
> Bruce M Whealton Jr.
> My Online Resume: http://fwwebdev.com/myresume/bruce-whealton-resume-view
> I do business as Future Wave Web Development
> http://futurewavewebdevelopment.com
> Providing Web Development & Design, as well as Programming/Software 
> Engineering
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata and the LOD cloud

2018-05-01 Thread Peter F. Patel-Schneider
Thanks for the corrections.

So https://www.wikidata.org/entity/Q42 is *the* Wikidata IRI for Douglas
Adams.  Retrieving from this IRI results in a 303 See Other to
https://www.wikidata.org/wiki/Special:EntityData/Q42, which (I guess) is the
main IRI for representations of Douglas Adams and other pages with
information about him.

From https://www.wikidata.org/wiki/Special:EntityData/Q42 content
negotiation can be used to get the JSON representation (the default), other
representations including Turtle, and human-readable information.  (Well
actually I'm not sure that this is really correct.  It appears that instead
of directly using content negotiation, another 303 See Other is used to
provide an IRI for a document in the requested format.)

https://www.wikidata.org/wiki/Special:EntityData/Q42.json and
https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl are the useful
machine-readable documents containing the Wikidata information about Douglas
Adams.  Content negotiation is not possible on these pages.

https://www.wikidata.org/wiki/Q42 is the IRI that produces a human-readable
version of the information about Douglas Adams.  Content negotiation is not
possible on this page, but it does have link rel="alternate" to the
machine-readable pages.

Strangely this page has a link rel="canonical" to itself.  Shouldn't that
link be to https://www.wikidata.org/entity/Q42?  There is a human-visible
link to this IRI, but there doesn't appear to be any machine-readable link.

RDF links to other IRIs for Douglas Adams are given in RDF pages by
properties in the wdtn namespace.  Many, but not all, identifiers are
handled this way.  (Strangely ISNI (P213) isn't even though it is linked on
the human-readable page.)

So it looks as if Wikidata can be considered as Linked Open Data but maybe
some improvements can be made.
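
A quick way to check the redirect chain described above is something like the
following sketch, assuming the Python requests library; it asks the entity IRI
for Turtle and prints each hop.

# Sketch: follow content negotiation from the entity IRI for Douglas Adams
# (Q42), asking for Turtle, and show where the redirects lead.
import requests

response = requests.get(
    "http://www.wikidata.org/entity/Q42",
    headers={"Accept": "text/turtle",
             "User-Agent": "conneg-check-sketch/0.1 (example)"},
    timeout=60,
)
for hop in response.history:
    print(hop.status_code, "->", hop.headers.get("Location"))
print("final URL:    ", response.url)
print("content type: ", response.headers.get("Content-Type"))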


peter



On 05/01/2018 01:03 AM, Antoine Zimmermann wrote:
> On 01/05/2018 03:25, Peter F. Patel-Schneider wrote:
>> As far as I can tell real IRIs for Wikidata are https URIs.  The http IRIs
>> redirect to https IRIs.
>
> That's right.
>
>>   As far as I can tell no content negotiation is
>> done.
>
> No, you're mistaken. You tried the URL of a wikipage in your curl command.
> Those are for human consumption, thus not available in turtle.
>
> The "real IRIs" of Wikidata entities are like this:
> https://www.wikidata.org/entity/Q{NUMBER}
>
> However, they 303 redirect to
> https://www.wikidata.org/wiki/Special:EntityData/Q{NUMBER}
>
> which is the identifier of a schema:Dataset. Then, if you HTTP GET these
> URIs, you can content negotiate them to JSON
> (https://www.wikidata.org/wiki/Special:EntityData/Q{NUMBER}.json) or to
> turtle (https://www.wikidata.org/wiki/Special:EntityData/Q{NUMBER}.ttl).
>
>
> Suprisingly, there is no connection between the entity IRIs and the wikipage
> URLs. If one was given the IRI of an entity from Wikidata, and had no
> further information about how Wikidata works, they would not be able to
> retrieve HTML content about the entity.
>
>
> BTW, I'm not sure the implementation of content negotiation in Wikidata is
> correct because the server does not tell me the format of the resource to
> which it redirects (as opposed to what DBpedia does, for instance).
>
>
> --AZ


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata and the LOD cloud

2018-04-30 Thread Peter F. Patel-Schneider
As far as I can tell real IRIs for Wikidata are https URIs.  The http IRIs
redirect to https IRIs.  As far as I can tell no content negotiation is
done.

peter



idefix merging> curl -I http://www.wikidata.org/wiki/Q5200
HTTP/1.1 301 TLS Redirect
Date: Tue, 01 May 2018 01:13:09 GMT
Server: Varnish
X-Varnish: 227838359
X-Cache: cp1068 int
X-Cache-Status: int-front
Set-Cookie: WMF-Last-Access=01-May-2018;Path=/;HttpOnly;secure;Expires=Sat, 02
Jun 2018 00:00:00 GMT
Set-Cookie:
WMF-Last-Access-Global=01-May-2018;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sat,
02 Jun 2018 00:00:00 GMT
X-Client-IP: 199.4.160.88
Location: https://www.wikidata.org/wiki/Q5200
Content-Length: 0
Connection: keep-alive


idefix merging> curl -I https://www.wikidata.org/wiki/Q5200
HTTP/2 200
date: Tue, 01 May 2018 01:14:58 GMT
content-type: text/html; charset=UTF-8
server: mw1252.eqiad.wmnet
x-content-type-options: nosniff
p3p: CP="This is not a P3P policy! See
https://www.wikidata.org/wiki/Special:CentralAutoLogin/P3P for more info."
x-powered-by: HHVM/3.18.6-dev
content-language: en
link: ;rel=preload;as=image
vary: Accept-Encoding,Cookie,Authorization
x-ua-compatible: IE=Edge
backend-timing: D=75094 t=1525107829593021
x-varnish: 754403290 624210434, 194797954 924438274
via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
age: 29467
x-cache: cp1067 hit/8, cp1068 hit/9
x-cache-status: hit-front
set-cookie: CP=H2; Path=/; secure
strict-transport-security: max-age=106384710; includeSubDomains; preload
set-cookie: WMF-Last-Access=01-May-2018;Path=/;HttpOnly;secure;Expires=Sat, 02
Jun 2018 00:00:00 GMT
set-cookie:
WMF-Last-Access-Global=01-May-2018;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sat,
02 Jun 2018 00:00:00 GMT
x-analytics: ns=0;page_id=52899665;https=1;nocookies=1
x-client-ip: 199.4.160.88
cache-control: private, s-maxage=0, max-age=0, must-revalidate
set-cookie: GeoIP=US:MA:Woburn:42.49:-71.16:v4; Path=/; secure;
Domain=.wikidata.org
accept-ranges: bytes

idefix merging> curl -I -H "Accept: text/turtle"
https://www.wikidata.org/wiki/Q5200
HTTP/2 200
date: Tue, 01 May 2018 01:15:52 GMT
content-type: text/html; charset=UTF-8
server: mw1252.eqiad.wmnet
x-content-type-options: nosniff
p3p: CP="This is not a P3P policy! See
https://www.wikidata.org/wiki/Special:CentralAutoLogin/P3P for more info."
x-powered-by: HHVM/3.18.6-dev
content-language: en
link: ;rel=preload;as=image
vary: Accept-Encoding,Cookie,Authorization
x-ua-compatible: IE=Edge
backend-timing: D=75094 t=1525107829593021
x-varnish: 754403290 624210434, 160015159 924438274
via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
age: 29522
x-cache: cp1067 hit/8, cp1068 hit/10
x-cache-status: hit-front
set-cookie: CP=H2; Path=/; secure
strict-transport-security: max-age=106384710; includeSubDomains; preload
set-cookie: WMF-Last-Access=01-May-2018;Path=/;HttpOnly;secure;Expires=Sat, 02
Jun 2018 00:00:00 GMT
set-cookie:
WMF-Last-Access-Global=01-May-2018;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sat,
02 Jun 2018 00:00:00 GMT
x-analytics: ns=0;page_id=52899665;https=1;nocookies=1
x-client-ip: 199.4.160.88
cache-control: private, s-maxage=0, max-age=0, must-revalidate
set-cookie: GeoIP=US:MA:Woburn:42.49:-71.16:v4; Path=/; secure;
Domain=.wikidata.org
accept-ranges: bytes



On 04/30/2018 02:53 PM, Lucas Werkmeister wrote:
> The real URI (without scare quotes :) ) is not
> https://www.wikidata.org/wiki/Q5200 but
> http://www.wikidata.org/entity/Q5200 – and depending on your Accept
> header, that will redirect you to the wiki page, JSON dump, or RDF data
> (in XML or Turtle formats). Since the LOD Cloud criteria explicitly
> mentions content negotiation, I think we’re good :)
>
> Cheers,
> Lucas
>
> On 30.04.2018 23:08, Peter F. Patel-Schneider wrote:
>> Does it?  The point is not just that Wikidata has real pointers to external
>> resources.  
>>
>>
>> Wikidata needs to serve RDF (e.g., in Turtle) in an accepted fashion.  Is
>> having https://www.wikidata.org/wiki/Special:EntityData/Q5200.ttl
>> available and linked to with an alternate link count when the "real" URI is
>> https://www.wikidata.org/wiki/Q5200?  I don't know enough about this
>> corner of web standards to know.
>>
>>
>> peter
>>
>>
>>
>>
>>
>>
>> On 04/30/2018 01:45 PM, Federico Leva (Nemo) wrote:
>>> Peter F. Patel-Schneider, 30/04/2018 23:32:
>>>> Does the way that Wikidata serves RDF
>>>> (https://www.wikidata.org/wiki/Special:EntityData/Q5200.rdf) satisfy 
>>>> this
>>>> requirement?
>>> I think that part was already settled with:
>>> https://lists.wikimedia.org/pipermail/wikidata/2017-October/011314.html
>>>
>>> More information:
>>> http

Re: [Wikidata] Wikidata and the LOD cloud

2018-04-30 Thread Peter F. Patel-Schneider
Yes, it would be nice to have Wikidata there, provided that Wikidata satisfies
the requirements.  There are already several mentions of Wikidata in the data
behind the diagram.


I don't think that Freebase satisfies the stated requirement because its URIs
no longer "resolve, with or without content negotiation, to /RDF data/ in one
of the popular RDF formats (RDFa, RDF/XML, Turtle, N-Triples)".  I wonder why
Freebase is still in the diagram.


Does the way that Wikidata serves RDF
(https://www.wikidata.org/wiki/Special:EntityData/Q5200.rdf) satisfy this
requirement?  (If it doesn't, it might be easy to change.)


peter



On 04/30/2018 12:17 PM, Ettore RIZZA wrote:
> Hi all,
>
> The new version of the "Linked Open data Cloud" graph is out ... and still
> no Wikidata in it. According to this Twitter discussion, this would be due
> to a lack of metadata on Wikidata. No way to fix that easily? The LOD cloud
> is cited in many scientific papers, it is not a simple gadget.
>
> Cheers,
>
> Ettore Rizza
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [wikicite-discuss] Cleaning up bibliographic collections in Wikidata

2017-12-02 Thread Peter F. Patel-Schneider
It's not easy to get to a true paradox with this collection.   Not only do you
have to be able to express it but you have to require that it exists.

Peter F. Patel-Schneider


On 12/02/2017 11:09 AM, mathieu stumpf guntz wrote:
>
> Hi all,
>
> You should in any case be sure to avoid allowing collections which fall in
> Russell's paradox <https://en.wikipedia.org/wiki/Russell%27s_paradox>. So if
> a predicate "belongs to collection QX" is added such that a Wikidata item
> can be stated as being part of another, it must be envisioned that at some
> point a request may ask "What is the collection of items that do not belong
> to themselves?".
>
> Paradoxically logical,
> mathieu
>
> On 27/11/2017 at 02:07, Arthur Smith wrote:
>> I think the general idea of documenting collections is a good one, though I
>> haven't thought carefully about this or some of the responses already sent.
>> However, I think the use of P361 (part of) for this purpose might not be a
>> good idea and a new property should be proposed for it, or some other
>> mechanism used for large collection handling (collections added through Mix
>> n Match for example generally have external identifiers as their
>> collection-specific properties). My concern here is mainly that the
>> relationship is not generally going to be intrinsic to the item, and is
>> more related to the project doing the import work, while P361 should
>> generally describe some intrinsic relationship that an item has (for
>> example a subsidiary being part of a parent company, a component of a
>> device being part of the device, a research article being part of a
>> particular journal issue, etc).
>>
>> We do have a very new property that might be useable for this purpose,
>> though it is intended to link to Wikiprojects rather than "collection"
>> items - P4570 (Wikidata project). Or perhaps something similar should be
>> proposed?
>>
>>    Arthur
>>
>>


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Coordinate precision in Wikidata, RDF & query service

2017-09-01 Thread Peter F. Patel-Schneider
The GPS unit on my boat regularly claims an estimated position error of 4
feet after it has acquired its full complement of satellites.  This is a
fairly new mid-price GPS unit using up to nine satellites and WAAS.  So my
recreational GPS supposedly obtains fifth-decimal-place accuracy.  It was
running under an unobstructed sky, which is common when boating.  Careful
use of a good GPS unit should be able to achieve this level of accuracy on
land as well.

From http://www.gps.gov/systems/gps/performance/accuracy/ the raw accuracy
of the positioning information from a satellite is less than 2.4 feet 95% of
the time.  The accuracy reported by a GPS unit is degraded by atmospheric
conditions; false signals, e.g., bounces; and the need to determine position
by intersecting the raw data from several satellites.  Accuracy can be
improved by using more satellites and multiple frequencies and by
comparing to a signal from a receiver at a known location.

The web page above claims that accuracy can be improved to a few centimeters
in real time and down to the millimeter level if a device is left in the
same place for a long period of time.  I think that these last two
accuracies require a close-by receiver at a known location and correspond
to what is said in [4].
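
For reference, the arithmetic behind the decimal-place numbers, as a small
sketch using a spherical-Earth approximation (one degree of latitude taken as
roughly 111.32 km):

# Rough size of the last decimal place of a coordinate, in meters.
# Spherical-Earth approximation; longitude shrinks with cos(latitude).
import math

METERS_PER_DEGREE_LAT = 111_320.0

def last_decimal_in_meters(decimals, latitude_deg=0.0):
    step_deg = 10.0 ** -decimals
    lat_m = step_deg * METERS_PER_DEGREE_LAT
    lon_m = lat_m * math.cos(math.radians(latitude_deg))
    return lat_m, lon_m

for d in range(3, 8):
    lat_m, lon_m = last_decimal_in_meters(d, latitude_deg=45.0)
    print(f"{d} decimals: ~{lat_m:.3f} m latitude, ~{lon_m:.3f} m longitude at 45 deg")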

peter



On 08/30/2017 06:53 PM, Nick Wilson (Quiddity) wrote:
> On Tue, Aug 29, 2017 at 2:13 PM, Stas Malyshev  
> wrote:
>> [...] Would four decimals
>> after the dot be enough? According to [4] this is what commercial GPS
>> device can provide. If not, why and which accuracy would be appropriate?
>>
> 
> I think that should be 5 decimals for commercial GPS, per that link?
> It also suggests that "The sixth decimal place is worth up to 0.11 m:
> you can use this for laying out structures in detail, for designing
> landscapes, building roads. It should be more than good enough for
> tracking movements of glaciers and rivers. This can be achieved by
> taking painstaking measures with GPS, such as differentially corrected
> GPS."
> 
> Do we hope to store datasets around glacier movement? It seems
> possible. (We don't seem to currently
> https://www.wikidata.org/wiki/Q770424 )
> 
> I skimmed a few search results, and found 7 (or 15) decimals given in
> one standard, but the details are beyond my understanding:
> http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/geoprocessing_environments/about_coverage_precision.htm
> https://stackoverflow.com/questions/1947481/how-many-significant-digits-should-i-store-in-my-database-for-a-gps-coordinate
> https://stackoverflow.com/questions/7167604/how-accurately-should-i-store-latitude-and-longitude
> 
>> [4]
>> https://gis.stackexchange.com/questions/8650/measuring-accuracy-of-latitude-and-longitude
> 

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wordnet synset ID

2017-08-21 Thread Peter F. Patel-Schneider
I think that it depends on just what data is used from them.

I've looked a bit at the Yago data.   The mappings from Wikipedia categories
to WordNet synsets are pretty good, although there will be some errors.  The
instance relationships are less good as they are just the Wikipedia category
membership relationships, which are known to have problems.

So I would think that what to get from Yago is the Wikipedia to Wordnet
mappings, not the instance relationships.

I haven't looked at BabelNet much.  My expectation is that they will have more
mappings but maybe at the price of more errors.

If there is a desire to have a mapping from Wikidata to Wordnet, the only
reason not to use one of these efforts as a start is the quality of their
mappings.  Perhaps one option is to repeat something along the lines of what
they did but more closely tailored to the situation in Wikidata.

peter




On 08/21/2017 07:56 AM, Denny Vrandečić wrote:
> I think we could ask either Yago or BabelNet or both whether they would be
> receptive to release their mappings under a CC0 license, so it can be
> integrated into Wikidata. What I wonder is, if they do that, whether we wanted
> to have that data or not.
> 
> On Mon, Aug 21, 2017 at 7:18 AM Peter F. Patel-Schneider
> <pfpschnei...@gmail.com <mailto:pfpschnei...@gmail.com>> wrote:
> 
> One problem with BabelNet is that its licence is restrictive, being
> the Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
> license.  Downloading BabelNet is even more restrictive, requiring also
> working at a research institution.
> 
> Yago
> 
> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
> , which has the less restrictive license Attribution 3.0 Unported (CC BY 
> 3.0),
> has links between Wikipedia categories and Wordnet.  Unfortunately, it 
> does
> not carry these links through to regular Wikipedia pages.   I've been 
> toying
> with making this last connection, which would be easy for those categories
> that are linked to a Wikipedia page.
> 
> Peter F. Patel-Schneider
> Nuance Communications
> 
> PS:  Strangely the Yago logo has a non-commercial license.  I don't know 
> why
> this was done.
> 
> On 08/15/2017 10:32 AM, Finn Aarup Nielsen wrote:
> >
> > I do not think we have a Wiktionary-wordnet link.
> >
> > But I forgot to write we have a BabelNet Wikidata property,
> > https://www.wikidata.org/wiki/Property:P2581. This property has been 
> very
> > little used: http://tinyurl.com/y8npwsm5
> >
> > There might be a Wikimedia-Wordnet indirect link through BabelNet
> >
> > /Finn
> >
> >
> > On 08/15/2017 07:22 PM, Denny Vrandečić wrote:
> >> That's a great question, I have no idea what the answer will turn out
> to be.
> >>
> >> Is there any current link between Wiktionary and WordNet? Or WordNet 
> and
> >> Wikipedia?
> >>
> >>
> >> On Tue, Aug 15, 2017 at 10:14 AM <f...@imm.dtu.dk> wrote:
> >>
> >>
> >>
> >> I have proposed a Wordnet synset property here:
> >>   
>  
> https://www.wikidata.org/wiki/Wikidata:Property_proposal/Wordnet_synset_ID
> >>
> >> The property has been discussed here on the mailing list more than 
> a
> >> year ago, but apparently never got to the point of a property
> >> suggestion:
> >> 
> https://lists.wikimedia.org/pipermail/wikidata/2016-April/008517.html
> >>
> >> I am wondering how the potential property fits in with the new
> >> development of the Wiktionary-Wikidata link. As far as I see the
> senses,
> >> for instance, at
> http://wikidata-lexeme.wmflabs.org/index.php/Lexeme:L15
> >> link to wikidata-lexeme Q-items, which I suppose is Wikidata Q 
> items
> >> once the new development is put into the production system. So 
> with my
> >> understanding linking Wikidata Q-items to Wordnet synsets is
> correct. Is
> >> my understanding correct?
> >>
> >>
> >> Finn Årup Nielsen
> >> http://people.compute.dtu.dk/faan/
> >>
> >>
> >> ___
> >> Wikidata mailing list
> >> Wikidata@

Re: [Wikidata] Wordnet synset ID

2017-08-21 Thread Peter F. Patel-Schneider
One problem with BabelNet is that its licence is restrictive, being
the Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
license.  Downloading BabelNet is even more restrictive, requiring also
working at a research institution.

Yago
http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
, which has the less restrictive license Attribution 3.0 Unported (CC BY 3.0),
has links between Wikipedia categories and Wordnet.  Unfortunately, it does
not carry these links through to regular Wikipedia pages.   I've been toying
with making this last connection, which would be easy for those categories
that are linked to a Wikipedia page.

Peter F. Patel-Schneider
Nuance Communications

PS:  Strangely the Yago logo has a non-commercial license.  I don't know why
this was done.

On 08/15/2017 10:32 AM, Finn Aarup Nielsen wrote:
> 
> I do not think we have a Wiktionary-wordnet link.
> 
> But I forgot to write we have a BabelNet Wikidata property,
> https://www.wikidata.org/wiki/Property:P2581. This property has been very
> little used: http://tinyurl.com/y8npwsm5
> 
> There might be a Wikimedia-Wordnet indirect link through BabelNet
> 
> /Finn
> 
> 
> On 08/15/2017 07:22 PM, Denny Vrandečić wrote:
>> That's a great question, I have no idea what the answer will turn out to be.
>>
>> Is there any current link between Wiktionary and WordNet? Or WordNet and
>> Wikipedia?
>>
>>
>> On Tue, Aug 15, 2017 at 10:14 AM <f...@imm.dtu.dk <mailto:f...@imm.dtu.dk>> 
>> wrote:
>>
>>
>>
>> I have proposed a Wordnet synset property here:
>> 
>> https://www.wikidata.org/wiki/Wikidata:Property_proposal/Wordnet_synset_ID
>>
>> The property has been discussed here on the mailing list more than a
>> year ago, but apparently never got to the point of a property
>> suggestion:
>> https://lists.wikimedia.org/pipermail/wikidata/2016-April/008517.html
>>
>> I am wondering how the potential property fits in with the new
>> development of the Wiktionary-Wikidata link. As far as I see the senses,
>> for instance, at http://wikidata-lexeme.wmflabs.org/index.php/Lexeme:L15
>> link to wikidata-lexeme Q-items, which I suppose is Wikidata Q items
>> once the new development is put into the production system. So with my
>> understanding linking Wikidata Q-items to Wordnet synsets is correct. Is
>> my understanding correct?
>>
>>
>> Finn Årup Nielsen
>> http://people.compute.dtu.dk/faan/
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata-tech] wikidata label service

2017-03-24 Thread Peter F. Patel-Schneider
I'm trying to figure out whether the Wikidata label service (the stuff that is
invoked as
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
in queries to the Wikidata query service) is something that can be done in
SPARQL or whether it is an extension that can't be done in SPARQL.  Does
anyone know the answer to this?

The reason that I ask is that it appears that the service accesses the names
of SPARQL variables in solution sets, and I can't think of how it does that
using SPARQL facilities.
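
For comparison, here is a minimal sketch of the plain SPARQL 1.1 route
(OPTIONAL plus rdfs:label plus a language FILTER), assuming the public endpoint
at https://query.wikidata.org/sparql, the Python requests library, and house
cat (Q146) purely as a test case. It reproduces labels for variables that are
named explicitly in the query, but nothing in standard SPARQL lets a service
inspect the variable names of the outer solution set, which is the part that
looks like an extension.

# Sketch: fetch English labels with standard SPARQL 1.1 constructs instead
# of the Blazegraph-specific label SERVICE. Endpoint, test item (Q146), and
# the requests library are assumptions for illustration.
import requests

QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  OPTIONAL {
    ?item rdfs:label ?itemLabel .
    FILTER(LANG(?itemLabel) = "en")
  }
}
LIMIT 10
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json",
             "User-Agent": "label-fallback-sketch/0.1 (example)"},
    timeout=60,
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["item"]["value"], row.get("itemLabel", {}).get("value", "(no label)"))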


peter

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] Wikidata ontology

2017-01-09 Thread Peter F. Patel-Schneider
Although there is no formal problem here, care does have to be taken when
modelling entities that are to be considered as both classes and non-classes
(or, and especially, metaclasses and non-metaclass classes).  It is all too
easy for even experienced modellers to make mistakes.  The problem is worse
when the modelling formalism is weak (as the Wikidata formalism is) and thus
does not itself provide much support to detect mistakes.  The problem is even
worse when the modelling methodology often does not provide much description
of the entities (as is the case in Wikidata).


The paper that Denny cites proposes that each entity be given a level (0 for
non-class entities and some number greater than 0 for classes).  The instance
of relationship is limited so that it only relates entities to entities that
are a single level higher and the subclass of relationship is limited so that
it only relates entities within a single level.  This rules out the
problematic earthquake (Q7944), which used to be both an instance and a
subclass of natural disaster (Q8065), and white (Q23444), which is currently
both an instance and a subclass of color (Q1075).  Although neither of these
situations is a formal failure they are both almost certainly modelling 
failures.
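
A query along the following lines can be used to look for that pattern, i.e.
items that are both an instance of and a subclass of the same class. This is a
minimal sketch, assuming the public endpoint and the Python requests library,
and LIMITed because the unrestricted query may well time out.

# Sketch: find items that are simultaneously instance of (P31) and subclass
# of (P279) the same class, the suspect pattern discussed above.
import requests

QUERY = """
SELECT ?item ?itemLabel ?class ?classLabel WHERE {
  ?item wdt:P31 ?class ;
        wdt:P279 ?class .
  ?item rdfs:label ?itemLabel .   FILTER(LANG(?itemLabel) = "en")
  ?class rdfs:label ?classLabel . FILTER(LANG(?classLabel) = "en")
}
LIMIT 20
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json",
             "User-Agent": "instance-subclass-sketch/0.1 (example)"},
    timeout=120,
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["itemLabel"]["value"], "<->", row["classLabel"]["value"])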

It is, however, useful to be able to model entities that do not fit into this
modelling methodology, like the class of all classes.  These exceptions are, I
think, rare.


Anyway, what this points out is that there are problems in how Wikidata models
the world.  Better guidelines on how to model on Wikidata would be useful.
Strict rules, however, can easily prevent modelling what Wikidata should be
modelling.

My suggestion is that Wikidata classes should have more information associated
with them.   It should be possible for a modeller to easily determine how a
class is supposed to be used.  This is not currently possible for color and I
think is the main source of the problems with color.

Peter F. Patel-Schneider
Nuance Communications



On 01/09/2017 10:28 AM, Denny Vrandečić wrote:
> I agree with Peter here. Daniel's statement of "Anything that is a subclass of
> X, and at the same time an instance of Y, where Y is not "class", is problematic."
> is simply too strong. The classical example is Harry the eagle, and eagle
> being a species.
> 
> The following paper has a much more measured and subtle approach to this 
> question:
> 
> http://snap.stanford.edu/wikiworkshop2016/papers/Wiki_Workshop__WWW_2016_paper_11.pdf
>  
> 
> I still think it is potentially and partially too strong, but certainly much
> better than Daniel's strict statement.
> 
> 
> 
> On Mon, Jan 9, 2017 at 7:58 AM Peter F. Patel-Schneider
> <pfpschnei...@gmail.com <mailto:pfpschnei...@gmail.com>> wrote:
> 
> On 01/09/2017 07:20 AM, Daniel Kinzler wrote:
> > On 09.01.2017 at 04:36, Markus Kroetzsch wrote:
> >> Only the "current king of Iberia" is a single person, but Wikidata is
> about all
> >> of history, so there are many such kings. The office of "King of 
> Iberia" is
> >> still singular (it is a singular class) and it can have its own
> properties etc.
> >> I would therefore say (without having checked the page):
> >>
> >> King of Iberiainstance of  office
> >> King of Iberiasubclass of  king
> >
> > To be semantically strict, you would need to have two separate items,
> one for
> > the office, and one for the class. Because the individual kings have not
> been
> > instances of the office - they have been holders of the office. And they
> have
> > been instances of the class, but not holders of the class.
> >
> > On wikidata, we often conflate these things for sake of simplicity. But
> when you
> > try to write queries, this does not make things simpler, it makes it 
> harder.
> >
> > Anything that is a subclass of X, and at the same time an instance of Y,
> > where Y is
> > not "class", is problematic. I think this is the root of the confusion
> Gerards
> > speaks of.
> 
> There is no a priori reason that an office cannot be a class.  Some 
> formalisms
> don't allow this, but there are others that do.  Some sets of rules for
> ontology construction don't allow this, but there are others that do.  
> There
> is certainly no universal semantic consideration, even in any strict 
> notion of
>     semantics, that would require that there be two separate items here.
> 
> As far as I can tell, the Wikidata formalism is not one that would 
> disallow
> offices being classes.  As far as I can tell, the rules for construc

Re: [Wikidata] Wikidata ontology

2017-01-09 Thread Peter F. Patel-Schneider
On 01/09/2017 07:20 AM, Daniel Kinzler wrote:
> On 09.01.2017 at 04:36, Markus Kroetzsch wrote:
>> Only the "current king of Iberia" is a single person, but Wikidata is about 
>> all
>> of history, so there are many such kings. The office of "King of Iberia" is
>> still singular (it is a singular class) and it can have its own properties 
>> etc.
>> I would therefore say (without having checked the page):
>>
>> King of Iberiainstance of  office
>> King of Iberiasubclass of  king
> 
> To be semantically strict, you would need to have two separate items, one for
> the office, and one for the class. Because the individual kings have not been
> instances of the office - they have been holders of the office. And they have
> been instances of the class, but not holders of the class.
> 
> On wikidata, we often conflate these things for sake of simplicity. But when 
> you
> try to write queries, this does not make things simpler, it makes it harder.
> 
> Anything that is a subclass of X, and at the same time an instance of Y, where Y is
> not "class", is problematic. I think this is the root of the confusion Gerards
> speaks of.

There is no a priori reason that an office cannot be a class.  Some formalisms
don't allow this, but there are others that do.  Some sets of rules for
ontology construction don't allow this, but there are others that do.  There
is certainly no universal semantic consideration, even in any strict notion of
semantics, that would require that there be two separate items here.

As far as I can tell, the Wikidata formalism is not one that would disallow
offices being classes.  As far as I can tell, the rules for constructing the
Wikidata ontology don't disallow it either.

Peter F. Patel-Schneider
Nuance Communications

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Breaking change in JSON serialization?

2016-08-16 Thread Peter F. Patel-Schneider
On 08/16/2016 07:57 AM, Daniel Kinzler wrote:
> On 11.08.2016 at 23:12, Peter F. Patel-Schneider wrote:
>> Until suitable versioning is part of the Wikidata JSON dump format and
>> contract, however, I don't think that consumers of the dumps should just
>> ignore new fields.
> 
> Full versioning is still in the future, but I'm happy that we are in the 
> process
> of finalizing a policy on stable interfaces, including a contract regarding
> adding fields:
> <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>.
> Please comment on the talk page.

Looks quite good.  I put in a few comments, particularly to claim that this
would be an ideal time to add versioning.

peter


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Breaking change in JSON serialization?

2016-08-11 Thread Peter F. Patel-Schneider
On 08/11/2016 01:35 PM, Stas Malyshev wrote:
> Hi!
> 
>> My view is that this tool should be extremely cautious when it sees new data
>> structures or fields.  The tool should certainly not continue to output
>> facts without some indication that something is suspect, and preferably
>> should refuse to produce output under these circumstances.
> 
> I don't think I agree. I find tools that are too picky about details
> that are not important to me hard to use, and I'd very much prefer a
> tool where I am in control of which information I need and which I don't
> need.

My point is that the tool has no way of determining what is important and what
is not important, at least under the current state of affairs with respect to
the Wikidata JSON dump.  Given this, a tool that ignores what could easily be
an important change is a dangerous tool.

>> What can happen if the tool instead continues to operate without complaint
>> when new data structures are seen?  Consider what would happen if the tool
>> was written for a version of Wikidata that didn't have rank, i.e., claim
>> objects did not have a rank name/value pair.  If ranks were then added,
>> consumers of the output of the tool would have no way of distinguishing
>> deprecated information from other information.
> 
> Ranks are a bit unusual because ranks are not just informational change,
> it's a semantic change. It introduces a concept of a statement that has
> different semantics than the rest. Of course, such change needs to be
> communicated - it's like I would make format change "each string
> beginning with letter X needs to be read backwards" but didn't tell the
> clients. Of course this is a breaking change if it changes semantics.
> 
> What I was talking are changes that don't break semantics, and majority
> of additions are just that.

Yes, the majority of changes are not of this sort, but tools currently can't
determine which changes are of this sort and which are not.

>> Of course this is an extreme case.  Most changes to the Wikidata JSON dump
>> format will not cause such severe problems.  However, given the current
>> situation with how the Wikidata JSON dump format can change, the tool cannot
>> determine whether any particular change will affect the meaning of what it
>> produces.  Under these circumstances it is dangerous for a tool that
>> extracts information from the Wikidata JSON dump to continue to produce
>> output when it sees new data structures.
> 
> The tool can not. It's not possible to write a tool that would derive
> semantics just from JSON dump, or even detect semantic changes. Semantic
> changes can be anywhere, it doesn't have to be additional field - it can
> be in the form of changing the meaning of the field, or format, or
> datatype, etc. Of course the tool can not know that - people should know
> that and communicate it. Again, that's why I think we need to
> distinguish changes that break semantics and changes that don't, and
> make the tools robust against the latter - but not the former because
> it's impossible. For dealing with the former, there is a known and
> widely used solution - format versioning.

Yes, if a suitable sort of versioning contract was implemented then things
would dramatically change.  Tools could depend on "breaking" changes always
being accompanied by a version bump and then they might be able to ignore new
fields if the version does not change.  However, this is not the current state
of affairs with the Wikidata JSON dump format.

>> This does make consuming tools sensitive to changes to the Wikidata JSON
>> dump format that are "non-breaking".  To overcome this problem there should
>> be a way for tools to distinguish changes to the Wikidata JSON dump format
>> that do not change the meaning of existing constructs in the dump from those
>> that can.  Consuming tools can then continue to function without problems
>> for the former kind of change.
> 
> As I said, format versioning. Maybe even semver or some suitable
> modification of it. RDF exports BTW already carry version. Maybe JSON
> exports should too.

Right.  I'm all for version information being added to the Wikidata JSON dump
format.  It would make the production use of these dumps much safer.

Until suitable versioning is part of the Wikidata JSON dump format and
contract, however, I don't think that consumers of the dumps should just
ignore new fields.
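
To make the point concrete, here is a purely hypothetical sketch of a
version-gated consumer. The dump has no such field today, so the field name
and the major-version rule below are invented for illustration only.

# Hypothetical sketch: gate processing on a dump-level format version.
# "formatVersion" does not exist in the real dumps; it is invented here to
# illustrate the contract being argued for.
SUPPORTED_MAJOR = 1

def check_dump_version(dump_header):
    version = dump_header.get("formatVersion")
    if version is None:
        # Today's situation: no version, so a new field could mean anything.
        raise ValueError("no format version; unknown fields must be treated as breaking")
    major = int(str(version).split(".")[0])
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported dump format {version}; expected {SUPPORTED_MAJOR}.x")
    # Same major version: additions would be guaranteed non-breaking and
    # could then be ignored safely.

check_dump_version({"formatVersion": "1.2"})
try:
    check_dump_version({})
except ValueError as err:
    print(err)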


Peter F. Patel-Schneider
Nuance Communcations



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Breaking change in JSON serialization?

2016-08-11 Thread Peter F. Patel-Schneider
On 08/05/2016 08:57 AM, Daniel Kinzler wrote:
> On 05.08.2016 at 17:34, Peter F. Patel-Schneider wrote:
>> So some additions are breaking changes then.   What is a system that consumes
>> this information supposed to do?  If the system doesn't monitor announcements
>> then it has to assume that any new field can be a breaking change and thus
>> should not accept data that has any new fields.
> 
> The only way to avoid breakage is to monitor announcements. The format is not
> final, so changes can happen (not just additions, but also removals), and then
> things will break if they are unaware. We tend to be careful and conservative,
> and announce any breaking changes in advance, but do not guarantee full
> backwards compatibility forever.
> 
> The only alternative is a fully versioned interface, which we don't currently
> have for JSON, though it has been proposed, see
> <https://phabricator.wikimedia.org/T92961>.
> 
>> I assume that you are referring to the common practice of adding extra fields
>> in HTTP and email transport and header structures under the assumption that
>> these extra fields will just be passed on to downstream systems and then
>> silently ignored when content is displayed.
> 
> Indeed.
> 
>> I view these as special cases
>> where there is at least an implicit contract that no additional field will
>> change the meaning of the existing fields and data.
> 
> In the name of the Robustness Principle, I would consider this the normal 
> case,
> not the exception.
> 
>> When such contracts are
>> in place systems can indeed expect to see additional fields, and are 
>> permitted
>> to ignore these extra fields.
> 
> Does this count?
> <https://mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html>

This email message is not a contract about how the Wikidata JSON data format
can change.  It instead describes how consumers of that (and other) data are
supposed to act.  My view is that without guarantees of what sort of changes
will be made to the Wikidata JSON data format, these are dangerous behaviours
for its consumers.

>> Because XML specifically states that the order of attributes is not
>> significant.  Therefore changes to the order of XML attributes is not 
>> changing
>> the encoding.
> 
> That's why I'm proposing to formalize the same kind of contract for us, see
> <https://phabricator.wikimedia.org/T142084>.

This contract guarantees that new fields will not change the interpretation of
pre-existing ones, which is strong, but I don't see where it guarantees that
the meaning of entire structures will not change, which is very weak.

Consider the rank field.  This doesn't change the interpretation of existing
fields.  However, it changes how the entire claim is to be considered.

>> Here is where I disagree.  As there is no contract that new fields in the
>> Wikidata JSON dumps are not breaking, clients need to treat all new fields as
>> potentially breaking and thus should not accept data with unknown fields.
> 
> While you are correct that there is no formal contract yet, the topic had been
> explicitly discussed before, in particular with Markus.
> 
>> I say this for any data, except where there is a contract that such 
>> additional
>> fields are not meaning-changing.
> 
> Quote me on it:
> 
> For wikibase serializations, additional fields are not meaning changing. 
> Changes
> to the format or interpretation of fields will be announced as a breaking 
> change.
> 
>>> Clients need to be prepared to encounter entity types and data types they 
>>> don't
>>> know. But they should also allow additional fields in any JSON object. We
>>> guarantee that extra fields do not impact the interpretation of fields they 
>>> know
>>> about - unless we have announced and documented a breaking change.
>>
>> Is this the contract that is going to be put forward?  At some time in the 
>> not
>> too distant future I hope that my company will be using Wikidata information
>> in its products.  This contract is likely to problematic for development
>> groups, who want some notion how long they have to prepare for changes that
>> can silently break their products.
> 
> This is indeed the gist of what I want to establish as a stability policy.
> Please comment on <https://phabricator.wikimedia.org/T142084>.
> 
> I'm not sure how this could be made less problematic. Even with a fully
> versioned JSON interface, available data types etc are a matter of
> configuration. All we can do is announce such changes, and advise consumers 
> that
> they can safely ignore unknown things.
> 
>

Re: [Wikidata] Breaking change in JSON serialization?

2016-08-11 Thread Peter F. Patel-Schneider
My view is that any tool that imports external data has to be very cautious
about additions to the format of that data absent strong guarantees about
the effects of these additions.

Consider a tool that imports the Wikidata JSON dump, extracts base facts
from the dump, and outputs these facts in some other format (perhaps in RDF,
but it doesn't really matter what format).  This tool fits into the
"importing data from [an] external system using a generic exchange format".

My view is that this tool should be extremely cautious when it sees new data
structures or fields.  The tool should certainly not continue to output
facts without some indication that something is suspect, and preferably
should refuse to produce output under these circumstances.

What can happen if the tool instead continues to operate without complaint
when new data structures are seen?  Consider what would happen if the tool
was written for a version of Wikidata that didn't have rank, i.e., claim
objects did not have a rank name/value pair.  If ranks were then added,
consumers of the output of the tool would have no way of distinguishing
deprecated information from other information.
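
A small sketch of the cautious behaviour being argued for here; the set of
expected claim keys below is illustrative rather than a complete schema.

# Sketch: a fact extractor that refuses to emit output when a claim carries
# fields it was not written for. The key set is illustrative only.
EXPECTED_CLAIM_KEYS = {"mainsnak", "type", "id", "rank",
                       "qualifiers", "qualifiers-order", "references"}

class UnexpectedFieldError(Exception):
    pass

def extract_fact(claim):
    unknown = set(claim) - EXPECTED_CLAIM_KEYS
    if unknown:
        # A tolerant tool would silently drop these; a cautious one stops,
        # since the new field might change how the whole claim is to be read
        # (exactly the situation described above for rank).
        raise UnexpectedFieldError(f"unexpected claim fields: {sorted(unknown)}")
    if claim.get("rank") == "deprecated":
        return None
    snak = claim["mainsnak"]
    return snak.get("property"), snak.get("datavalue")

claim = {"mainsnak": {"property": "P580", "datavalue": {"type": "time"}},
         "type": "statement", "id": "Q42$example", "rank": "normal",
         "some-new-field": 1}
try:
    print(extract_fact(claim))
except UnexpectedFieldError as err:
    print("refusing to produce output:", err)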

Of course this is an extreme case.  Most changes to the Wikidata JSON dump
format will not cause such severe problems.  However, given the current
situation with how the Wikidata JSON dump format can change, the tool cannot
determine whether any particular change will affect the meaning of what it
produces.  Under these circumstances it is dangerous for a tool that
extracts information from the Wikidata JSON dump to continue to produce
output when it sees new data structures.

This does make consuming tools sensitive to changes to the Wikidata JSON
dump format that are "non-breaking".  To overcome this problem there should
be a way for tools to distinguish changes to the Wikidata JSON dump format
that do not change the meaning of existing constructs in the dump from those
that can.  Consuming tools can then continue to function without problems
for the former kind of change.

Human-only signalling, e.g., an announcement on some web page, is not
adequate because there is no guarantee that consuming tools will be changed
in response.


Peter F. Patel-Schneider
Nuance Communications

On 08/05/2016 11:56 AM, Stas Malyshev wrote:
> Hi!
> 
>> Consumers of data generally cannot tell whether the addition of a new field 
>> to
>> a data encoding is a breaking change or not.  Given this, code that consumes
>> encoded data should at least produce warnings when it encounters encodings
>> that it is not expecting and preferably should refuse to produce output in
>> such circumstances.  Producers of data thus should signal in advance any
>> changes to the encoding, even if they know that the changes can be safely 
>> ignored.
> 
> I don't think this approach is always warranted. In some cases, yes, but
> in case where you importing data from external system using a generic
> data exchange format like JSON, I don't think this is warranted. This
> will only lead to software being more brittle without any additional
> benefit to the user. Formats like JSON allow to easily accommodate
> backwards-compatible incremental change, so there's no reason not to use
> it.
> 
>> I would view software that consumes Wikidata information and silently ignores
>> fields that it is not expecting as deficient and would counsel against using
>> such software.
> 
> I think this approach is way too restrictive. Wikidata is a database
> that does not have fixed schema, and even its underlying data
> representations are not yet fixed, and probably won't be completely
> fixed for a long time. Having software break each time a field is added
> would lead to a software that breaks often and does not serve its users
> well. You need also to consider that Wikidata is a huge database with a
> very wide mission, and many users may not be interested in all the
> details of the data representation, but only in some aspects of it.
> Having the software refuse to operate on the data that is relevant to
> the user because some part that is not relevant to the user changed does
> not look like the best approach to me.
> 
> For Wikidata specifically I think better approach would be to ignore
> fields, types and other structures that are not known to the software,
> provided that ones that are known do not change their semantics with
> additions - and I understand that's the promise from Wikidata (at least
> excepting cases of specially announced BC-breaking changes). Maybe
> inform the user that some information is not understood and thus may be
> not available, but not refuse to function completely.
> 

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Breaking change in JSON serialization?

2016-08-05 Thread Peter F. Patel-Schneider
I side firmly with Markus here.

Consumers of data generally cannot tell whether the addition of a new field to
a data encoding is a breaking change or not.  Given this, code that consumes
encoded data should at least produce warnings when it encounters encodings
that it is not expecting and preferably should refuse to produce output in
such circumstances.  Producers of data thus should signal in advance any
changes to the encoding, even if they know that the changes can be safely 
ignored.

I would view software that consumes Wikidata information and silently ignores
fields that it is not expecting as deficient and would counsel against using
such software.
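
For what it is worth, with Jackson (the JSON library that comes up later in this
thread) each of these behaviours is a small configuration choice.  The following
is only a sketch of the available options, not a claim about how any particular
consumer is or should be configured.

  import java.io.IOException;
  import com.fasterxml.jackson.core.JsonParser;
  import com.fasterxml.jackson.databind.DeserializationContext;
  import com.fasterxml.jackson.databind.DeserializationFeature;
  import com.fasterxml.jackson.databind.JsonDeserializer;
  import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.databind.deser.DeserializationProblemHandler;

  final class UnknownFieldPolicies {
      // Refuse: fail on any field the data classes do not declare
      // (Jackson's default, made explicit here).
      static ObjectMapper strict() {
          return new ObjectMapper()
                  .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, true);
      }

      // Warn and continue: log the unexpected field, skip its value, keep parsing.
      static ObjectMapper warning() {
          ObjectMapper mapper = new ObjectMapper();
          mapper.addHandler(new DeserializationProblemHandler() {
              @Override
              public boolean handleUnknownProperty(DeserializationContext ctxt,
                      JsonParser p, JsonDeserializer<?> deserializer,
                      Object beanOrClass, String propertyName) throws IOException {
                  System.err.println("Unexpected field '" + propertyName
                          + "' while reading " + beanOrClass);
                  p.skipChildren();
                  return true;
              }
          });
          return mapper;
      }

      // Silently ignore unknown fields globally; the per-class equivalent is
      // @JsonIgnoreProperties(ignoreUnknown = true) on the data class.
      static ObjectMapper lenient() {
          return new ObjectMapper()
                  .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
      }
  }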

Peter F. Patel-Schneider
Nuance Communications

PS:  JSON is a particularly problematic encoding for data because many aspects
of the data that a particular JSON text is meant to encode are left
unspecified by the JSON standards.

On 08/05/2016 05:04 AM, Daniel Kinzler wrote:
> Hi Markus!
> 
> You are asking us to better communicate changes to our serialization, even if
> it's not a breaking change according to the spec. I agree we should do that. 
> We
> are trying to improve our processes to achieve this.
> 
> Can we ask you in return to try to make your software more robust, by not 
> making
> unwarranted assumptions about the serialization format?
> 
> 
> With regards to communicating more - it's very hard to tell which changes 
> might
> break something for someone. For instance, some software might rely on the 
> order
> of fields in a JSON object, even though JSON says this is unspecified, just 
> like
> you rely on no fields being added, even though there is no guarantee about 
> this.
> Similarly, some software might rely on non-ascii characters being represented 
> as
> unicode escape sequences, and will break if we use the more compact utf-8. Or
> they may break on changes in whitespace. Who knows. We can not possibly know what
> kind of change will break some 3rd party software.
> 
> I don't think announcing any and all changes is feasible. So I think an 
> official
> policy about what we announce can be useful. Something like "This is what we
> consider a breaking change, and we will definitely announce it. And these are
> some kinds of changes we will also communicate ahead of time. And these are 
> some
> things that can happen unannounced."
> 
> You are right that policies don't change the behavior of software. But perhaps
> they can change the behavior of programmers, by telling them what they can 
> (and
> can't) safely rely on.
> 
> 
> It boils down to this: we can try to be more verbose, but if you make
> assumptions beyond the spec, things will break sooner or later. Writing robust
> software requires more time and thought initially, but it saves a lot of
> headaches later.
> 
> -- daniel
> 
> Am 04.08.2016 um 21:49 schrieb Markus Kroetzsch:
>> Daniel,
>>
>> You present arguments on issues that I would never even bring up. I think we
>> fully agree on many things here. Main points of misunderstanding:
>>
>> * I was not talking about the WMDE definition of "breaking change". I just 
>> meant
>> "a change that breaks things". You can define this term for yourself as you 
>> like
>> and I won't argue with this.
>>
>> * I would never say that it is "right" that things break in this case. It's
>> annoying. However, it is the standard behaviour of widely used JSON parsing
>> libraries. We won't discuss it away.
>>
>> * I am not arguing that the change as such is bad. I just need to know about 
>> it
>> to fix things before they break.
>>
>> * I am fully aware of many places where my software should be improved, but I
>> cannot fix all of them just to be prepared if a change should eventually 
>> happen
>> (if it ever happens). I need to know about the next thing that breaks so I 
>> can
>> prioritize this.
>>
>> * The best way to fix this problem is to annotate all Jackson classes with 
>> the
>> respective switch individually. The global approach you linked to requires 
>> that
>> all users of the classes implement the fix, which is not working in a 
>> library.
>>
>> * When I asked for announcements, I did not mean an information of the type 
>> "we
>> plan to add more optional bits soonish". This ancient wiki page of yours that
>> mentions that some kind of change should happen at some point is even more
>> vague. It is more helpful to learn about changes when you know how they will
>> look and when they will happen. My assumption is that this is a "low cost"
>> improvement that is not too muc

Re: [Wikidata] "Implementing" OWL RL in SPARQL (Was: qwery.me - simpler queries for wikidata)

2015-11-13 Thread Peter F. Patel-Schneider


On 11/13/2015 01:21 AM, Markus Krötzsch wrote:
> On 12.11.2015 22:09, Peter F. Patel-Schneider wrote:
>> On 11/12/2015 09:10 AM, Markus Krötzsch wrote:
>> [...]
>>> On the other hand, it is entirely possible to implement correct OWL QL 
>>> (note:
>>> *QL* not *RL*) reasoning in SPARQL without even using "rules" that need any
>>> recursive evaluation [3]. This covers all of RDFS, and indeed some of the
>>> patterns in these queries are quite well-known to Wikidata users too (e.g.,
>>> using "subclassOf*" in a query). Depending on how much of OWL QL you want to
>>> support, the SPARQL queries you get in this case are more or less simple. 
>>> This
>>> work also gives arguments as to why this style of SPARQL-based 
>>> implementation
>>> does (most likely) not exist for OWL RL [3].
>>
>> Does OWL QL cover *all* of RDFS, even things like subproperties of
>> rdfs:subclassOf and rdfs:subPropertyOf?
> 
> No, surely not. What I meant is the RDFS-fragment of OWL DL here (which is
> probably what RDFS processors are most likely to implement, too).
> 
> I think I recall you showing P-hardness of RDFS proper a while ago, which
> would obviously preclude translation into single SPARQL 1.1 queries (unless
> NL=P).

Yes indeed, and hence my query.

> 
> Markus
> 
> 

peter

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] "Implementing" OWL RL in SPARQL (Was: qwery.me - simpler queries for wikidata)

2015-11-13 Thread Peter F. Patel-Schneider
On 11/12/2015 09:10 AM, Markus Krötzsch wrote:
[...]
> On the other hand, it is entirely possible to implement correct OWL QL (note:
> *QL* not *RL*) reasoning in SPARQL without even using "rules" that need any
> recursive evaluation [3]. This covers all of RDFS, and indeed some of the
> patterns in these queries are quite well-known to Wikidata users too (e.g.,
> using "subclassOf*" in a query). Depending on how much of OWL QL you want to
> support, the SPARQL queries you get in this case are more or less simple. This
> work also gives arguments as to why this style of SPARQL-based implementation
> does (most likely) not exist for OWL RL [3].

Does OWL QL cover *all* of RDFS, even things like subproperties of
rdfs:subclassOf and rdfs:subPropertyOf?

peter

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data model explanation and protection

2015-10-28 Thread Peter F. Patel-Schneider
On 10/28/2015 12:08 PM, Tom Morris wrote:
[...]
> Going back to Ben's original problem, one tool that Freebase used to help
> manage the problem of incompatible type merges was a set of curated sets of
> incompatible types [5] which was used by the merge tools to warn users that
> the merge they were proposing probably wasn't a good idea.  People could
> ignore the warning in the Freebase implementation, but Wikidata could make it
> a hard restriction or just a warning.
> 
> Tom

I think that this idea is a good one.  The incompatibility information  could
be added to classes in the form of "this class is disjoint from that other
class".  Tools would then be able to look for this information and produce
warnings or even have stronger reactions to proposed merging.

I'm not sure that using P1889 "different from" is going to be adequate.  What
links would be needed?  Just between a gene and its protein?  That wouldn't
catch merging a gene and a related protein.  Between all genes and all
proteins?  It seems to me that this is better handled at the class level.
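
As a sketch of how a merge tool could use class-level disjointness, assume a
hypothetical helper that collects an item's classes (its P31 values plus their
P279* closure) together with a curated set of disjoint class pairs; the
Q-identifiers in the comment are only illustrative.

  import java.util.Set;

  final class MergeGuard {
      // disjointPairs might contain Set.of("Q7187", "Q8054"), i.e. gene and
      // protein, if I have the identifiers right.
      static boolean incompatible(Set<String> classesOfA, Set<String> classesOfB,
                                  Set<Set<String>> disjointPairs) {
          for (String a : classesOfA) {
              for (String b : classesOfB) {
                  if (a.equals(b)) {
                      continue;  // a shared class is never a conflict
                  }
                  if (disjointPairs.contains(Set.of(a, b))) {
                      System.err.println("Merge blocked: " + a
                              + " is declared disjoint from " + b);
                      return true;
                  }
              }
          }
          return false;
      }
  }

A merge tool could run this check before offering a candidate pair and either
refuse the merge outright or require an explicit override.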

peter


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data model explanation and protection

2015-10-28 Thread Peter F. Patel-Schneider
I think that using P1889 in this way is abusing its meaning.

Q16657504 P1889 Q6525093 doesn't mean that Q16657504 should not be merged with
some other human item in Wikidata.


peter


On 10/28/2015 03:41 PM, Magnus Manske wrote:
> I fear my games may contribute to both problems (merging two items, and adding
> a sitelink to the wrong item). Both are facilitated by identical
> names/aliases, and sometimes it's hard to tell that a pair is meant to be
> different, especially if you don't know about the intricate structures of the
> respective knowledge domain.
> 
> An item-specific, but somewhat heavy-handed approach would be to prevent
> merging of any two items where at least one has P1889, no matter what it
> specifically points to. At least, give a warning that an item is
> "merge-protected", and require an additional override for the merge.
> 
> If that is acceptable, it would be easy for me to filter all items with P1889,
> from the merge game at least.
> 
> On Wed, Oct 28, 2015 at 8:50 PM Peter F. Patel-Schneider
> <pfpschnei...@gmail.com> wrote:
> 
> On 10/28/2015 12:08 PM, Tom Morris wrote:
> [...]
> > Going back to Ben's original problem, one tool that Freebase used to 
> help
> > manage the problem of incompatible type merges was a set of curated 
> sets of
> > incompatible types [5] which was used by the merge tools to warn users 
> that
> > the merge they were proposing probably wasn't a good idea.  People could
> > ignore the warning in the Freebase implementation, but Wikidata could
> make it
> > a hard restriction or just a warning.
> >
> > Tom
> 
> I think that this idea is a good one.  The incompatibility information  
> could
> be added to classes in the form of "this class is disjoint from that other
> class".  Tools would then be able to look for this information and produce
> warnings or even have stronger reactions to proposed merging.
> 
> I'm not sure that using P1889 "different from" is going to be adequate.  
> What
> links would be needed?  Just between a gene and its protein?  That 
> wouldn't
> catch merging a gene and a related protein.  Between all genes and all
> proteins?  It seems to me that this is better handled at the class level.
> 
> peter
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] diseases as classes in Wikidata - was Re: An Ambitious Wikidata Tutorial

2015-10-19 Thread Peter F. Patel-Schneider
On 10/18/2015 01:59 PM, Stas Malyshev wrote:

> > [Emw]
>
> Hi!
>
> > The community-defined meaning of /subclass of/ (P279) is that of
> > rdfs:subClassOf [1].  Similarly, the community-defined meaning of
> > /instance of/ (P31) is that of rdf:type [2, 3].
>
> Are you sure [that] is always correct? AFAIK there are some
> specific rules and meanings in OWL that classes should adhere to,
> also same thing can not be an individual and a class, and others
> (not completely sure of the whole list, as I don't have enough
> background in RDF/OWL). But I'm not sure existing data actually
> follows that.

OWL does not currently allow classes to be directly treated as
individuals.  This is more of an engineering decision than a
philosophical one, however.  In RDFS classes are also individuals.

> > There are some open problems with how to handle qualifiers on
> > /instance of/ and /subclass of/ in RDF/OWL exports of P31 as
> > rdf:type and P279 as rdfs:subClassOf, but that does not negate
> > the community's decision to tie its two most basic membership
> > properties to those W3C standard properties.  In the current
> > RDF/OWL exports that follow the community

> I'm not sure I understand how that works in practice. I.e., if we
> say that P31 *is* rdf:type, then it can't be qualified in RDF/OWL
> and we can not represent part (albeit small, qualified properties
> are about 0.2% of all such properties) of our data.
>
> I mean, we can certainly have data sets which include P31
> statements from the data translated to rdf:type unless they have
> qualifiers, and that can be very useful pragmatically, no question
> about it. But can we really say P31 is the same as rdf:type and
> use it whenever we choose to represent Wikidata data as RDF? I'm
> not sure about that.

Nor am I.

> > For example, pizza (https://www.wikidata.org/wiki/Q177) is
> > currently modeled as an instance of food and (transitively) a
> > subclass of food.

> Here we have another practical issue - if we adhere to the strict
> notion that pizza is only a subclass, then we would practically
> never have any instances in the database for wide categories of
> things. I.e. since a particular food item is rarely notable enough
> to be featured in Wikidata, no food would have instances. It may
> be formally correct but I'm afraid it's not like most people think
> - for most people, pizza is a food, not a "subclass of food".

Well pizza is a kind of food, and a kind that is important enough to
get a name in some languages.  I agree that it would be nice,
however, to be able to model the way that we think that people
think, and thus be able to make pizza an instance of some food
class instead of requiring that it be (only) a subclass of some
general class.

> Same with chemistry - as virtually no
> actual physical chemical compound (as in "this brown liquid in my
> test tube I prepared this morning by mixing contents of those
> three other test tubes") of would be notable enough to gain entry
> in Wikidata, [nearly] nothing in chemistry would ever be an
> instance. Theoretically it may be sound, but practically I'm not
> sure it would work well, even more - that it is *already* what the
> consensus on Wikidata is.

I have come around to the position that it is preferable to model
these sort of domains using multiple levels of the class hierarchy.
For food, there would be a class (possibly called food) whose
instances are those things that are actually eaten (like the pizza I
ate in Bethlehem last week).  There would also be a class (possibly
also called food, but maybe food type) whose instances are the
(notable?) classes of food (like pizza, but maybe also like bad
pizza from a hole-in-the-wall restaurant).  This lets you have your
cake and describe it too.

I have also come around to the position that this situation is very
common.  Also, people seem to be generally capable of working with
such modelling, at least informally in their heads.

However, this modelling methodology needs to be described to users,
as even things that people do well internally can cause problems
when they are being externalized.  For example, it would be a
problem if users put things in the wrong place (pizza as an instance
of the non-food-type food) or make other modelling errors.  There
also should be tool support, for example to ensure that all
instances of the food-type food are subclasses of the non-food-type
food (and maybe vice-versa).
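
As a sketch of that kind of tool support, with hypothetical helpers for listing
the instances of the "food type" metaclass and for computing the P279* closure
of an item:

  import java.util.Set;
  import java.util.function.Function;

  final class MetaclassCheck {
      // Report every instance of the metaclass that is not also a subclass of
      // the base class, e.g. every "food type" that is not a subclass of food.
      static void report(Set<String> instancesOfMetaclass,
                         Function<String, Set<String>> subclassClosure,
                         String baseClass) {
          for (String item : instancesOfMetaclass) {
              if (!subclassClosure.apply(item).contains(baseClass)) {
                  System.out.println(item + " is an instance of the metaclass"
                          + " but not a subclass of " + baseClass);
              }
          }
      }
  }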

But what else can be done?  Every other approach that I have seen
has what I consider to be worse problems.

> Stas Malyshev
> smalys...@wikimedia.org

Peter F. Patel-Schneider

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] An Ambitious Wikidata Tutorial

2015-10-16 Thread Peter F. Patel-Schneider
It's very pleasant to hear from someone else who thinks of Wikidata as a
knowledge base (or at least hopes that Wikidata can be considered as a
knowledge base).  Did you get any pushback on this or on your stated Wikidata
goal of structuring the sum of all human knowledge?

Did you get any pushback on your section on classification in Wikidata?  It
seems to me that some of that is rather controversial in the Wikidata
community.  I was a bit surprised to see class reasoning used on diseases.
This depends on a particular modelling methodology.

peter


On 10/12/2015 11:47 AM, Emw wrote:
> Hi all,
> 
> On Saturday, I facilitated a workshop at the U.S. National Archives entitled
> "An Ambitious Wikidata Tutorial" as part of WikiConference USA 2015. 
> 
> Slides are available at:
> http://www.slideshare.net/_Emw/an-ambitious-wikidata-tutorial
> https://commons.wikimedia.org/wiki/File:An_Ambitious_Wikidata_Tutorial.pdf


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-30 Thread Peter F. Patel-Schneider
On 09/29/2015 08:01 AM, Daniel Kinzler wrote:
> Am 29.09.2015 um 11:05 schrieb Thomas Douillard:
>> No it's not, because of the "undoing" problem. A user can't delete a 
>> statement
>> assuming this will be enough as he will not be explicit that the statement is
>> bot added and implied by other statements, as opposed as a statement 
>> explicitely
>> inferred by Wikibase and marked explicitely as such in the UI. If Wikibase
>> tracks the root explicit statements used to make the inference, they could be
>> exposed in the UI as well to tell the user what he might have to do to 
>> correct
>> the mistake (closer to or) at the actual root.
> 
> I agree: if we had built in inference done Just Right (tm), with everything
> editable and visible in all the right places, that would be great. But this
> would add a lot of complexity to the system, and would take a lot of resources
> to implement. It would also diverge quite a bit from the classic idea of a 
> wiki,
> potentially cause community issues.
> 
> The approach using bots was never ideal, but is still hugely successful on
> wikipedia. The same seems to be the case here. Also don't underestimate the 
> fact
> that the community has a lot of experience with bots, but is generally very
> skeptical against automatic content (even just including information from
> wikidata on wikipedia pages).
> 
> So, while bots are not ideal, and a better solution is conceivable, I think 
> bots
> as the optimal solution for the moment. We should not ignore the issues that
> exist with bots, and we should not lose sight of other options. But I think we
> should focus development on more urgent things, like a better system for 
> source
> references, or unit conversion, or better tools for constraints, or for re-use
> on wikipedia.

I also strongly agree that inference-making tools should record their
premises.  There are lots of excellent reasons to do this recording, including
showing editors where changes need to be made to remove the inferred claim.
Inference-making bots that do not record how a claim was inferred are even
worse than an inferential system that does not do so, as determining which bot
made a particular inference is harder than determining which part of an
inferential system sanctions a particular inference.


What is the difference between a system of inference-making bots that record
their premises and an inferential system that records its premises?   In some
sense, not much.  I would thus argue that an inferential system is no more
complex than a set of inference-making bots.

However, an inferential system is not limited to the implementation techniques
that are needed in a bot system.   It can, for example, only perform some
inferences on an as-needed basis.  An inferential system also can be analyzed
as a whole, something that is quite difficult with a bot system.

I would argue that inference-making bots should be considered only as a
stop-gap measure, and that a different mechanism should be considered for
making inferences in Wikidata.  I am not arguing for Inference done Just Right
(tm).  It is not necessary to get inference perfect the first time around.
All that is required is an inference mechanism that is examinable and maybe
overridable.


Peter F. Patel-Schneider
Nuance Communications




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-29 Thread Peter F. Patel-Schneider
On 09/28/2015 11:24 PM, Federico Leva (Nemo) wrote:
> Peter F. Patel-Schneider, 28/09/2015 22:27:
>>> >I'm aguing against making such inference part of wikibase/wikidata core
>>> >functionality, and hiding it's working ("magic").
>>> >
>>> >However, I very much hope for a whole ecosystem of tools that apply and
>>> use such
>>> >inference, and make the results obvious to users, both integrated with
>>> >wikidata.org and outside.
>>
>> Has anyone argued for performing inference and then hiding that it happened?
> 
> Some did, I think. :) Anything that doesn't create a recentchanges entry is
> "hiding that it happened".
> 
> Nemo
> 

Citation please.


peter


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread Peter F. Patel-Schneider
On 09/28/2015 08:12 AM, Daniel Kinzler wrote:
> Am 28.09.2015 um 16:43 schrieb Thomas Douillard:
>> Daniel Wrote:
>>> (*) This follows the principle of "magic is bad, let people edit". Allowing
>>> inconsistencies means we can detect errors by finding such inconsistencies.
>>> Automatically enforcing consistency may lead to errors propagating out of 
>>> view
>>> of the curation process. The QA process on wikis is centered around edits, 
>>> so
>>> every change should be an edit. Using a bot to fill in missing "reverse" 
>>> links
>>> follows this idea. The fact that you found an issue with the data because 
>>> you
>>> saw a bot do an edit is an example of this principle working nicely.
>>
>> That might prove to become a worser nightmare than the magic one ... It's 
>> seems
>> like refusing any kind of automation because it might surprise people for the
>> sake of exhausting them to let them do a lot of manual work.
> 
> I'm not arguing against "any" kind of automation. I'm arguing against
> "invisible" automation baked into the backend software. We(*) very much
> encourage "visible" automation under community control like bots and other
> (semi-)automatic import tools like WiDaR.
> 
> -- daniel
> 
> 
> (*) I'm part of the wikidata developer team, not an active member of the
> community. I'm primarily speaking for myself here, from my personal experience
> as a wikipedia and common admin. I know from past discussions that "bots over
> magic" is considered Best Practice among the dev team, and I believe it's also
> the approach preferred by the Wikidata community, but I cannot speak for them.

I'm not sure what you are arguing against here.

Are you arguing against any tool that makes inferences combining multiple
pieces of data in Wikidata?  Would you also argue against this if the inferred
information is flagged in some way?

peter

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread Peter F. Patel-Schneider
On 09/28/2015 07:25 AM, Daniel Kinzler wrote:
> Am 28.09.2015 um 16:14 schrieb Peter F. Patel-Schneider:
>> I worry about this way of specializing properties.   How are people, and
>> particularly programs, going to be able to find out that a qualifier is
>> needed, which qualifier it is, and how it is to be used, or which broad
>> property is to be used for a specific purpose?
> 
> This problem exists either way: either you have to know which specific 
> property
> to use, or you have to know how to qualify the broader property. In both 
> cases,
> human readable documentation is the way to find out.

I agree that finding the right thing to use is not easy.

However, I think that a uniform search space is better than a non-uniform one.
  I would much prefer to look through a collection of properties than a
collection of properties and qualifiers.  If I am writing a tool to help the
process, I would much prefer to display a collection of properties than a
collection of properties plus qualifiers.

peter

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread Peter F. Patel-Schneider
On 09/28/2015 08:30 AM, Daniel Kinzler wrote:
> Am 28.09.2015 um 17:27 schrieb Peter F. Patel-Schneider:
>> Are you arguing against any tool that makes inferences combining multiple
>> pieces of data in Wikidata?  Would you also argue against this if the 
>> inferred
>> information is flagged in some way?
> 
> I'm aguing against making such inference part of wikibase/wikidata core
> functionality, and hiding it's working ("magic").
> 
> However, I very much hope for a whole ecosystem of tools that apply and use 
> such
> inference, and make the results obvious to users, both integrated with
> wikidata.org and outside.

Has anyone argued for performing inference and then hiding that it happened?

One problem that I see with Wikidata at the moment is that it is not obvious
what inferences should or could be done.  There is no theory of knowledge that
stands behind Wikidata.  It seems to me that in the absence of such a theory
people are building bots that do some checks on the data in Wikipedia in an
attempt to check some of the inferences that would be sanctioned by such a
theory.  However, I do not see any determination that the bots are covering
those checks that should be made.  I guess that now people are also building
bots that make some of the inferences that would be sanctioned by a theory of
knowledge for Wikidata.  Again, however, there doesn't seem to be any
determination that these bots are making correct inferences or that they are
covering a group of inferences that should be made.

I view this situation as inferior to an implementation of an integrated set of
inferences for Wikidata.


Admittedly, coming up with a knowledge theory for Wikidata is not going to be
easy.  It is much easier to just write a bot that does something that might be
useful.


peter


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] next Wikidata office hour

2015-09-24 Thread Peter F. Patel-Schneider


On 09/24/2015 10:59 AM, Lydia Pintscher wrote:
> On Thu, Sep 24, 2015 at 7:54 PM, Tom Morris  wrote:
>> Thanks!  Is there any more information on the issue with MusicBrainz?
>>
>> 17:26:27  sjoerddebruin: yes, we went for MusicBrainz first,
>> but it turned out to be impractical. you basically have to run their
>> software in order to use their dumps
>>
>>
>> MusicBrainz was a major source of information for Freebase, so they appear
>> to have been able to figure out how to parse the dumps (and they already
>> have the MusicBrainz & Wikipedia IDs correlated).
>>
>> Is there more detail, perhaps in a bug somewhere?
> 
> The issue is that they do offer dumps but you need to set up your own
> musicbrainz server to really use it. This was too time-intensive and
> complicated for the students to make progress on during their project.
> Because of this they decided to instead opt for another dataset
> instead to get started. In the future Musicbrainz should still get
> done. If anyone wants to work on adding more datasets to the tool
> please let me know.
> 
> 
> Cheers
> Lydia
> 

This is to add MusicBrainz to the primary source tool, not anything else?

peter

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] next Wikidata office hour

2015-09-24 Thread Peter F. Patel-Schneider
On 09/24/2015 11:31 AM, Tom Morris wrote:
> On Thu, Sep 24, 2015 at 2:18 PM, Peter F. Patel-Schneider
> <pfpschnei...@gmail.com> wrote:
> 
> On 09/24/2015 10:59 AM, Lydia Pintscher wrote:
> > On Thu, Sep 24, 2015 at 7:54 PM, Tom Morris <tfmor...@gmail.com> wrote:
> >> Thanks!  Is there any more information on the issue with MusicBrainz?
> >>
> >> 17:26:27  sjoerddebruin: yes, we went for MusicBrainz 
> first,
> >> but it turned out to be impractical. you basically have to run their
> >> software in order to use their dumps
> >>
> >>
> >> MusicBrainz was a major source of information for Freebase, so they 
> appear
> >> to have been able to figure out how to parse the dumps (and they 
> already
> >> have the MusicBrainz & Wikipedia IDs correlated).
> >>
> >> Is there more detail, perhaps in a bug somewhere?
> >
> > The issue is that they do offer dumps but you need to set up your own
> > musicbrainz server to really use it. This was too time-intensive and
> > complicated for the students to make progress on during their project.
> > Because of this they decided to instead opt for another dataset
> > instead to get started. In the future Musicbrainz should still get
> > done. If anyone wants to work on adding more datasets to the tool
> > please let me know.
> >
> >
> > Cheers
> > Lydia
> >
> 
> This is to add MusicBrainz to the primary source tool, not anything else?
> 
> 
> It's apparently worse than that (which I hadn't realized until I re-read the
> transcript).  It sounds like it's just going to generate little warning icons
> for "bad" facts and not lead to the recording of any new facts at all.
> 
> 17:22:33 we'll also work on getting the extension deployed that 
> will help with checking against 3rd party databases
> 17:23:33 the result of constraint checks and checks against 3rd 
> party databases will then be used to display little indicators next to a 
> statement in case it is problematic
> 17:23:47 i hope this way more people become aware of issues and 
> can help fix them
> 17:24:35 Do you have any names of databases that are 
> supported? :)
> 17:24:59 sjoerddebruin: in the first version the german national 
> library. it can be extended later
> 
> 
> I know Freebase is deemed to be nasty and unreliable, but is MusicBrainz
> considered trustworthy enough to import directly or will its facts need to be
> dripped through the primary source soda straw one at a time too?
> 
> Tom


I wonder how these warnings will work.  I can see lots and lots of warnings
due to minor variations in names of artists.

I do agree that MusicBrainz data should pass the Wikidata bar, as the data in
MusicBrainz appear to me to be noteworthy and related to information in some
Wiki (and true, although this is not part of the Wikidata bar as far as I know).

peter

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Peter F. Patel-Schneider
I am a relative [sic] outsider to Wikidata and I just tried to answer this
question by looking at wikidata.

It turns out that there is information in Wikidata that indicates that
https://www.wikidata.org/wiki/Property:P22 (father) is only to be used on
people.  Look at https://www.wikidata.org/wiki/Property_talk:P22, where both
the type and the value type are person (Q215627), fictional character
(Q95074).  Similar restrictions are in place for
https://www.wikidata.org/wiki/Property:P1038 (relative).

So I would say that, no, you should not use these properties on horses.

Whether this is a good thing or not is a separate matter.  I do note that
there do not appear to be any Wikidata properties that can be used for
parent-offspring relationships for horses.  Neither
https://www.wikidata.org/wiki/Property:P22 (father) nor
https://www.wikidata.org/wiki/Property:P1038 (relative) have super-properties.

peter

On 08/26/2015 04:45 AM, Ole Palnatoke Andersen wrote:
 I've just completed #100wikidays, and my 100th article was about a
 horse: https://www.wikidata.org/wiki/Q12003911 That horse is the
 grandfather of https://www.wikidata.org/wiki/Q20872428, but should I
 use the same properties as for humans?
 
 We also have https://www.wikidata.org/wiki/Q12331109 and
 https://www.wikidata.org/wiki/Q12338810, who were father and son.
 Again: Do we have animal properties, or do we use the same as for
 humans?
 
 Regards,
 Ole
 
 On Mon, Aug 24, 2015 at 10:55 PM, Andrew Gray andrew.g...@dunelm.org.uk 
 wrote:
 Having gone and written the RFC
 (https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Merging_relationship_properties)
 I've just discovered that we *did* have this discussion in 2013:

 https://www.wikidata.org/w/index.php?title=Wikidata%3AProperties_for_deletion&diff=44470851&oldid=44465708

 - and it was suggested we come back to it after Phase III. I think
 the existing state of arbitrary access should be able to solve this
 problem, so I've added some notes about this.

 Comments welcome; I'll circulate notifications onwiki tonight.

 Andrew.

 On 24 August 2015 at 14:02, Lukas Benedix bene...@zedat.fu-berlin.de wrote:
 +1 for genderless family relationship properties.

 Lukas

 Hi all,

 Thanks again for your comments. It looks like:

 a) there's interest in simplifying this;

 b) creating automatic inferences is possibly desirable but will need a
 lot of work and thought.

 I'll put together an RFC onwiki about merging the gendered
 relationship properties, which will address the first part of the
 issue, and we can continue to think about how best to approach the
 second.

 Andrew.

 On 17 August 2015 at 12:29, Andrew Gray andrew.g...@dunelm.org.uk wrote:
 Hi all,

 I've recently been thinking about how we handle family/genealogical
 relationships in Wikidata - this is, potentially, a really valuable
 source of information for researchers to have available in a
 structured form, especially now we're bringing together so many
 biographical databases.

 We currently have the following properties to link people together:

 * spouses (P26) and cohabitants (P451) - not gendered
 * parents (P22/P25) and step-parents (P43/P44) - gendered
 * siblings (P7/P9) - gendered
 * children (P40) - not gendered (and oddly no step-children?)
 * a generic related to (P1038) for more distant relationships

 There's two big things that jump out here.

 ** First, gender. Parents are split by gender while children are not
 (we have mother/father not son/daughter). Siblings are likewise
 gendered, and spouses are not. These are all very early properties -
 does anyone remember how we got this way?

 This makes for some odd results. For example, if we want to using our
 data to identify all the male-line *descendants* of a person, we have
 to do some complicated inference from [P40 + target is male]. However,
 to identify all the male-line *ancestors*, we can just run back up the
 P22 chain. It feels quite strange to have this difference, and I
 wonder if we should standardise one way or the other - split P40 or
 merge the others.

 In some ways, merging seems more elegant. We do have fairly good
 gender metadata (and getting better all the time!), so we can still do
 gender-specific relationship searches where needed. It also avoids
 having to force a binary gender approach - we are in the odd position
 of being able to give a nuanced entry in P21 but can only say if
 someone is a sister or brother.

 ** Secondly, symmetry. Siblings, spouses, and parent-child pairs are
 by definition symmetric. If A has P26:B, then B should also have
 P26:A. The gendered cases are a little more complicated, as if A has
 P40:B, then B has P22:A or P25:A, but there is still a degree of
 symmetry - one of those must be true.

 However, Wikidata doesn't really help us make use of this symmetry. If
 I list A as spouse of B, I need to add (separately) that B is spouse
 of A. If they have four children C, D, E, and F, this gets very
 complicated 

Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Peter F. Patel-Schneider


On 08/26/2015 06:16 AM, Svavar Kjarrval wrote:
 On mið 26.ágú 2015 11:45, Ole Palnatoke Andersen wrote:
 I've just completed #100wikidays, and my 100th article was about a
 horse: https://www.wikidata.org/wiki/Q12003911 That horse is the
 grandfather of https://www.wikidata.org/wiki/Q20872428, but should I
 use the same properties as for humans?

 We also have https://www.wikidata.org/wiki/Q12331109 and
 https://www.wikidata.org/wiki/Q12338810, who were father and son.
 Again: Do we have animal properties, or do we use the same as for
 humans?

 P21 is a subclass of P31 with Q18608871 which indicates in machine
 readable interpretation that it is about the gender of people, yet the
 descriptions assume items can be associated with P21 to include gender
 of animals. Yeah, I can understand the confusion. :/
 
 - Svavar Kjarrval
 
 

I don't think that P21 (https://www.wikidata.org/wiki/Property:P21, sex or
gender) is a subclass of P31 (https://www.wikidata.org/wiki/Property:P31,
instance of).  Properties aren't subclasses in general.

Perhaps you meant to talk about https://www.wikidata.org/wiki/Property:P21
(sex or gender) being related via (https://www.wikidata.org/wiki/Property:P31
(instance of) to https://www.wikidata.org/wiki/Q18608871 (Wikidata property
for items about people).   This indicates that the property should only be
used on people, even though the description of the property itself talks about
its use on animals.

It appears that Wikidata is not very consistent internally.

peter





___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Properties for family relationships in Wikidata

2015-08-26 Thread Peter F. Patel-Schneider
, it is called a reasoner.  The design of a reasoner would very likely be
one result of the sort of work described above, but without such work it is
very hard to figure out just what is supposed to be done in any except the
simple cases.

 - Svavar Kjarrval

Peter F. Patel-Schneider
Nuance Communications


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] calendar model screwup

2015-07-01 Thread Peter F. Patel-Schneider
I would find this discussion easier to follow if the Wikidata identifiers
for the various classes and properties were mentioned, and there were
pointers to relevant documentation.

The only Wikidata class or property that I could easily find is Q205892.
Its discussion page, https://www.wikidata.org/wiki/Talk:Q205892, mentions a
bit about conversion, but nothing about this issue.

The page segment that is supposed to be used for discussion,
https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup,
does not have any pointers to any classes, properties, or documentation.

Even the very nice email from Markus that gives numbers does not provide any
information on where the numbers come from.

Please

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] calendar model screwup

2015-07-01 Thread Peter F. Patel-Schneider
Thanks.  That helps a lot.  Is that the way that things are going to be done
in the future, i.e., dates will be stored using the specified calendar model
instead of being converted?

peter


On 07/01/2015 10:52 AM, Denny Vrandečić wrote:
 Peter,
 
 you might be looking for this:
 
 https://www.mediawiki.org/wiki/Wikibase/DataModel#Dates_and_times
 
 Cheers,
 Denny
 
 On Wed, Jul 1, 2015 at 9:48 AM Peter F. Patel-Schneider
  pfpschnei...@gmail.com wrote:
 
 Thanks.
 
 This helps in finding out how to reproduce the numbers.
 
 However, I'm still confused as to how these bits of data are part of the
 Wikidata data/knowledge model.  Where is the description of
 getPreferredCalendarModel, for example?
 
 
 http://javadox.com/org.wikidata.wdtk/wdtk-datamodel/0.1.0/org/wikidata/wdtk/datamodel/interfaces/TimeValue.html
  is a *partial* description of what is going on.  Changes to this document
 would be somewhat useful.  However, what I'm really looking for is a
 description of how time works in Wikidata.
 
 peter
 
 PS:  I note that there are lots of aspects of TimeValue that are only
 suitable for the Gregorian and Julian calendars.
 
 
 
 On 07/01/2015 09:24 AM, Markus Krötzsch wrote:
  On 01.07.2015 18:03, Peter F. Patel-Schneider wrote: ...
 
  Even the very nice email from Markus that gives numbers does not
  provide any information on where the numbers come from.
 
  I just ran a simple Java program based on Wikidata Toolkit to count the
  date values. The features I used for counting are all part of the data
  (concretely I accessed: year number, precision, and calendar model). I
  used the JSON dump of 22 June 2015. The program counted all dates that
  occur in any place (main values of statements, qualifiers, and
  references). No other special processing was done.
 
  Below is the main code snippet that did the counting, in case my
  description was too vague. If you want to get your own numbers, it does
  not require much (I just modified one of the example programs in 
 Wikidata
  Toolkit that gathers general statistics). Running the code took about
  25min on my laptop (the initial dump download took longer though). The
  SPARQL endpoint at https://wdqs-beta.wmflabs.org/ should also return
  useful counts if it does not time out on the very large numbers. It uses
  life data.
 
  Best regards,
 
  Markus
 
 
  // after determining that snak is of appropriate type:
  String cm = ((TimeValue) ((ValueSnak) snak).getValue())
          .getPreferredCalendarModel();
  if (TimeValue.CM_GREGORIAN_PRO.equals(cm)) {
      this.countGregDates++;
  } else if (TimeValue.CM_JULIAN_PRO.equals(cm)) {
      this.countJulDates++;
  } else {
      System.err.println("Weird calendar model: " + ((ValueSnak) snak).getValue());
  }

  if (((TimeValue) ((ValueSnak) snak).getValue()).getPrecision() <= TimeValue.PREC_MONTH) {
      return;
  }

  long year = ((TimeValue) ((ValueSnak) snak).getValue()).getYear();
  if (year >= 1923) { this.countModernDates++; }
  else if (year >= 1753) { this.countAlmostModernDates++; }
  else if (year >= 1582) { this.countTransitionDates++; }
  else { this.countOldenDates++; }
 
 
  ___ Wikidata mailing list
  Wikidata@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikidata
 
 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata
 
 
 
 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata
 

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] calendar model screwup

2015-07-01 Thread Peter F. Patel-Schneider
Thanks.

This helps in finding out how to reproduce the numbers.

However, I'm still confused as to how these bits of data are part of the
Wikidata data/knowledge model.  Where is the description of
getPreferredCalendarModel, for example?

http://javadox.com/org.wikidata.wdtk/wdtk-datamodel/0.1.0/org/wikidata/wdtk/datamodel/interfaces/TimeValue.html
is a *partial* description of what is going on.  Changes to this document
would be somewhat useful.  However, what I'm really looking for is a
description of how time works in Wikidata.

peter

PS:  I note that there are lots of aspects of TimeValue that are only
suitable for the Gregorian and Julian calendars.



On 07/01/2015 09:24 AM, Markus Krötzsch wrote:
 On 01.07.2015 18:03, Peter F. Patel-Schneider wrote: ...
 
 Even the very nice email from Markus that gives numbers does not
 provide any information on where the numbers come from.
 
 I just ran a simple Java program based on Wikidata Toolkit to count the
 date values. The features I used for counting are all part of the data
 (concretely I accessed: year number, precision, and calendar model). I
 used the JSON dump of 22 June 2015. The program counted all dates that
 occur in any place (main values of statements, qualifiers, and
 references). No other special processing was done.
 
 Below is the main code snippet that did the counting, in case my
 description was too vague. If you want to get your own numbers, it does
 not require much (I just modified one of the example programs in Wikidata
 Toolkit that gathers general statistics). Running the code took about
 25min on my laptop (the initial dump download took longer though). The
 SPARQL endpoint at https://wdqs-beta.wmflabs.org/ should also return
 useful counts if it does not time out on the very large numbers. It uses
 life data.
 
 Best regards,
 
 Markus
 
 
 // after determining that snak is of appropriate type:
 String cm = ((TimeValue) ((ValueSnak) snak).getValue())
         .getPreferredCalendarModel();
 if (TimeValue.CM_GREGORIAN_PRO.equals(cm)) {
     this.countGregDates++;
 } else if (TimeValue.CM_JULIAN_PRO.equals(cm)) {
     this.countJulDates++;
 } else {
     System.err.println("Weird calendar model: " + ((ValueSnak) snak).getValue());
 }

 if (((TimeValue) ((ValueSnak) snak).getValue()).getPrecision() <= TimeValue.PREC_MONTH) {
     return;
 }

 long year = ((TimeValue) ((ValueSnak) snak).getValue()).getYear();
 if (year >= 1923) { this.countModernDates++; }
 else if (year >= 1753) { this.countAlmostModernDates++; }
 else if (year >= 1582) { this.countTransitionDates++; }
 else { this.countOldenDates++; }
 
 
 ___ Wikidata mailing list 
 Wikidata@lists.wikimedia.org 
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-l] OpenStreetMap + Wikidata

2015-04-23 Thread Peter F. Patel-Schneider

How so?  Oh, because Wikidata is CC0 and the Open Street Map database is
ODbL, which is somewhat like CC BY-SA.  I don't think that that follows,
though, as what is being put into Wikidata is contents, which appear to me
to be covered under the DbCL, which is like CC0.

Peter F. Patel-Schneider, speaking as an individual


On 04/23/2015 07:20 AM, Serge Wroclawski wrote:
 I am not sure how I missed this discussion, but adding information from
 OSM into Wikidata en mass like this is a violation of the OSM license.
 
 - Serge
 
 On Tue, Mar 10, 2015 at 11:32 AM, Yaroslav M. Blanter pute...@mccme.ru
 wrote:
 On 2015-03-10 14:31, Amir E. Aharoni wrote:
 
 Hi,
 
 [ Aude and Christian Consonni, this should especially interest you.
 ]
 
 I was throwing around ideas with a friend about how OpenStreetMap 
 could be integrated with Wikidata.
 
 ...
 
 Towns obviously have or can a Wikipedia article about them, but 
 probably not every street or shop. But do they fulfill a structural 
 need or is it way too much?
 
 
 Hi Amir,
 
 anything which can be remotely considered as a tourist attraction, as
 well as shops, hotels, restaurants and such are within the scope of
 Wikivoyage and thus of Wikidata. For streets, we have now an approved
 bot task adding all Dutch streets on Wikidata, and I do not see why any
 other country could be different - provided we have good sources.
 
 Cheers Yaroslav
 
 
 ___ Wikidata-l mailing
 list Wikidata-l@lists.wikimedia.org 
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l
 
 ___ Wikidata-l mailing list 
 Wikidata-l@lists.wikimedia.org 
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l
 

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-tech] Question on definition of Wikidata query service

2015-01-12 Thread Peter F. Patel-Schneider
Suppose there is a property P that is a subproperty of Q and I query for the Q 
properties of some object O that has values for both P and Q.  When I query 
for O's Q property values do I always get the P property values as well?


For example (using triples)
  daughter subpropertyof child .
  son subpropertyof child .
  John son Bill .
  John daughter Mary .
does a query for John's children return Bill and Mary?
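
For comparison, here is what the RDFS semantics would give: a minimal sketch
using Apache Jena's built-in RDFS reasoner and throwaway example.org IRIs.  It
says nothing about what the Wikidata query service itself will do.

  import org.apache.jena.rdf.model.*;
  import org.apache.jena.vocabulary.RDFS;

  public class SubPropertyDemo {
      public static void main(String[] args) {
          Model m = ModelFactory.createDefaultModel();
          Property child = m.createProperty("http://example.org/child");
          Property daughter = m.createProperty("http://example.org/daughter");
          Property son = m.createProperty("http://example.org/son");
          Resource john = m.createResource("http://example.org/John");
          Resource bill = m.createResource("http://example.org/Bill");
          Resource mary = m.createResource("http://example.org/Mary");

          m.add(daughter, RDFS.subPropertyOf, child);
          m.add(son, RDFS.subPropertyOf, child);
          m.add(john, son, bill);
          m.add(john, daughter, mary);

          InfModel inf = ModelFactory.createRDFSModel(m);
          // Under RDFS entailment this prints both Bill and Mary.
          inf.listObjectsOfProperty(john, child)
             .forEachRemaining(System.out::println);
      }
  }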


peter


On 01/12/2015 10:46 AM, Nikolas Everett wrote:



On Mon, Jan 12, 2015 at 1:35 PM, Peter F. Patel-Schneider
pfpschnei...@gmail.com wrote:

Hi:

Is there a definition of what queries against Wikidata are supposed to 
return?


This is somewhat up in the air.  Under the hood we have infinite flexibility
but what gets returned I think varies a lot by the context of the query.
Minimally we'll be returning the entity's id (Q2, etc).


In particular, how will queries interact with P1647 (subproperty of) and
similiar aspect of Wikidata?


We're not sure yet.  What kind of interaction are you looking for?

Nik


___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech



___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-l] Freebase's incompatible types and Property description permissions

2015-01-08 Thread Peter F. Patel-Schneider

What then is P17 supposed to be used for?

Could I, for example, use P17 on the address of the Swiss embassy in Germany
and have Switzerland as the value?


"Associated" is generally too weak a word to use in describing properties.

peter


On 01/08/2015 01:46 PM, Thad Guidry wrote:

Markus,

Devils in the details. =)

You used the English word "associated".  That's great.  Then I would propose
to expand the definition of P17 just a bit to add that.

P17 Country - sovereign state of this item ... to ... sovereign state
ASSOCIATED with this item

Then you save the world. =)

Thoughts ? Agreement ?

Secondly, the Description: (Description :colon:  on the Discussion page
https://www.wikidata.org/wiki/Property_talk:P17) is defining a Country... not
the description of the Country __Property__..which is the line just above it.
How is the Description :colon: line supposed to work or be really used for ?
Seems like the Description :colon: line is basically describing the Represents
:colon: line lol.
Very confusing.

Thad
+ThadGuidry https://www.google.com/+ThadGuidry



___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l



___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Conflict of Interest policy for Wikidata

2015-01-07 Thread Peter F. Patel-Schneider
Wikipedia has already addressed this question.  See 
http://en.wikipedia.org/wiki/Wikipedia:Autobiography.  In summary, one should 
not add or change information about oneself, unless the change is clearly
non-controversial or there is some reason that a change should be made and the
reasons for the change are laid out on a talk page.
This is pretty much just the general conflict of interest guidelines applied 
to information about oneself, I think.


There was an instance of someone writing their own Wikipedia entry.  (I'm not 
linking to information about the issue to somewhat hide the identity of the 
guilty.)  The end of the discussion was that the page would not be taken down. 
 The decision hinged, in part, on how easy it would be to anonymously enter 
or change information about oneself, so a prohibition on this kind of activity is
impossible to police.  The best that can be done is to point out that this 
kind of activity is strongly discouraged.


I think that the Wikipedia policy should be carried over directly to Wikidata. 
 It lets responsible individuals fix or point out errors concerning 
information about them, but has strong admonitions against making any other 
kind of changes to this information.


Peter F. Patel-Schneider

On 01/07/2015 06:25 AM, Markus Krötzsch wrote:

Back to Denny's original question:

Does anybody see a specific danger of abuse if living people get to edit their
own data right now? Entering wrong claims deliberately would maybe not be the
biggest issue here (since it is already in conflict with other general
policies -- we do not want wrong data, whoever is entering it -- and the fact
that we want to rely on external sources for all non-obvious data would still
apply). Could it be problematic if somebody enters too much/too detailed data
on their own person? Could somebody use this to place links to external web
content (spam) hidden in personal properties? But this, again, would probably
conflict with other policies too, and it does not seem to be a problem
specific to the particular POVs that a living person may have. Any other ideas
of possible abuse? My main question is: where could POV be an issue when
entering (externally referenced) data of the granularity that we have?

Some proposals of what we could allow/forbid that are specific to our special
form of content:

* Allow living people to edit certain properties on their own page
(whitelist)? I currently don't see any way of really abusing things like
birthdate, given name, etc. that are just personal properties, unless maybe in
rare cases where there is a real dispute (maybe a living person who insists on
being younger than he really is?).

* Alternatively, maybe it could even be enough to have a blacklist of certain
properties that one could be using in illegitimate ways (no specific idea now
what this might be).

* I would also allow people to set their labels and reasonable aliases, but
not have them enter any descriptions (could be POVed).


If living people are asked to not edit all or certain parts of their entity,
then there needs be a process for them to report errors. I would not like
wrong information to be broadcasted about me on Wikidata without having any
way to get it fixed.

In addition, there should be a template that one can use on one's user page to
disclose that one is the person described in a certain item. Conversely, we
should also use our website account on property (P553) to connect living
people to their Wikidata user account, so the COI is recorded in the data. One
could further disclose other COIs on one's user page in some standard format,
but maybe with Wikidata we could actually derive such COIs automatically (your
family members, the companies you founded, the university you graduated from,
etc. can all be specified in data).

Cheers,

Markus


On 04.01.2015 19:57, Andy Mabbett wrote:

Yes. they can. That's stated explicitly:

  A Wikimedia Project community may adopt an alternative paid contribution
  disclosure policy. If a Project adopts an alternative disclosure
policy, you may
  comply with that policy instead of the requirements in this section when
  contributing to that Project.

And Commons, for one, has already done so:


https://commons.wikimedia.org/wiki/Commons:Paid_contribution_disclosure_policy

which says in full:

 The Wikimedia Commons community does not require any disclosure of paid
 contributions from its contributors.


On 4 January 2015 at 07:40, Jasper Deng jas...@jasperswebsite.com wrote:

@Andy: no, the terms of use are the minimum because since a user must
legally accept them when editing a project, everyone is bound by them by
virtue of editing. Local projects cannot override that.

On Sat, Jan 3, 2015 at 11:28 AM, Andy Mabbett a...@pigsonthewing.org.uk
wrote:


On 3 January 2015 at 18:13, Joe Filceolaire filceola...@gmail.com wrote:



The terms of use are the minimum requirements.  Each wiki may have more