Re: [Wikidata] Full-text / autocomplete search on labels

2019-10-04 Thread Ettore RIZZA
Forgot to mention: you can do the same search using the API only:
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=einst&language=en&format=json

But I'm not sure you can easily filter the results by "instance of".

Ettore Rizza


On Fri, 4 Oct 2019 at 10:15, Ettore RIZZA  wrote:

> Hello Thomas,
>
> You can perform a full text search with the API, but not yet with SPARQL
> AFAIK. However, it is possible to call the API in a SPARQL query. For
> example, here is a query
> <https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fperson%20%3FpersonLabel%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aapi%20%22EntitySearch%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22www.wikidata.org%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Asearch%20%22einst%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Alanguage%20%22en%22%20.%0A%20%20%20%20%20%20%3Fperson%20wikibase%3AapiOutputItem%20mwapi%3Aitem%20.%0A%20%20%7D%0A%20%20%3Fperson%20wdt%3AP31%20wd%3AQ5.%0A%20%20%0A%20%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%20%20%0A%20%20%0A%7D%20ORDER%20BY%20DESC(%3Fperson)%20LIMIT%2020>
> that looks for instances of "human" (Q5) whose label contains the string "einst".
>
> Hope this helps,
>
> Ettore Rizza
>
>
> On Fri, 4 Oct 2019 at 09:58, Thomas Francart 
> wrote:
>
>> Hello
>>
>> I understand the Wikidata SPARQL label service only fetches the labels,
>> but does not allow searching/filtering on them; labels are also available as
>> regular rdfs:label, on which a FILTER can be applied.
>> However I would like to do full-text search over labels, e.g. to feed an
>> autocomplete search field, just like the usual top-right Wikidata
>> search field does. I would also be interested in combining this with a
>> criterion on "instance of", to search only on instances of a given class.
>>
>> Can I do that efficiently using the Wikidata SPARQL service? Or is there
>> a separate API I could use? (example welcome)
>>
>> Thanks
>> Thomas
>>
>> --
>>
>> *Thomas Francart* -* SPARNA*
>> Web de *données* | Architecture de l'*information* | Accès aux
>> *connaissances*
>> blog : blog.sparna.fr, site : sparna.fr, linkedin :
>> fr.linkedin.com/in/thomasfrancart
>> tel :  +33 (0)6.71.11.25.97, skype : francartthomas
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Full-text / autocomplete search on labels

2019-10-04 Thread Ettore RIZZA
Hello Thomas,

You can perform a full text search with the API, but not yet with SPARQL
AFAIK. However, it is possible to call the API in a SPARQL query. For
example, here is a query
<https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fperson%20%3FpersonLabel%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aapi%20%22EntitySearch%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22www.wikidata.org%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Asearch%20%22einst%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Alanguage%20%22en%22%20.%0A%20%20%20%20%20%20%3Fperson%20wikibase%3AapiOutputItem%20mwapi%3Aitem%20.%0A%20%20%7D%0A%20%20%3Fperson%20wdt%3AP31%20wd%3AQ5.%0A%20%20%0A%20%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%20%20%0A%20%20%0A%7D%20ORDER%20BY%20DESC(%3Fperson)%20LIMIT%2020>
that looks for instances of "human" (Q5) whose label contains the string "einst".
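
For readability, the query behind that link decodes to:

SELECT DISTINCT ?person ?personLabel WHERE {
  SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:api "EntitySearch" .
      bd:serviceParam wikibase:endpoint "www.wikidata.org" .
      bd:serviceParam mwapi:search "einst" .
      bd:serviceParam mwapi:language "en" .
      ?person wikibase:apiOutputItem mwapi:item .
  }
  ?person wdt:P31 wd:Q5.          # keep only results that are instances of human (Q5)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY DESC(?person) LIMIT 20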

Hope this helps,

Ettore Rizza


On Fri, 4 Oct 2019 at 09:58, Thomas Francart 
wrote:

> Hello
>
> I understand the Wikidata SPARQL label service only fetches the labels,
> but does not allow searching/filtering on them; labels are also available as
> regular rdfs:label, on which a FILTER can be applied.
> However I would like to do full-text search over labels, e.g. to feed an
> autocomplete search field, just like the usual top-right Wikidata
> search field does. I would also be interested in combining this with a
> criterion on "instance of", to search only on instances of a given class.
>
> Can I do that efficiently using the Wikidata SPARQL service? Or is there
> a separate API I could use? (example welcome)
>
> Thanks
> Thomas
>
> --
>
> *Thomas Francart* -* SPARNA*
> Web de *données* | Architecture de l'*information* | Accès aux
> *connaissances*
> blog : blog.sparna.fr, site : sparna.fr, linkedin :
> fr.linkedin.com/in/thomasfrancart
> tel :  +33 (0)6.71.11.25.97, skype : francartthomas
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata geo-coordinates

2019-09-02 Thread Ettore RIZZA
> Is there an elegant way to get data out of Wikidata in a format that you
> can then fill back into another Wikibase without the pain of such
> conversions (like splitting coordinates, changing columns, changing the
> prefixes...)?

It would probably be easier for you to get longitude and latitude
separately, but if I understand correctly, the SPARQL query is not the most
straightforward: https://w.wiki/7ny (see the sketch below).
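
For what it's worth, a minimal sketch of one way to split the coordinate
values into separate longitude and latitude columns on the query service,
assuming the endpoint's built-in geof: functions are available:

SELECT ?item ?lon ?lat WHERE {
  ?item wdt:P625 ?coord .                 # coordinate location (a WKT point literal)
  BIND(geof:longitude(?coord) AS ?lon)    # degrees east
  BIND(geof:latitude(?coord) AS ?lat)     # degrees north
}
LIMIT 10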

Cheers,

Ettore Rizza


On Mon, 2 Sep 2019 at 21:34, Olaf Simons 
wrote:

> Cool,
>
> that is a very useful link for us to keep an eye on!
>
> Thanks,
> Olaf
>
>
> > Jan Macura  wrote on 2 September 2019 at 21:22:
> >
> >
> > On Mon, 2 Sep 2019 at 21:11, Olaf Simons  >
> > wrote:
> >
> > > This might be then the central thing to have in the next QueryService
> > > Version: In addition to the present download formats an option to get
> a csv
> > > or tsv output that can be put into QuickStatements for another Wikibase
> > > without tedious conversions.
> > >
> >
> > Well, this is probably more on the QuickStatements side, to allow input in
> > the form of WKT...
> >
> >
> > > We will have more and more people with Wikibase installations who will
> use
> > > Wikidata (or other Wikibases) as data source for their platforms.
> > >
> >
> >  Of course. And this is what is currently (yet a bit confusingly) called
> > Federation: https://www.wikidata.org/wiki/Wikidata:Federation_input
> >
> > Best regards,
> >  Jan
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> Dr. Olaf Simons
> Forschungszentrum Gotha der Universität Erfurt
> Schloss Friedenstein, Pagenhaus
> 99867 Gotha
>
> Büro: +49-361-737-1722
> Mobil: +49-179-5196880
>
> Privat: Hauptmarkt 17b/ 99867 Gotha
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Proposal for the introduction of a practicable Data Quality Indicator in Wikidata (next round)

2019-08-28 Thread Ettore RIZZA
@Uwe: I'm sorry if I'm stating the obvious, but are you familiar with the
Recoin tool [1]? It seems to be quite close to what you describe, but only
for the data quality dimension of completeness (or more precisely *relative*
completeness) and it could perhaps serve as a model for what you are
considering. It is also a good example of a data quality tool that is
directly useful to editors, as it often allows them to identify and add
missing statements on an item.

Regards,

Ettore Rizza

[1] https://www.wikidata.org/wiki/Wikidata:Recoin



On Tue, 27 Aug 2019 at 21:49, Uwe Jung  wrote:

> Hello,
>
> many thanks for the answers to my contribution from 24.8.
> I think that all four opinions contain important things to consider.
>
> @David Abián
> I have read the article and agree that in the end the users decide which
> data is good for them or not.
>
> @GerardM
> It is true that in a possible implementation of the idea, the aspect of
> computing load must be taken into account right from the beginning.
>
> Please note that I have not given up on the idea yet. With regard to the
> acceptance of Wikidata, I consider a quality indicator of some kind to be
> absolutely necessary. There will be a lot of ordinary users who would like
> to see something like this.
>
> At the same time I completely agree with David; (almost) every chosen
> indicator is subject to a certain arbitrariness in the selection. There
> won't be one easy to understand super-indicator.
> So, let's approach things from the other side. Instead of a global
> indicator, a separate indicator should be developed for each quality
> dimension to be considered. With some dimensions this should be relatively
> easy. For others it could take years until we have agreed on an algorithm
> for their calculation.
>
> Furthermore, the indicators should not represent discrete values but a
> continuum of values. No traffic light statements (i.e.: good, medium, bad)
> should be made. Rather, when displaying the qualifiers, the value could be
> related to the values of all other objects (e.g. the value x for the
> current data object in relation to the overall average for all objects for
> this indicator). The advantage here is that the total average can increase
> over time, meaning that the position of the value for an individual object
> can also decrease over time.
>
> Another advantage: Users can define the required quality level themselves.
> If, for example, you have high demands on accuracy but few demands on the
> completeness of the statements, you can do this.
>
> However, it remains important that these indicators (i.e. the evaluation
> of the individual item) must be stored together with the item and can be
> queried together with the data using SPARQL.
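
A minimal sketch of what such a query could look like, using a hypothetical
placeholder property wdt:PXXXX for the proposed quality score (no such
property exists yet) holding a numeric value per item:

SELECT ?item ?score WHERE {
  ?item wdt:PXXXX ?score .      # PXXXX = placeholder for the proposed quality-score property
  FILTER(?score >= 0.8)         # keep only items above the chosen quality threshold
}
LIMIT 100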
>
> Greetings
>
> Uwe Jung
>
> On Sat, 24 Aug 2019 at 13:54, Uwe Jung  wrote:
>
>> Hello,
>>
>> As the importance of Wikidata increases, so do the demands on the quality
>> of the data. I would like to put the following proposal up for discussion.
>>
>> Two basic ideas:
>>
>>    1. Each Wikidata page (item) is scored after each editing. This score
>>    should express different dimensions of data quality in a quickly
>>    manageable way.
>>    2. A property is created via which the item refers to the score
>>    value. Certain qualifiers can be used for a more detailed description
>>    (e.g. time of calculation, algorithm used to calculate the score value, etc.).
>>
>>
>> The score value can be calculated either within Wikibase after each data
>> change or "externally" by a bot. For the calculation can be used among
>> other things: Number of constraints, completeness of references, degree of
>> completeness in relation to the underlying ontology, etc. There are already
>> some interesting discussions on the question of data quality which can be
>> used here ( see  https://www.wikidata.org/wiki/Wikidata:Item_quality;
>> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc).
>>
>> Advantages
>>
>>- Users get a quick overview of the quality of a page (item).
>>- SPARQL can be used to query only those items that meet a certain
>>quality level.
>>- The idea would probably be relatively easy to implement.
>>
>>
>> Disadvantage:
>>
>>- In a way, the data model is abused by generating statements that no
>>longer describe the item itself, but make statements about the
>>representation of this item in Wikidata.
>>- Additional computing power must be provided for the regular
>>calculation of all changed items.
>>    - Only the quality of pages is referred to. If it is insufficient, the
>>    changes still have to be made manually.

Re: [Wikidata] Proposal for the introduction of a practicable Data Quality Indicator in Wikidata

2019-08-24 Thread Ettore RIZZA
Hello,

Very interesting idea. Just to feed the discussion, here is a very recent
literature survey on data quality in Wikidata:
https://opensym.org/wp-content/uploads/2019/08/os19-paper-A17-piscopo.pdf

Cheers,

Ettore Rizza



On Sat, 24 Aug 2019 at 13:55, Uwe Jung  wrote:

> Hello,
>
> As the importance of Wikidata increases, so do the demands on the quality
> of the data. I would like to put the following proposal up for discussion.
>
> Two basic ideas:
>
>1. Each Wikidata page (item) is scored after each editing. This score
>should express different dimensions of data quality in a quickly manageable
>way.
>2. A property is created via which the item refers to the score value.
>Certain qualifiers can be used for a more detailed description (e.g. time
>of calculation, algorithm used to calculate the score value, etc.).
>
>
> The score value can be calculated either within Wikibase after each data
> change or "externally" by a bot. For the calculation can be used among
> other things: Number of constraints, completeness of references, degree of
> completeness in relation to the underlying ontology, etc. There are already
> some interesting discussions on the question of data quality which can be
> used here ( see  https://www.wikidata.org/wiki/Wikidata:Item_quality;
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc).
>
> Advantages
>
>- Users get a quick overview of the quality of a page (item).
>- SPARQL can be used to query only those items that meet a certain
>quality level.
>- The idea would probably be relatively easy to implement.
>
>
> Disadvantage:
>
>- In a way, the data model is abused by generating statements that no
>longer describe the item itself, but make statements about the
>representation of this item in Wikidata.
>- Additional computing power must be provided for the regular
>calculation of all changed items.
>- Only the quality of pages is referred to. If it is insufficient, the
>changes still have to be made manually.
>
>
> I would now be interested in the following:
>
>1. Is this idea suitable to effectively help solve existing quality
>problems?
>2. Which quality dimensions should the score value represent?
>3. Which quality dimension can be calculated with reasonable effort?
>4. How to calculate and represent them?
>5. Which is the most suitable way to further discuss and implement
>this idea?
>
>
> Many thanks in advance.
>
> Uwe Jung  (UJung <https://www.wikidata.org/wiki/User:UJung>)
> www.archivfuehrer-kolonialzeit.de/thesaurus
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] soweego: link Wikidata to large catalogs

2019-07-10 Thread Ettore RIZZA
Hello,

It's really great, I can't wait to read more about this system (I am very
interested in record linkage and entity linking) and especially to see it
in action!

Concerning the name, I prefer tools with an explicit and self-describing
name (QuickStatements, WikiFactMine...) rather than smart acronyms like SQID
or Soweego. It looks like there are hundreds of them in the Wikidata
ecosystem and we are no longer sure which one is doing what.

Just my two cents.

Congratulations anyway!

Cheers,

Ettore Rizza


On Wed, 10 Jul 2019 at 19:40, Marco Fossati  wrote:

> Dear all,
>
> ---
> TL;DR: soweego version 1 will be released soon. In the meanwhile, why
> don't you consider endorsing the next steps?
> https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Hjfocs/soweego_1.1
> ---
>
> This is a pre-release notification for early feedback.
>
> Does the name *soweego* ring a bell?
> It is a machine learning-based pipeline that links Wikidata to large
> catalogs [1].
> It is a close friend of Mix'n'match [2], which mainly caters for small
> catalogs.
>
> The first version is almost done, and will start uploading results soon.
> Confident links are going to feed Wikidata via a bot [3], while others
> will get into Mix'n'match for curation.
>
> The next short-term steps are detailed in a rapid grant proposal [4],
> and I would be really grateful if you could consider an endorsement there.
>
> The soweego team has also tried its best to address the following
> community requests:
> 1. plan a sync mechanism between Wikidata and large catalogs / implement
> checks against external catalogs to find mismatches in Wikidata;
> 2. enable users to add links to new catalogs in a reasonable time.
>
> So, here is the most valuable contribution you can give to the project
> right now: understand how to *import a new catalog* [5].
>
> Can't wait for your reactions.
> Cheers,
>
> Marco
>
> [1] https://soweego.readthedocs.io/
> [2] https://tools.wmflabs.org/mix-n-match/
> [3] see past contributions:
>
> https://www.wikidata.org/w/index.php?title=Special:Contributions/Soweego_bot&offset=20190401194034&target=Soweego+bot
> [4]
> https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Hjfocs/soweego_1.1
> [5] https://soweego.readthedocs.io/en/latest/new_catalog.html
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] What's wrong in Italy ?

2019-05-11 Thread Ettore RIZZA
>
> we don't put "former politician" as occupation just because they're not
> politician anymore


@Nicolas : I agree with your opinion (which does not seem so strong to me)
and, to continue with your example, I think that being able to mentally add
"former" to an existential property (P31 or P279) should be a red flag. I
mean, you can feel the difference in nature between the statements "Tim
Berners-Lee is a human" or "Lassie is a dog" and "Foufny-Les-Bains-De-Pied
is an administrative entity". The latter (like country, by the way) is
closer to a contingent and accidental property, as a profession can be.
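
As a side note, the "X with an end date" approach Nicolas describes in the
quoted message below can already be queried without any "former X" class; a
minimal sketch on the query service, assuming the relevant P31 statements
carry an "end time" (P582) qualifier:

SELECT ?country ?countryLabel WHERE {
  ?country p:P31 ?statement .
  ?statement ps:P31 wd:Q6256 .                          # instance of country (Q6256)
  FILTER NOT EXISTS { ?statement pq:P582 ?endTime . }   # no "end time" qualifier: still a country today
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}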

Ettore Rizza


On Sat, 11 May 2019 at 19:06, Nicolas VIGNERON 
wrote:

> Hi:
>
> Strong opinion here: I never understood why we have "former X" items in the
> first place. This seems to be an inelegant, heavy and lazy way to model
> things. I feel that the whole "former X" approach should be burned.
> Why not just use "X" with an end date? For me, it seems to be a way
> better, lighter and more precise structure. Plus, most of the time, it's
> closer to what the sources say.
> We use that method quite often (for French communes for instance - I
> remove all the "former French commune" statements whenever I see them - but also for
> all humans: we never put "former human" just because they are dead, and we
> don't put "former politician" as occupation just because they're not
> politicians anymore) and it works perfectly.
> Finally, if we go down the "former X" way, 'in fine', it would double the
> size of Wikidata (as anything and everything can become a former something)
> for no good reason (at least I don't see any, especially as the growth of
> Wikidata seems to be a concern for some).
>
> Cheers, ~nicolas
>
> On Sat, 11 May 2019 at 18:41, Ettore RIZZA  wrote:
>
>> Hello,
>>
>> It seems legit that a "former" something is no longer something and
>> shouldn't be included among the instances of "something". But it doesn't
>> sound like orthodox ontological modeling. I guess it's a workaround because
>> we can't add a qualifier (e. g. time validity) to P31 or P279 properties
>> (why by the way?)
>>
>> Cheers,
>>
>> Ettore Rizza
>>
>>
>> On Sat, 11 May 2019 at 18:12, Gerard Meijssen 
>> wrote:
>>
>>> Hoi,
>>> Does this mean that any former country is no longer considered a
>>> country? What is then meant by a "former administrative territorial
>>> entity"? I am afraid that without readily available clear definitions, this
>>> is just an academic exercise.
>>>
>>> In order to help understand it, what are the use cases? How will maps of
>>> such entities be shown, how will this relate to maps of "countries" that
>>> are still in existence and may have co-existed? How do you deal with maps
>>> of countries that no longer represent facts on the ground??
>>> Thanks,
>>>   GerardM
>>>
>>> On Sat, 11 May 2019 at 11:13, Fabrizio Carrai 
>>> wrote:
>>>
>>>> Indeed there are many things to review, and Thomas's link
>>>> <https://tools.wmflabs.org/wikidata-todo/tree.html?q=Q6256=279> is
>>>> very useful:
>>>>
>>>> 1) The classes of the geopolitical divisions (btw, it was the original
>>>> subject of this thread) rooted by "country"|(Q6256
>>>> <https://tools.wmflabs.org/wikidata-todo/tree.html?q=Q6256=279=list>)
>>>> have to be separated by the classification of the system of government
>>>> "state" (Q7275
>>>> <https://tools.wmflabs.org/wikidata-todo/tree.html?q=Q7275=279=list>),
>>>> as noted below.
>>>> 2) A historical country will no longer be part of the "country" class,
>>>> eventually moved under the "former administrative territorial entity"
>>>> Q19953632 class (to be reviewed as well)
>>>>
>>>> The first concept matches an OpenStreetMap concept to map the reality,
>>>> a picture of the present.
>>>>
>>>> Fabrizio
>>>>
>>>> On Wed, 8 May 2019 at 11:15, Thomas Douillard <
>>>> thomas.douill...@gmail.com> wrote:
>>>>
>>>>> > I'm a bit puzzled by a ranking with a preference in an "instance
>>>>> of" but...
>>>>>
>>>>> It’s indeed an interesting point. The problem in the country domain is
>>>>>

Re: [Wikidata] What's wrong in Italy ?

2019-05-11 Thread Ettore RIZZA
Hello,

It seems legit that a "former" something is no longer something and
shouldn't be included among the instances of "something". But it doesn't
sound like orthodox ontological modeling. I guess it's a workaround because
we can't add a qualifier (e. g. time validity) to P31 or P279 properties
(why by the way?)

Cheers,

Ettore Rizza


On Sat, 11 May 2019 at 18:12, Gerard Meijssen  wrote:

> Hoi,
> Does this mean that any former country is no longer considered a
> country? What is then meant by a "former administrative territorial
> entity"? I am afraid that without readily available clear definitions, this
> is just an academic exercise.
>
> In order to help understand it, what are the use cases? How will maps of
> such entities be shown, how will this relate to maps of "countries" that
> are still in existence and may have co-existed? How do you deal with maps
> of countries that no longer represent facts on the ground??
> Thanks,
>   GerardM
>
> On Sat, 11 May 2019 at 11:13, Fabrizio Carrai 
> wrote:
>
>> Indeed there are many things to review, and Thomas's link
>> <https://tools.wmflabs.org/wikidata-todo/tree.html?q=Q6256=279> is
>> very useful:
>>
>> 1) The classes of the geopolitical divisions (btw, it was the original
>> subject of this thread) rooted by "country"|(Q6256
>> <https://tools.wmflabs.org/wikidata-todo/tree.html?q=Q6256=279=list>)
>> have to be separated by the classification of the system of government
>> "state" (Q7275
>> <https://tools.wmflabs.org/wikidata-todo/tree.html?q=Q7275=279=list>),
>> as noted below.
>> 2) A historical country will no longer be part of the "country" class,
>> eventually moved under the "former administrative territorial entity"
>> Q19953632 class (to be reviewed as well)
>>
>> The first concept matches an OpenStreetMap concept to map the reality, a
>> picture of the present.
>>
>> Fabrizio
>>
>> On Wed, 8 May 2019 at 11:15, Thomas Douillard <
>> thomas.douill...@gmail.com> wrote:
>>
>>> > I'm a bit puzzled by a ranking with a preference in an "instance of"
>>> but...
>>>
>>> It’s indeed an interesting point. The problem in the country domain is
>>> that there is a lot of evolution in the regime of a state: a state can be
>>> sovereign at some point in history and then become part of a bigger state,
>>> losing its sovereignty. If we assume a kind of continuity of a state across
>>> this status change, we have to use ranks to select the last valid value.
>>> There may not be items for all the regime of a state in history, and a
>>> practical choice could be to store the information in the « instance of »
>>> statements with date qualifiers.
>>>
>>> In this case I assume, however, that it's just a practical way to circumvent
>>> the complexity or even non-existence of our ontology on countries. In most
>>> cases I would have noted that « sovereign state » is a subclass of «
>>> country ». If you want to include countries of all kinds and don't miss any,
>>> you’d have to use a construction like
>>> > ?country wdt:P31/wdt:P279* wd:Q6256.
>>>
>>> The problem with this is that there are many subclasses of « wd:Q6256 »
>>> (country) :
>>> https://tools.wmflabs.org/wikidata-todo/tree.html?lang=fr=Q6256=279
>>> so this might include some unwanted « countries ». It would be interesting
>>> to check what the differences are, to see which one is best or if some
>>> subclasses of country should not be.
>>>
>>> There is for example a subclass of « wd:Q6256 » that is « former countries
>>> », so this query would include former entities.
>>>
>>> My opinion on these is that if we choose a scheme where there are
>>> classes for former entities, the best would be to have the counterpart «
>>> today’s country » (with label (country) and a superclass for both, «
>>> country (either former or not ) ») to avoid having the « former country »
>>> class be a subclass of «today’s country».
>>>
>>> Another question: why is « sovereign state » as the sole class not
>>> enough for this query? Or only the states recognized by the United Nations
>>> (I don’t know if/how we model this) ?
>>>
>>>
>>>
>>> Le mar. 7 mai 2019 à 23:51, Fabrizio Carrai 
>>> a écrit :
>>>
>>>> Thank you Nicolas!
>>>> I found the same situation for other countries like France (Q

Re: [Wikidata] Desarrollo de Sistema de Recomendación para Wikidata

2019-04-07 Thread Ettore RIZZA
Hello Stalin,

I see more or less what a recommendation system based on Wikidata would
look like, but could you elaborate on what you mean by "recommendation
system *for* Wikidata users"?

Cheers,
Ettore Rizza


On Sun, 7 Apr 2019 at 13:14, Stalin Figueroa Álava <
stalinfiguero...@gmail.com> wrote:

> Thanks Ghislain, but I need to define a specific topic as soon as possible
> so that I can start developing a recommendation system for Wikidata
> users.
>
>
>
> On Sun, 7 Apr 2019 at 4:25, Ghislain ATEMEZING (<
> ghislain.atemez...@gmail.com>) wrote:
>
>> Hello Stalin,
>> I would suggest that you have a look at this Summer School [1] very
>> closed to Guayaquil, where you will interact personally with some gurus of
>> the SemWeb field.
>>
>> Of course if you can give more details  on how this community can help,
>> don’t hesitate.
>>
>> Regards,
>>
>> Ghislain
>>
>> [1] http://www.kgswc.org/summer-school/
>>
>> Sent from a mobile device, please excuse any brevity or typing errors
>>
>>
>> On 7 Apr 2019 at 00:57, Stalin Figueroa Álava 
>> wrote:
>>
>> Hello group, warm greetings from Stalin Figueroa Alava.
>> I am a PhD student in Information Engineering and I would like my thesis
>> topic to be a recommendation system applied to Wikidata.
>> Could you please suggest a topic?
>> I would be very grateful.
>>
>> --
>>
>>
>> *Atte.MSc. Stalin Figueroa Álava. *
>> Docente de Ingeniería en Sistemas de Información.
>> *   UNIVERSIDAD DE GUAYAQUIL*
>>   www.stalinfigueroa.com
>> <https://stalinfigueroa6.wixsite.com/stalfig>
>> 593-9-6946-7455
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> --
>
>
> *Atte.MSc. Stalin Figueroa Álava. *
> Docente de Ingeniería en Sistemas de Información.
> *   UNIVERSIDAD DE GUAYAQUIL*
>   www.stalinfigueroa.com
> <https://stalinfigueroa6.wixsite.com/stalfig>
> 593-9-6946-7455
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] SPARQL - value of property

2019-03-28 Thread Ettore RIZZA
Hello,

Of course, this is a pretty simple query: http://tinyurl.com/y5jvash3
Hope this helps.
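
For reference, the short link above presumably expands to something along
these lines (an untested reconstruction, not the exact saved query):

SELECT ?value ?valueLabel WHERE {
  wd:Q949228 wdt:P31 ?value .       # P31 = instance of; expected result: Q11424 (film)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}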

Ettore Rizza


On Thu, 28 Mar 2019 at 09:51, Raveesh M  wrote:

> Hi,
>
> I would like to know if SPARQL can be used to fetch information like "What
> is the value of the property P31 for a given QId?"
>
> e.g.,
> query: What is the value of the property P31 for QId="Q949228"
> response: Q11424 (label: film)
>
> The intent behind such a query is to identify the "type of" (or "sub-class
> of") for some entity Q for which I know the QId.
>
> Thanks and regards,
> Raveesh
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Article Size Filter on Wikidata

2019-01-12 Thread Ettore RIZZA
Wow, thank you! It would take me a whole month to write such a query. :-|

Ettore Rizza


On Sat, 12 Jan 2019 at 15:42, Lucas Werkmeister 
wrote:

> Hm, good point. It is, in theory, possible, I think – this query
> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FtitleEn%0AWITH%20%7B%0A%20%20SELECT%20%3Fitem%20WHERE%20%7B%0A%20%20%20%20%3Fitem%20wdt%3AP31%20wd%3AQ5%3B%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP106%20wd%3AQ36180%3B%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP21%20wd%3AQ6581097%3B%0A%20%20%20%20%20%20%20%20%20%20wikibase%3Asitelinks%20%3Fsitelinks.%0A%20%20%7D%0A%20%20%23%20ORDER%20BY%20DESC%28%3Fsitelinks%29%0A%20%20LIMIT%2050%0A%7D%20AS%20%25maleAuthors%0AWHERE%20%7B%0A%20%20INCLUDE%20%25maleAuthors.%0A%20%20hint%3ASubQuery%20hint%3Aoptimizer%20%22None%22.%0A%20%20%3Farticle%20schema%3Aabout%20%3Fitem%3B%0A%20%20%20%20%20%20%20%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E%3B%0A%20%20%20%20%20%20%20%20%20%20%20schema%3Aname%20%3FtitleEn.%0A%20%20BIND%28STR%28%3FtitleEn%29%20AS%20%3Ftitle%29%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aapi%20%22Generator%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wikibase%3Aendpoint%20%22en.wikipedia.org%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agenerator%20%22allpages%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agapfrom%20%3Ftitle%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agapminsize%20%221%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agaplimit%20%221%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wikibase%3Alimit%201%20.%0A%20%20%20%20%3Fitem_%20wikibase%3AapiOutputItem%20mwapi%3Aitem.%0A%20%20%7D%0A%20%20FILTER%28%3Fitem%20%3D%20%3Fitem_%29%0A%7D%0ALIMIT%2050>
> abuses the allpages generator as a generator for exactly one page:
>
> SELECT ?item ?titleEn
> WITH {
>   SELECT ?item WHERE {
> ?item wdt:P31 wd:Q5;
>   wdt:P106 wd:Q36180;
>   wdt:P21 wd:Q6581097;
>   wikibase:sitelinks ?sitelinks.
>   }
>   # ORDER BY DESC(?sitelinks)
>   LIMIT 50
> } AS %maleAuthors
> WHERE {
>   INCLUDE %maleAuthors.
>   hint:SubQuery hint:optimizer "None".
>   ?article schema:about ?item;
>            schema:isPartOf <https://en.wikipedia.org/>;
>schema:name ?titleEn.
>   BIND(STR(?titleEn) AS ?title)
>   SERVICE wikibase:mwapi {
> bd:serviceParam wikibase:api "Generator";
> wikibase:endpoint "en.wikipedia.org";
> mwapi:generator "allpages";
> mwapi:gapfrom ?title;
> mwapi:gapminsize "10000";
> mwapi:gaplimit "1";
> wikibase:limit 1 .
> ?item_ wikibase:apiOutputItem mwapi:item.
>   }
>   FILTER(?item = ?item_)
> }
> LIMIT 50
>
> Conveniently, it has a minimum size parameter built in, so we don’t even
> need to get the size as a revision property and filter on it afterwards.
>
> However, this requires one API call per item, so it doesn’t scale at all –
> this query with just 50 arbitrary author items already takes about half a
> minute. (The commented-out ORDER BY DESC(?sitelinks) is intended as a
> heuristic to find larger articles first, but all the top 50 authors by
> sitelinks have articles longer than 10,000 bytes on enwiki, so in that case
> you might as well just skip the MWAPI part altogether.)
>
> I don’t think this can work very well. Even if MWAPI was expanded so that
> we could directly feed 50 or even 500 titles to the query API (as the
> titles parameter, skipping generators altogether), that’s probably still
> too much of a bottleneck for this kind of query.
> On 12.01.19 15:00, Ettore RIZZA wrote:
>
> Hi,
>
> Since the MediaWiki API allows you to get the size in bytes of the last
> revision
> <https://en.wikipedia.org/w/api.php?action=query&format=json&titles=barack%20obama&prop=revisions&rvprop=size>
> of a Wikipedia page, is it not possible to retrieve this information with a
> generator? (it's a real question, I'm not at all comfortable with this
> API).
>
> Ettore Rizza
>
>
> On Sat, 12 Jan 2019 at 14:41, Reem Al-Kashif  wrote:
>
>> Right, I see what you mean. Thanks a lot!
>>
>> On Sat, 12 Jan 2019 at 15:35, Lucas Werkmeister 
>> wrote:
>>
>>> Well, if you take just the MWAPI part of the query
>>> <https://query.wikidata.org/#SELECT%20%3Ftitle%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22en.wikipedia.org%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%2

Re: [Wikidata] Article Size Filter on Wikidata

2019-01-12 Thread Ettore RIZZA
Hi,

Since the MediaWiki API allows you to get the size in bytes of the last revision
<https://en.wikipedia.org/w/api.php?action=query&format=json&titles=barack%20obama&prop=revisions&rvprop=size>
of a Wikipedia page, is it not possible to retrieve this information with a
generator? (it's a real question, I'm not at all comfortable with this
API).

Ettore Rizza


On Sat, 12 Jan 2019 at 14:41, Reem Al-Kashif  wrote:

> Right, I see what you mean. Thanks a lot!
>
> On Sat, 12 Jan 2019 at 15:35, Lucas Werkmeister 
> wrote:
>
>> Well, if you take just the MWAPI part of the query
>> <https://query.wikidata.org/#SELECT%20%3Ftitle%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22en.wikipedia.org%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wikibase%3Aapi%20%22Generator%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agenerator%20%22querypage%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agqppage%20%22Longpages%22%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20mwapi%3Agqplimit%20%22max%22.%0A%20%20%20%20%3Ftitle%20wikibase%3AapiOutput%20mwapi%3Atitle.%0A%20%20%7D%0A%7D>,
>> you’ll get exactly 1 results, but most of them aren’t male authors (a
>> lot of them seem to be lists of various kinds). And I think those 1
>> results are all we can get from the API, so if we limit those to male
>> authors afterwards, we only get a few results (about 100), and there’s no
>> way to increase that number as far as I’m aware, because apparently we
>> can’t get more than 1 total pages from MWAPI.
>>
>> Cheers,
>> Lucas
>> On 12.01.19 13:57, Reem Al-Kashif wrote:
>>
>> Thank you so much, Nicolas & Lucas!
>>
>> @Lucas this helps a lot! At least I will get an idea about what I need
>> until PetScan is sorted out. Would you elaborate a bit more on what you
>> mean by "most of its results are linked to items we don't care about"?
>>
>> Best,
>> Reem
>>
>> On Sat, 12 Jan 2019 at 14:18, Lucas Werkmeister 
>> wrote:
>>
>>> You can’t directly query for the size as far as I know, but you can use
>>> the longpages query page generator to get a list of the longest enwiki
>>> pages, then filter the associated items for male authors. But this will
>>> only get you about a hundred results until the longpages list is exhausted
>>> (most of its results are linked to items we don’t care about), and it won’t
>>> get you the actual size (and therefore the order of results isn’t
>>> necessarily meaningful either, you just know they’re among the longest
>>> pages).
>>>
>>> SELECT ?item ?titleEn WHERE {
>>>   hint:Query hint:optimizer "None".
>>>   SERVICE wikibase:mwapi {
>>> bd:serviceParam wikibase:endpoint "en.wikipedia.org";
>>> wikibase:api "Generator";
>>> mwapi:generator "querypage";
>>> mwapi:gqppage "Longpages";
>>> mwapi:gqplimit "max".
>>> ?title wikibase:apiOutput mwapi:title.
>>>   }
>>>   BIND(STRLANG(?title, "en") AS ?titleEn)
>>>   ?sitelink schema:name ?titleEn;
>>> schema:isPartOf <https://en.wikipedia.org/>;
>>> schema:about ?item.
>>>   ?item wdt:P31 wd:Q5;
>>> wdt:P106 wd:Q36180;
>>> wdt:P21 wd:Q6581097.
>>> }
>>>
>>> Try it!
>>>
>>> Cheers, Lucas
>>> On 12.01.19 12:56, Nicolas VIGNERON wrote:
>>>
>>> Hi Reem,
>>>
>>> If this page
>>> https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI
>>> is up-to-date, it does not seem possible to get the article size of a
>>> Wikipedia article (but I must admit I don't use or know "wikibase:mwapi" a
>>> lot; maybe it has changed or will change).
>>>
>>> Cheers,
>>> Nicolas
>>>
>>> On Sat, 12 Jan 2019 at 12:16, Reem Al-Kashif 
>>> wrote:
>>>
>>>> Hello!
>>>>
>>>> Hope this finds you well. I put together a query
>>>> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FsitelinkEn%0A%0AWHERE%20%7B%0A%20%3Fitem%20wdt%3AP31%20wd%3AQ5.%0A%20%3Fitem%20wdt%3AP106%20wd%3AQ36180.%0A%20%3Fitem%20wdt%3AP21%20wd%3AQ6581097.%0A%20%3FsitelinkEn%20schema%3Aabout%20%3Fitem%3B%0A%20%20%09%09%09%20%20%20%20schema%3AisPartOf%20%3

Re: [Wikidata] Query on scholarly article fails

2018-12-14 Thread Ettore RIZZA
Hello Fabrizio,

It seems that the problem comes from SERVICE wikibase:label. As said in
another discussion, the query executes in less than one second if you rewrite
it in this way
<https://query.wikidata.org/#SELECT%20%3Fistanza_di%20%3Finstanza_diLabel%20WHERE%20%7B%0A%20%20%3Fistanza_di%20wdt%3AP31%20wd%3AQ13442814.%0A%20%20%3Fistanza_di%20rdfs%3Alabel%20%3Finstanza_diLabel.%0A%20%20FILTER%28%28LANG%28%3Finstanza_diLabel%29%29%20%3D%20%22en%22%29%0A%7D%0ALIMIT%2010>
.
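
Decoded, the rewritten query behind that link is:

SELECT ?istanza_di ?instanza_diLabel WHERE {
  ?istanza_di wdt:P31 wd:Q13442814.
  ?istanza_di rdfs:label ?instanza_diLabel.       # fetch labels directly instead of the label service
  FILTER((LANG(?instanza_diLabel)) = "en")
}
LIMIT 10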

Cheers,

Ettore Rizza


On Fri, 14 Dec 2018 at 09:59, Fabrizio Carrai  wrote:

> Hello all,
> the following query ends with a timeout:
>
> SELECT ?istanza_di ?istanza_diLabel WHERE {
>   SERVICE wikibase:label { bd:serviceParam wikibase:language
> "[AUTO_LANGUAGE],en". }
>   ?istanza_di wdt:P31 wd:Q13442814.
> }
> LIMIT 10
>
> Can anybody explain why ?
> Thanks in advance
>
> --
> *Fabrizio*
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Ettore RIZZA
Hi,

I see no reason that this should not be done for other groups of living
> organisms where subclass relationships are missing.


It seems very simple to me. Maybe too simple. Perhaps I am intimidated by
the kilometers of discussions I'm reading about the taxon-centric aspect of
Wikidata, when I'm not a biologist. So, there is no problem if we add that
Cetacea <https://www.wikidata.org/wiki/Q160> is a subclass of aquatic
mammals <https://www.wikidata.org/wiki/Q3039055>, as indicated by its Wikipedia
page <https://en.wikipedia.org/wiki/Cetacea>?
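
(For what it's worth, a quick way to check on the query service whether that
subclass path already holds, before or after such an edit:

ASK { wd:Q160 wdt:P279* wd:Q3039055 . }   # is Cetacea transitively a subclass of aquatic mammal?
)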

Cheers,

Ettore

On Sat, 20 Oct 2018 at 19:20, Peter F. Patel-Schneider <
pfpschnei...@gmail.com> wrote:

> On 10/20/18 6:29 AM, Ettore RIZZA wrote:
> > For most people, ants are insects, not instances of taxon.
>
> Sure, but Wikidata doesn't have ants being instances of taxon.  Instead,
> Formicidae (aka ant) is an instance of taxon, which seems right to me.
>
> Here are some extracts from Wikidata as of a few minutes ago, also showing
> the English Wikipedia page for the Wikidata item.
>
> https://www.wikidata.org/wiki/Q7386 Formicidae  ant
> https://en.wikipedia.org/wiki/Ant
> instance of taxon
> no subclass of statement
>
> https://www.wikidata.org/wiki/Q1390 insect
> https://en.wikipedia.org/wiki/Insect
> subclass of animal
> instance of taxon
>
> What is missing is that Q7386 is a subclass of Q1390, which is sanctioned
> by
> the "Ants are eusocial insects" phrase at the start of
> https://en.wikipedia.org/wiki/Ant.  I added that statement and put as
> source
> English Wikipedia.  (By the way, how can I source a statement to a
> particular
> Wikipedia page?)
>
>
> I see no reason that this should not be done for other groups of living
> organisms where subclass relationships are missing.
>
> peter
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Ettore RIZZA
Hello,

It is interesting to note that what Cparle wants are "is a" relationships
based on common sense. For most people, ants are insects, not instances of
taxon. A clarinet is a woodwind instrument, and woodwind instruments are
musical instruments, not an instance of "first order metaclass".

One of the best sources of "common sense" hypernymy is probably the first
sentence of a Wikipedia page. Whether in English, French, or Italian, a woman
is always "a female *human *being."

For "poodle", this would look like (following the links in the English
version of Wikipedia):

- The poodle is a group of formal *dog breeds*

- Dog breeds are *dogs* that...

- The domestic dog (...) is a member of the genus *Canis* (canines)

- Canis is a genus of the *Canidae*

- The biological family Canidae (...) is a lineage of *carnivorans*

- Carnivora (...) is a diverse *scrotiferan *order

- Scrotifera is a clade of *placental mammals*

- Placentalia ("Placentals") is one of the three extant subdivisions of the
class of animals *Mammalia*...

- Mammals are the *vertebrates *within the class Mammalia...


From my point of view, this classification looks much better than the
current relationships in Wikidata's ontology.

The automatic extraction of hypernymic relationships from English texts
(especially Wikipedia) has been studied for a long time and gives good
results, even with simple methods based on hand-crafted rules. In the case
of Wikipedia, the hypernym often has a page itself (and therefore a link to
Wikidata), which could simplify the NLP extraction and the mapping with
Wikidata items.

Of course, the extracted relationships will not always be "subclass of" or
"instance of". But if someone proposed a new property called "Wikipedia
Hypernyms" (and its symmetric property "Wikipedia Hyponyms"), I would use
it more willingly and with more confidence than the current system. This
would also better respect the logic of Wikidata's descriptions.

I mean, if the description of Zoroastrianism (Q9601) says this is an
"Ancient Iranian *religion *founded by Zoroaster", one would expect the
class "religion" to appear much earlier in the hierarchy of superclasses of
this item. If there were such a property as "Wikipedia Hypernyms", we could
mention it in the same page - since Wikipedia describes Zoroastrianism as
"one of the world's oldest *religions *that remains active." And a SPARQL
query looking for 'all items that have "religion" as "Wikipedia hypernyms"
property' would be much, much faster.
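
Just to make that concrete, such a query could be as simple as the sketch
below, where wdt:PXXXX is a placeholder for the proposed (non-existent)
"Wikipedia hypernym" property:

SELECT ?item ?itemLabel WHERE {
  ?item wdt:PXXXX wd:Q9174 .     # PXXXX = hypothetical "Wikipedia hypernym" property; Q9174 = religion
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}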

Note: sorry if this reflection is naive or if it has already been
discussed/tested.

Cheers,

Ettore

On Thu, 27 Sep 2018 at 23:35, James Heald  wrote:

> This recent announcement by the Structured Data team perhaps ought to be
> quite a heads-up for us:
>
>
> https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_Commons_-_how_to_structure_coverage
>
> Essentially the team has given up on the hope of using Wikidata
> hierarchies to suggest generalised "depicts" values to store for images
> on Commons, to match against terms in incoming search requests.
>
> i.e.  if an image is of a German Shepherd dog, and identified as such,
> the team has given up on trying to infer in general from Wikidata that
> 'dog' is also a search term that such an image should score positively
> with.
>
> Apparently the Wikidata hierarchies were simply too complicated, too
> unpredictable, and too arbitrary and inconsistent in their design across
> different subject areas to be readily assimilated (before one even
> starts on the density of bugs and glitches that then undermine them).
>
> Instead, if that image ought to be considered in a search for 'dog', it
> looks as though an explicit 'depicts:dog' statement may be going to be
> needed to be specifically present, in addition to 'depicts:German
> Shepherd'.
>
> Some of the background behind this assessment can be read in
> https://phabricator.wikimedia.org/T199119
> in particular the first substantive comment on that ticket, by Cparle on
> 10 July, giving his quick initial read of some of the issues using
> Wikidata would face.
>
> SDC was considered a flagship end-application for Wikidata.  If the data
> in Wikidata is not usable enough to supply the dogfood that project was
> expected to be going to be relying on, that should be a serious wake-up
> call, a red flag we should not ignore.
>
> If the way data is organised across different subjects is currently too
> inconsistent and confusing to be usable by our own SDC project, are
> there actions we can take to address that?  Are there design principles
> to be chosen that then need to be applied consistently?  Is this
> something the community can do, or is some more active direction going
> to need to be applied?
>
> Wikidata's 'ontology' has grown haphazardly, with little oversight, like
> an untended bank of weeds.  Is some more active gardening now required?
>
>-- James.
>
>
>

Re: [Wikidata] Help us teaching ORES how to better detect vandalism

2018-10-18 Thread Ettore RIZZA
Hello Léa,

This is an extremely useful tool. Just a detail to improve its usability:
could you remove the confirmation popup when skipping a modification? Many
edits are made in languages that not everyone speaks, so we would like to
move faster to the next one.

Cheers,

Ettore Rizza

On Thu, 18 Oct 2018 at 10:10, Léa Lacroix  wrote:

> Hello all,
> Just a reminder, we still need your help to complete this campaign! A few
> minutes of your time can really help ORES to be smarter in detecting
> vandalism. Thanks a lot :)
>
> Cheers, Léa
>
> On Wed, 18 Jul 2018 at 14:28, Léa Lacroix 
> wrote:
>
>> Hello all,
>>
>> As you may know, ORES is a tool analyzing edits to detect vandalism,
>> providing a score per edit. You can see the result on Recent Changes, you
>> can also let us know when you find something wrong
>> <https://www.wikidata.org/wiki/Wikidata:ORES/Report_mistakes/2018>.
>>
>> But do you know that you can also directly help ORES to improve? We just
>> launched a new labeling campaign
>> <https://labels.wmflabs.org/ui/wikidatawiki/>: after authorizing your
>> account with OAuth, you will see some real edits, and you will be asked if
>> you find them damaging or not, good faith or bad faith. Completing a set
>> will take you around 10 minutes.
>>
>>
>>
>>
>> The last time we ran this campaign was in 2015. Since then, the way of
>> editing Wikidata has changed, and so have some vandalism patterns (for
>> example, there is more vandalism on companies). So, if you're familiar with the
>> Wikidata rules and you would be willing to give a bit of time to help
>> fighting against vandalism, please participate
>> <https://labels.wmflabs.org/ui/wikidatawiki/> :)
>>
>> If you encounter any problem or have question about the tool, feel free
>> to contact Ladsgroup <https://www.wikidata.org/wiki/User:Ladsgroup>.
>>
>> Cheers,
>> --
>> Léa Lacroix
>> Project Manager Community Communication for Wikidata
>>
>> Wikimedia Deutschland e.V.
>> Tempelhofer Ufer 23-24
>> 10963 Berlin
>> www.wikimedia.de
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>
>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>
>
>
> --
> Léa Lacroix
> Project Manager Community Communication for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
> für Körperschaften I Berlin, Steuernummer 27/029/42207.
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2018-10-01 Thread Ettore RIZZA
> what computer did you use for this? IIRC it required >512GB of RAM to
> function.

Hello Laura,

Sorry for my confusing message, I am not at all a member of the HDT team.
But according to its creator, 100 GB "with an optimized code" could be
enough to produce an HDT like that.

On Mon, 1 Oct 2018 at 18:59, Laura Morales  wrote:

> > a new dump of Wikidata in HDT (with index) is available[
> http://www.rdfhdt.org/datasets/].
>
> Thank you very much! Keep it up!
> Out of curiosity, what computer did you use for this? IIRC it required
> >512GB of RAM to function.
>
> > You will see how Wikidata has become huge compared to other datasets. it
> contains about twice the limit of 4B triples discussed above.
>
> There is a 64-bit version of HDT that doesn't have this limitation of 4B
> triples.
>
> > In this regard, what is in 2018 the most user friendly way to use this
> format?
>
> Speaking for me at least, Fuseki with a HDT store. But I know there are
> also some CLI tools from the HDT folks.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2018-10-01 Thread Ettore RIZZA
Hello,

a new dump of Wikidata in HDT (with index) is available
[http://www.rdfhdt.org/datasets/]. You will see how Wikidata has become
huge compared to other datasets. It contains about twice the limit of 4B
triples discussed above.

In this regard, what is, in 2018, the most user-friendly way to use this
format?

BR,

Ettore

On Tue, 7 Nov 2017 at 15:33, Ghislain ATEMEZING <
ghislain.atemez...@gmail.com> wrote:

> Hi Jeremie,
>
> Thanks for this info.
>
> In the meantime, what about making chunks of 3.5Bio triples (or any size
> less than 4Bio) and a script to convert the dataset? Would that be possible
> ?
>
>
>
> Best,
>
> Ghislain
>
>
>
> Sent from Mail for Windows 10
>
>
>
> *From:* Jérémie Roquet 
> *Sent:* Tuesday, 7 November 2017 15:25
> *To:* Discussion list for the Wikidata project.
> 
> *Subject:* Re: [Wikidata] Wikidata HDT dump
>
>
>
> Hi everyone,
>
>
>
> I'm afraid the current implementation of HDT is not ready to handle
>
> more than 4 billion triples as it is limited to 32-bit indexes. I've
>
> opened an issue upstream: https://github.com/rdfhdt/hdt-cpp/issues/135
>
>
>
> Until this is addressed, don't waste your time trying to convert the
>
> entire Wikidata to HDT: it can't work.
>
>
>
> --
>
> Jérémie
>
>
>
> ___
>
> Wikidata mailing list
>
> Wikidata@lists.wikimedia.org
>
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-09-29 Thread Ettore RIZZA
Hi Thad,

I understand that an open Wiki has its advantages and disadvantages (I
sometimes prefer a system like StackOverflow, where you need a certain
reputation to do some things). I am afraid that a voting system simply
favors the opinions shared by the majority of Wikidata editors, namely a
Western worldview. And even within this subgroup opinions may legitimately
differ.

But there may be ways to avoid messing up the ontology while respecting the
wiki spirit. For example, a warning pop-up every time you edit an
ontological property (P31, P279, P361...). Something like: "OK, you added
the statement "a poodle is an instance of toy". Do you agree with the fact
that poodle is now a goods, a work, an artificial physical object? "

But that would only work for manual edits...

On Sat, 29 Sep 2018 at 16:38, Thad Guidry  wrote:

> Ettore,
>
> Wikidata has the ability of crowdsourcing...unfortunately, it is not
> effectively utilized.
>
> It's because Wikidata does not yet provide a voting feature on
> statements... where, as the vote gets higher, more resistance to change the
> statement is required.
> But that breaks the notion of a "wiki" for some folks.
> And there we circle back to Gerard's age-old question of... should
> Wikidata really be considered a wiki at all for the benefit of society?
> Or should it apply voting/resistance to keep it tidy, factual and less
> messy.
>
> We have the technology to implement voting/resistance on statements.  I
> personally would utilize that feature and many others probably would as
> well.  Crowdsourcing the low voted facts back to applications like
> OpenRefine, or the recently sent out Survey vote mechanism for spam
> analysis on the low voted statements could highlight where things are
> untidy and implement vote casting to clean them up.
>
> "...the burden of proof has to be placed on authority, and it should be
> dismantled if that burden cannot be met..."
>
> -Thad
> +ThadGuidry <https://plus.google.com/+ThadGuidry>
>
>
> On Sat, Sep 29, 2018 at 2:49 AM Ettore RIZZA 
> wrote:
>
>> Hi,
>>
>> Wikidata's ontology is a mess, and I do not see how it could be
>> otherwise. While the creation of new properties is controlled, any fool can
>> decide that a woman <https://www.wikidata.org/wiki/Q467> is no longer a
>> human or is part of family. Maybe I'm a fool too? I wanted to remove the
>> claim that a ship <https://www.wikidata.org/wiki/Q11446> is an instance
>> of "ship type" because it produces weird circular inferences in my
>> application; but maybe that makes sense to someone else.
>>
>> There will never be a universal ontology on which everyone agrees. I
>> wonder (sorry to think aloud) if Wikidata should not rather facilitate the
>> use of external classifications. Many external ids are knowledge
>> organization systems (ontologies, thesauri, classifications ...) I dream of
>> a simple query that could search, in Wikidata, "all elements of the same
>> class as 'poodle' according to the classification of imagenet
>> <http://imagenet.stanford.edu/synset?wnid=n02113335>.
>>
>> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Looking for "data quality check" bots

2018-09-29 Thread Ettore RIZZA
Hi Maarten,

Thank you very much for your answer and your pointers. The page (which I
did not know existed) containing a federated SPARQL query is definitely
close to what I mean. It just misses one more step: deciding who is right.
If we look at the first result of the table
<https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches> of
mismatches (Dmitry Bortniansky <https://www.wikidata.org/wiki/Q316505>) and
we draw a little graph, the result is:

[image: Diagram.png]

We can see that the error comes (probably) from VIAF, which contains a
duplicate, and from NTA, which obviously created an authority based on this
bad VIAF ID.
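
Just to illustrate the kind of check I have in mind, here is a rough sketch
of a federated date-of-birth comparison; the external endpoint URL and the
ext: vocabulary below are placeholders, not a real service:

PREFIX ext: <http://example.org/ontology#>     # placeholder vocabulary

SELECT ?item ?wdBirth ?extBirth WHERE {
  ?item wdt:P1006 ?ntaId ;                     # NTA identifier
        wdt:P569 ?wdBirth .                    # date of birth in Wikidata
  SERVICE <https://example.org/sparql> {       # placeholder external SPARQL endpoint
    ?extPerson ext:ntaId ?ntaId ;
               ext:dateOfBirth ?extBirth .
  }
  FILTER(?wdBirth != ?extBirth)                # keep only mismatching dates
}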

My research is very close to this kind of case, and I am very interested to
know what is already implemented in Wikidata.

Cheers,

Ettore Rizza

On Sat, 29 Sep 2018 at 13:03, Maarten Dammers  wrote:

> Hi Ettore,
>
>
> On 26-09-18 14:31, Ettore RIZZA wrote:
> > Dear all,
> >
> > I'm looking for Wikidata bots that perform accuracy audits. For
> > example, comparing the birth dates of persons with the same date
> > indicated in databases linked to the item by an external-id.
> Let's have a look at the evolution of automated editing. The first step
> is to add missing data from anywhere. Bots importing date of birth are
> an example of this. The next step is to add data from somewhere with a
> source or add sources to existing unsourced or badly sourced statements.
> As far as I can see that's where we are right now, see for example edits
> like
>
> https://www.wikidata.org/w/index.php?title=Q41264&type=revision&diff=619653838&oldid=616277912
> . Of course the next step would be to be able to compare existing
> sourced statements with external data to find differences. But how would
> the work flow be? Take for example Johannes Vermeer (
> https://www.wikidata.org/wiki/Q41264 ). Extremely well documented and
> researched, but
>
> http://www.getty.edu/vow/ULANFullDisplay?find500032927
> and https://rkd.nl/nl/explore/artists/80476 combined provide 3 different
> dates of birth and 3 different dates of death. When it comes to these
> kind of date mismatches, it's generally first come, first served (first
> date added doesn't get replaced). This mismatch could show up in some
> report. I can check it as a human and maybe do some adjustments, but how
> would I sign it off to prevent other people from doing the same thing
> over and over again?
>
> With federated SPARQL queries it becomes much easier to generate reports
> of mismatches. See for example
> https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches .
>
> Maarten
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-09-29 Thread Ettore RIZZA
Hi,

The Wikidata's ontology is a mess, and I do not see how it could be
otherwise. While the creation of new properties is controlled, any fool can
decide that a woman <https://www.wikidata.org/wiki/Q467> is no longer a
human or is part of family. Maybe I'm a fool too? I wanted to remove the
claim that a ship <https://www.wikidata.org/wiki/Q11446> is an instance of
"ship type" because it produces weird circular inferences in my
application; but maybe that makes sense to someone else.

There will never be a universal ontology on which everyone agrees. I wonder
(sorry to think aloud) if Wikidata should not rather facilitate the use of
external classifications. Many external ids are knowledge organization
systems (ontologies, thesauri, classifications ...) I dream of a simple
query that could search, in Wikidata, "all elements of the same class as
'poodle' according to the classification of imagenet
<http://imagenet.stanford.edu/synset?wnid=n02113335>.
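
For what it is worth, on the hierarchy side a property path over both link
types is a more compact alternative to the gas:service attempts in Thad's
query quoted below (wd:Q38904 is the poodle item used there); whether it
finishes within the timeout is exactly the open problem, so this is only a
sketch:

SELECT ?ancestor ?ancestorLabel WHERE {
  wd:Q38904 (wdt:P279|wdt:P171)* ?ancestor .   # subclass of OR parent taxon, any depth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}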

On Fri, 28 Sep 2018 at 04:42, Thad Guidry  wrote:

> James,
>
> It looks like a lot of that phabricator issue was around Taxons ?  For the
> Poodle to show a class of Mammal...
>
> Seems like many of these could be answered if someone responded to
> https://www.wikidata.org/wiki/User:Danyaljj on their last question about
> if an "OR" could be used with linktype with gas:service ... where no one
> gave an answer to their final question comment here:
>
> https://www.wikidata.org/wiki/Wikidata:Request_a_query/Archive/2017/01#Timeout_when_finding_distance_between_two_entities
>
> I tried myself to answer that question and find either Parent Taxon OR
> Subclass of a Poodle, but couldn't seem to pull it off using gas:service
> and 1 hour of trial and error in many forms, even duplicating the program
> twice ...
>
> http://tinyurl.com/yb7wfpwh
>
> #defaultView:Graph
> PREFIX gas: <http://www.bigdata.com/rdf/gas#>
>
> SELECT ?item ?itemLabel
> WHERE {
>   SERVICE gas:service {
> gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
> gas:in wd:Q38904 ;
> gas:traversalDirection "Forward" ;
> gas:out ?item ;
> gas:out1 ?depth ;
> gas:maxIterations 10 ;
> gas:linkType wdt:P279 .
>   }
>   SERVICE gas:service {
> gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
> gas:in wd:Q38904 ;
> gas:traversalDirection "Forward" ;
> gas:out ?item ;
> gas:out1 ?depth ;
> gas:maxIterations 10 ;
> gas:linkType wdt:P171 .
>   }
>
>   SERVICE wikibase:label {bd:serviceParam wikibase:language
> "[AUTO_LANGUAGE],en" }
> }
>
>
> On Thu, Sep 27, 2018 at 5:24 PM Stas Malyshev 
> wrote:
>
>> Hi!
>>
>> > Apparently the Wikidata hierarchies were simply too complicated, too
>> > unpredictable, and too arbitrary and inconsistent in their design across
>> > different subject areas to be readily assimilated (before one even
>> > starts on the density of bugs and glitches that then undermine them).
>>
>> The main problem is that there is no standard way (or even defined small
>> number of ways) to get the hierarchy that is relevant for "depicts" from
>> current Wikidata data. It may even be that for a specific type or class
>> the hierarchy is well defined, but the sheer number of different ways it
>> is done in different areas is overwhelming and ill-suited for automatic
>> processing. Of course things like "is "cat" a common name of an animal
>> or a taxon and which one of these will be used in depicts" adds
>> complexity too.
>>
>> One way of solving it is to create a special hierarchy for "depicts"
>> purposes that would serve this particular use case. Another way is to
>> amend existing hierarchies and meta-hierarchies so that there would be
>> an algorithmic way of navigating them in a common case. This is
>> something that would be nice to hear about from people that are
>> experienced in ontology creation and maintenance.
>>
>> > to be chosen that then need to be applied consistently?  Is this
>> > something the community can do, or is some more active direction going
>> > to need to be applied?
>>
>> I think this is very much something that the community can do.
>>
>> --
>> Stas Malyshev
>> smalys...@wikimedia.org
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Looking for "data quality check" bots

2018-09-26 Thread Ettore RIZZA
Hi,

Wikidata is obviously linked to a bunch of unusable external ids, but also
to some very structured data. I'm interested for the moment in the state of
the art - even based on poor scraping, why not?

I see for example this request for permission
<https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Symac_bot_4>
for a bot able to retrieve information on the BNF (French national library)
database. It was refused because of copyright issues, but simply
checking the information without extracting anything is allowed, isn't it?

On Wed, 26 Sep 2018 at 20:48, Paul Houle  wrote:

> "Poorly structured" HTML is not all that bad in 2018 thanks to HTML 5
> (which builds the "rendering decisions made about broken HTML from
> Netscape 3" into the standard so that in common languages you can get
> the same DOM tree as the browser)
>
> If you try to use an official or unofficial API to fetch data from some
> service in 2018 you will have to add some dependencies and you just
> might open a can of whoop-ass that will make you reinstall Anaconda or
> maybe you will learn something you'll never be able to unlearn about how
> XML processing changed between two minor versions of the JDK
>
> On the other hand I have often dusted off the old HTML-based parser I
> made for Flickr and found I could get it to work for other media
> collections,  blogs, etc. by just changing the "semantic model" embodied
> in the application which could be as simple as some function or object
> that knows something about the structure of the URLs of some documents.
>
> I cannot understand why so many standards have been pushed to integrate
> RDF and HTML that have gone nowhere but nobody has promoted the clean
> solution of "add a css media type for RDF" that marks the semantics of
> HTML up the way JSON-LD works.
>
> Often though if you look it that way much of the time these days
> matching patterns against CSS gets you most of the way there.
>
> I've had cases where I haven't had to change the rule sets much at all
> but none of them have been more than 50 lines of code,  all much less.
>
>
>
> -- Original Message --
> From: "Federico Leva (Nemo)" 
> To: "Discussion list for the Wikidata project"
> ; "Ettore RIZZA" 
> Sent: 9/26/2018 1:00:53 PM
> Subject: Re: [Wikidata] Looking for "data quality check" bots
>
> >Ettore RIZZA, 26/09/2018 15:31:
> >>I'm looking for Wikidata bots that perform accuracy audits. For
> >>example, comparing the birth dates of persons with the same date
> >>indicated in databases linked to the item by an external-id.
> >
> >This is mostly a screenscraping job, because most external databases
> >are only accessibly in unstructured or poorly structured HTML form.
> >
> >Federico
> >
> >___
> >Wikidata mailing list
> >Wikidata@lists.wikimedia.org
> >https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Looking for "data quality check" bots

2018-09-26 Thread Ettore RIZZA
Dear all,

I'm looking for Wikidata bots that perform accuracy audits. For example,
comparing the birth dates of persons with the same date indicated in
databases linked to the item by an external-id.

I do not even know if they exist. Bots are often poorly documented, so I
appeal to the community to get some examples.

Many thanks.

Ettore Rizza
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mapping back to Schema.org needs "broader external class"

2018-09-26 Thread Ettore RIZZA
By the way: the item "demand <https://www.wikidata.org/wiki/Q4402708>" in
Wikidata clearly refers to the economic concept in its broadest and most
abstract sense, while https://schema.org/Demand is defined as "the public
(...) announcement by an organization or person to seek certain types of
goods or services. "

So I would not say they are equivalent classes; rather,
<https://schema.org/Demand> looks more like a narrower external class of
<https://www.wikidata.org/wiki/Q4402708>.

As Andra Waagmeester indicates, couldn't this be modeled with an external
ID?
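
As a quick check of how much is already covered, here is a small sketch
listing the Wikidata classes currently mapped to schema.org through
equivalent class (P1709):

SELECT ?class ?classLabel ?schemaClass WHERE {
  ?class wdt:P1709 ?schemaClass .                       # equivalent class
  FILTER (CONTAINS(STR(?schemaClass), "schema.org/"))   # keep only schema.org targets
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 200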

On Wed, 26 Sep 2018 at 10:18, Ettore RIZZA  wrote:

> Hi,
>
> aggregate demand -- broader external class --> https://schema.org/Demand
>> place of devotion -- broader external class --> https://schema.org/Place
>> festival -- broader external class --> https://schema.org/Event
>
>
>  According to the "creator" of the property narrower external class
> (P3950)
> <https://www.wikidata.org/wiki/Wikidata:Property_proposal/external_subclass>,
> " the reverse (...) is less required because more general classes are
> more likely to be included in Wikidata anyway.  "
>
> These examples seem to prove him right, since "demand
> <https://www.wikidata.org/wiki/Q4402708>" exists in Wikidata and is
> already linked to "http://schema.org/Demand" via the equivalent class
> property. Same thing for https://schema.org/Event, already mapped with
> event <https://www.wikidata.org/wiki/Q1656682>, or for
> https://schema.org/Place , which could be associated with location
> <https://www.wikidata.org/wiki/Q17334923> (not sure).
>
> To be clear, I support the creation of "broader external class" because it
> can be used with some external vocabularies; I point this out just to make
> sure that all existing mapping possibilities are used. :)
>
> Cheers,
>
> Ettore Rizza
>
>
> On Wed, 26 Sep 2018 at 03:53, Thad Guidry  wrote:
>
>> Sure, Dan
>>
>> aggregate demand <https://www.wikidata.org/wiki/Q1801078> -- broader
>> external class --> https://schema.org/Demand
>>
>> place of devotion <https://www.wikidata.org/wiki/Property:P5873> --
>> broader external class --> https://schema.org/Place
>>
>> festival <https://www.wikidata.org/wiki/Q132241> -- broader external
>> class --> https://schema.org/Event
>>
>> Usually we can discover these relationships quite easily with "What links
>> here" on the GUI and applicable SPARQL queries, but then would like to
>> apply the Wikidata->Schema.org mappings when we discover those
>> relationships can be made.  I suck at PHP, so I couldn't build or
>> contribute to a native application for Wikidata to host that application to
>> auto discover some of these mappings, but would be happy to assist someone
>> who could code in PHP to build such application...here's looking at you,
>> Magnus ?  :-)
>>
>> -Thad
>> +ThadGuidry <https://plus.google.com/+ThadGuidry>
>>
>>
>> On Tue, Sep 25, 2018 at 7:07 PM Dan Brickley  wrote:
>>
>>> On Tue, 25 Sep 2018 at 16:35, Thad Guidry  wrote:
>>>
>>>> Hi Team !
>>>> +Dan Brickley  +Lydia Pintscher
>>>> 
>>>>
>>>> Schema.org mapping is progressing on every new Weekly Summary "Newest
>>>> properties" listing.
>>>> That's great !  And thanks to Léa and team for providing the new
>>>> properties listing !
>>>>
>>>> What's not great, is many times, we cannot apply a "broader external
>>>> class" to map to a Schema.org Type.  This is because "broader concept"
>>>> https://www.wikidata.org/wiki/Property:P4900 is constrained to
>>>> "qualifiers only and not for use on statements".
>>>>
>>>> We are able to use the existing "narrower external class"
>>>> <https://www.wikidata.org/wiki/Property:P3950> , for example like here
>>>> on this topic, https://www.wikidata.org/wiki/Q7406919 , but there is
>>>> no "broader external class" property in Wikidata yet from what we see.
>>>>
>>>> It would be *awesome* if someone could advocate for that new property
>>>> to help map Wikidata to external vocabularies that have broader concepts
>>>> quite often, such as Schema.org.
>>>>
>>>
>>> Could you give 2-3 specific examples, to help motivate the request, for
>>> folk who're not tracking this work?
>>>
>>> Dan
>>>
>>> -Thad
>>>> +ThadGuidry <https://plus.google.com/+ThadGuidry>
>>>>
>>>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mapping back to Schema.org needs "broader external class"

2018-09-26 Thread Ettore RIZZA
Hi,

aggregate demand -- broader external class --> https://schema.org/Demand
> place of devotion -- broader external class --> https://schema.org/Place
> festival -- broader external class --> https://schema.org/Event


 According to the "creator" of the property narrower external class (P3950)
<https://www.wikidata.org/wiki/Wikidata:Property_proposal/external_subclass>,
" the reverse (...) is less required because more general classes are more
likely to be included in Wikidata anyway.  "

These examples seem to prove him right, since "demand
<https://www.wikidata.org/wiki/Q4402708>" exists in Wikidata and is already
linked to "http://schema.org/Demand; via the equivalent class property.
Same thing for https://schema.org/Event, already mapped with event
<https://www.wikidata.org/wiki/Q1656682>, or for  https://schema.org/Place
, which could be associated with location
<https://www.wikidata.org/wiki/Q17334923> (not sure).

To be clear, I support the creation of "broader external class" because it
can be used with some external vocabularies; I point this out just to make
sure that all existing mapping possibilities are used. :)

Cheers,

Ettore Rizza


On Wed, 26 Sep 2018 at 03:53, Thad Guidry  wrote:

> Sure, Dan
>
> aggregate demand <https://www.wikidata.org/wiki/Q1801078> -- broader
> external class --> https://schema.org/Demand
>
> place of devotion <https://www.wikidata.org/wiki/Property:P5873> --
> broader external class --> https://schema.org/Place
>
> festival <https://www.wikidata.org/wiki/Q132241> -- broader external
> class --> https://schema.org/Event
>
> Usually we can discover these relationships quite easily with "What links
> here" on the GUI and applicable SPARQL queries, but then would like to
> apply the Wikidata->Schema.org mappings when we discover those
> relationships can be made.  I suck at PHP, so I couldn't build or
> contribute to a native application for Wikidata to host that application to
> auto discover some of these mappings, but would be happy to assist someone
> who could code in PHP to build such application...here's looking at you,
> Magnus ?  :-)
>
> -Thad
> +ThadGuidry <https://plus.google.com/+ThadGuidry>
>
>
> On Tue, Sep 25, 2018 at 7:07 PM Dan Brickley  wrote:
>
>> On Tue, 25 Sep 2018 at 16:35, Thad Guidry  wrote:
>>
>>> Hi Team !
>>> +Dan Brickley  +Lydia Pintscher
>>> 
>>>
>>> Schema.org mapping is progressing on every new Weekly Summary "Newest
>>> properties" listing.
>>> That's great !  And thanks to Léa and team for providing the new
>>> properties listing !
>>>
>>> What's not great, is many times, we cannot apply a "broader external
>>> class" to map to a Schema.org Type.  This is because "broader concept"
>>> https://www.wikidata.org/wiki/Property:P4900 is constrained to
>>> "qualifiers only and not for use on statements".
>>>
>>> We are able to use the existing "narrower external class"
>>> <https://www.wikidata.org/wiki/Property:P3950> , for example like here
>>> on this topic, https://www.wikidata.org/wiki/Q7406919 , but there is no
>>> "broader external class" property in Wikidata yet from what we see.
>>>
>>> It would be *awesome* if someone could advocate for that new property
>>> to help map Wikidata to external vocabularies that have broader concepts
>>> quite often, such as Schema.org.
>>>
>>
>> Could you give 2-3 specific examples, to help motivate the request, for
>> folk who're not tracking this work?
>>
>> Dan
>>
>> -Thad
>>> +ThadGuidry <https://plus.google.com/+ThadGuidry>
>>>
>>> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mapping Wikidata to other ontologies

2018-09-22 Thread Ettore RIZZA
@Andra Waagmeester: I am a little disconcerted by the property P2888 "exact
match". I see it is mostly
used to link entities, not properties, and I can't figure out how it
differs from an external id (unless it's just a convenient way of linking
concepts to databases that do not have an external id in Wikidata?)
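
For what it is worth, a trivial sketch to sample that usage:

SELECT ?subject ?target WHERE {
  ?subject wdt:P2888 ?target .   # "exact match" statements and the URIs they point to
}
LIMIT 100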



On Sat, 22 Sep 2018 at 15:55, Peter F. Patel-Schneider <
pfpschnei...@gmail.com> wrote:

> It is indeed helpful to link the Wikidata ontologies to other ontologies,
> particularly ones like the DBpedia ontology and the schema.org ontology.
> There are already quite a few links from the Wikidata ontology to several
> other ontologies, using the Wikidata equivalent class and property
> properties.
>  Going through and ensuring that every class and property, for example, in
> the
> DBpedia ontology or the schema.org ontology is the target of a correct (!)
> link would be useful.   Then, as you indicate, it is not so hard to query
> Wikidata using the external ontology or map Wikidata information into
> information in the other ontology.
>
>
> The Wikidata ontology is much larger (almost two million classes) and much
> more fine grained than most (or maybe even all) other general-purpose
> ontologies.  This is appealing as one can be much more precise in Wikidata
> than in other ontologies.  It does make Wikidata harder to use (correctly)
> because to represent an entity in Wikidata one has to select among many
> more
> alternatives.
>
> This selection is harder than it should be.  The Wikidata ontology is not
> well
> organized.  The Wikidata ontology has errors in it.  There is not yet a
> good
> tool for visualizing or exploring the ontology (although there are some
> useful
> tools such as https://tools.wmflabs.org/bambots/WikidataClasses.php and
> http://tools.wmflabs.org/wikidata-todo/tree.html).
>
> So it is not trivial to set up good mappings from the Wikidata ontology to
> other ontologies.   When setting up equivalences one has to be careful to
> select the Wikidata class or property that is actually equivalent to the
> external class or property as opposed to a Wikidata class or property that
> just happens to have a similar or the same label.  One also has to be
> similarly careful when setting up other relationships between the Wikidata
> ontology and other ontologies.   As well, one has to be careful to select
> good
> relationships that have well-defined meanings.  (Some SKOS relationships
> are
> particuarly suspect.)  I suggest using only strict generalization and
> specialization relationships.
>
>
> So I think that an effort to completely and correctly map several external
> general-purpose ontologies into the Wikidata ontology would be something
> for
> the Wikidata community to support.  Pick a few good external ontologies and
> put the needed effort into adding any missing mappings and checking the
> mappings that already exist.   Get someone or some group to commit to
> keeping
> the mapping up to date.  Announce the results and show how they are useful.
>
>
> Peter F. Patel-Schneider
> Nuance Communications
>
>
> On 9/22/18 4:28 AM, Maarten Dammers wrote:
> > Hi everyone,
> >
> > Last week I presented Wikidata at the Semantics conference in Vienna (
> > https://2018.semantics.cc/ ). One question I asked people was: What is
> keeping
> > you from using Wikidata? One of the common responses is that it's quite
> hard
> > to combine Wikidata with the rest of the semantic web. We have our own
> private
> > ontology that's a bit on an island. Most of our triples are in our own
> private
> > format and not available in a more generic, more widely use ontology.
> >
> > Let's pick an example: Claude Lussan. No clue who he is, but my bot
> seems to
> > have added some links and the item isn't too big. Our URI is
> > http://www.wikidata.org/entity/Q2977729 and this is equivalent of
> > http://viaf.org/viaf/29578396 and
> > http://data.bibliotheken.nl/id/thes/p173983111 . If you look at
> > http://www.wikidata.org/entity/Q2977729.rdf this equivalence is
> represented as:
> > http://viaf.org/viaf/29578396"/>
> > http://data.bibliotheken.nl/id/thes/p173983111
> "/>
> >
> > Also outputting it in a more generic way would probably make using it
> easier
> > than it is right now. Last discussion about this was at
> > https://www.wikidata.org/wiki/Property_talk:P1921 , but no response
> since June.
> >
> > That's one way of linking up, but another way is using equivalent
> property (
> > https://www.wikidata.org/wiki/Property:P1628 ) and equivalent class (
> > https://www.wikidata.org/wiki/Property:P1709 ). See for example sex or
> gender
> > ( https://www.wikidata.org/wiki/Property:P21) how it's mapped to other
> > ontologies. This won't produce easier RDF, but some smart downstream
> users
> > have figured out some SPARQL queries. So linking up our properties and
> classes
> > to other ontologies will make using our data easier. This is a first
> 

Re: [Wikidata] Mapping Wikidata to other ontologies

2018-09-22 Thread Ettore RIZZA
Hi,

I fully agree on the usefulness of this mapping.

Out of 5311 properties, only 232 have equivalents in other schemes
<https://query.wikidata.org/#%23list%20of%20properties%20in%20Wikidata%20with%20their%20type%20and%20their%20equivalent%20in%20other%20ontologies%0A%0ASELECT%20DISTINCT%20%3Fproperty%20%3FpropertyLabel%20%3FpropertyDescription%20%3FpropertyType%20%0A%28GROUP_CONCAT%28DISTINCT%20%3FequivalentProp%3Bseparator%3D%22%3B%20%22%29%20as%20%3FequivalentProps%29%0A%0AWHERE%0A%7B%0A%20%20%20%20%3Fproperty%20rdf%3Atype%20wikibase%3AProperty%20.%0A%20%20%20%20%3Fproperty%20wikibase%3ApropertyType%20%3FpropertyType%20.%0A%20%20%20%20OPTIONAL%20%7B%3Fproperty%20wdt%3AP1628%20%20%3FequivalentProp%20.%7D%0A%20%20%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%09%0A%20%20%0A%7D%20GROUP%20BY%20%3Fproperty%20%3FpropertyLabel%20%3FpropertyDescription%20%3FpropertyType%20>
(although
the many external ids are special cases since they are equivalent to some
kind of owl:sameAs.)
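
For readability, this is the query behind the link above, decoded from the
URL:

# list of properties in Wikidata with their type and their equivalent in other ontologies
SELECT DISTINCT ?property ?propertyLabel ?propertyDescription ?propertyType
       (GROUP_CONCAT(DISTINCT ?equivalentProp; separator="; ") AS ?equivalentProps)
WHERE {
  ?property rdf:type wikibase:Property .
  ?property wikibase:propertyType ?propertyType .
  OPTIONAL { ?property wdt:P1628 ?equivalentProp . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?property ?propertyLabel ?propertyDescription ?propertyType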

If I can help in this job, I'm interested.

Cheers,

Ettore Rizza

On Sat, 22 Sep 2018 at 13:29, Maarten Dammers  wrote:

> Hi everyone,
>
> Last week I presented Wikidata at the Semantics conference in Vienna (
> https://2018.semantics.cc/ ). One question I asked people was: What is
> keeping you from using Wikidata? One of the common responses is that
> it's quite hard to combine Wikidata with the rest of the semantic web.
> We have our own private ontology that's a bit on an island. Most of our
> triples are in our own private format and not available in a more
> generic, more widely use ontology.
>
> Let's pick an example: Claude Lussan. No clue who he is, but my bot
> seems to have added some links and the item isn't too big. Our URI is
> http://www.wikidata.org/entity/Q2977729 and this is equivalent of
> http://viaf.org/viaf/29578396 and
> http://data.bibliotheken.nl/id/thes/p173983111 . If you look at
> http://www.wikidata.org/entity/Q2977729.rdf this equivalence is
> represented as:
> http://viaf.org/viaf/29578396"/>
> http://data.bibliotheken.nl/id/thes/p173983111
> "/>
>
> Also outputting it in a more generic way would probably make using it
> easier than it is right now. Last discussion about this was at
> https://www.wikidata.org/wiki/Property_talk:P1921 , but no response
> since June.
>
> That's one way of linking up, but another way is using equivalent
> property ( https://www.wikidata.org/wiki/Property:P1628 ) and equivalent
> class ( https://www.wikidata.org/wiki/Property:P1709 ). See for example
> sex or gender ( https://www.wikidata.org/wiki/Property:P21) how it's
> mapped to other ontologies. This won't produce easier RDF, but some
> smart downstream users have figured out some SPARQL queries. So linking
> up our properties and classes to other ontologies will make using our
> data easier. This is a first step. Maybe it will be used in the future
> to generate more RDF, maybe not and we'll just document the SPARQL
> approach properly.
>
> The equivalent property and equivalent class are used, but not that
> much. Did anyone already try a structured approach with reporting? I'm
> considering parsing popular ontology descriptions and producing reports
> of what is linked to what so it's easy to make missing links, but I
> don't want to do double work here.
>
> What ontologies are important because these are used a lot? Some of the
> ones I came across:
> * https://www.w3.org/2009/08/skos-reference/skos.html
> * http://xmlns.com/foaf/spec/
> * http://schema.org/
> * https://creativecommons.org/ns
> * http://dbpedia.org/ontology/
> * http://vocab.org/open/
> Any suggestions?
>
> Maarten
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Number of properties

2018-09-21 Thread Ettore RIZZA
And, of course, you can also use a SPARQL query.
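
For example, a minimal query that counts them (a sketch, not necessarily
the one originally linked):

SELECT (COUNT(?property) AS ?count) WHERE {
  ?property rdf:type wikibase:Property .
}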

On Fri, 21 Sep 2018 at 21:07, Michael Schönitzer <
michael.schoenit...@wikimedia.de> wrote:

> And here you can see growth over time:
>
> https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel?refresh=30m=3=1=now-2y=now
>
>
> Am Fr., 21. Sep. 2018 um 20:44 Uhr schrieb Nicolas VIGNERON <
> vigneron.nico...@gmail.com>:
>
>> Hi,
>>
>> You have this list :
>> https://www.wikidata.org/wiki/Special:AllPages/Property: (automatically
>> generated)
>> and this list : https://www.wikidata.org/wiki/Wikidata:List_of_properties
>> (organised and curated by contributors).
>>
>> Cheer, ~nicolas
>>
>> Le ven. 21 sept. 2018 à 20:22, Adrian Bielefeldt <
>> adrian.bielefe...@mailbox.tu-dresden.de> a écrit :
>>
>>> Hello everyone,
>>>
>>> is there some web page that states how many properties exist in Wikidata?
>>>
>>> Greetings,
>>>
>>> Adrian
>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> --
> Michael F. Schönitzer
>
>
>
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Tel. (030) 219 158 26-0
> http://wikimedia.de
>
> Stellen Sie sich eine Welt vor, in der jeder Mensch an der Menge allen
> Wissens frei teilhaben kann. Helfen Sie uns dabei!
> http://spenden.wikimedia.de/
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> Körperschaften I Berlin, Steuernummer 27/681/51985.
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata in the LOD Cloud

2018-07-28 Thread Ettore RIZZA
Dear all,

stop me if my question is naive or stupid. But I see that a dataset like
Europeana is both in the LOD Cloud and present as a property in Wikidata.
However, the method using the "formatter URI for RDF resource" property
does not work because this property is missing from Europeana ID. How many
other cases like this?

But I see in this simplified version of the LOD Cloud that each dataset has
a namespace. Would it not be more efficient to match Wikidata and the LOD
Cloud using these namespaces in a series of SPARQL queries?
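
As a sketch of what I mean, here is a query listing the external-id
properties that still lack "formatter URI for RDF resource" (P1921), which
is exactly what breaks the method above:

SELECT ?prop ?propLabel WHERE {
  ?prop wikibase:propertyType wikibase:ExternalId .       # external-id properties
  FILTER NOT EXISTS { ?prop wdt:P1921 ?rdfFormatter . }   # no RDF formatter URI
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 200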

Cheers,

Ettore

On Mon, 9 Jul 2018 at 14:07, Lucas Werkmeister 
wrote:

> On 27.06.2018 22:40, Federico Leva (Nemo) wrote:
> > Maarten Dammers, 27/06/2018 23:26:
> >> Excellent news! https://lod-cloud.net/dataset/wikidata seems to
> >> contain the info in a more human readable (and machine readable) way.
> >> If we add some URI link, does it automagically appear or does Lucas
> >> has to do some manual work? I assume Lucas has to do some manual work.
> >
> > I'd also be curious what to do when a property does not have a node in
> > the LOD cloud, for instance P2948 is among the 77 results for P1921
> > but I don't see any corresponding URL in
> > http://lod-cloud.net/versions/2018-30-05/lod-data.json
>
> Previously it was manual work, yes, and for properties not in the LOD
> cloud I added commented-out entries to the page source of
> https://www.wikidata.org/wiki/User:Lucas_Werkmeister_(WMDE)/LOD_Cloud.
> I’ll try to resubmit Wikidata now and see how the submission process has
> evolved.
>
> Cheers, Lucas
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] NCI Thesaurus ID links not working

2018-07-18 Thread Ettore RIZZA
Hi Thad,

This ID is a clickable link when I look at the page on my laptop, but not
when using the mobile version  (https://m.wikidata.org/wiki/Q1931388)

Could it be because this identifier was put in the wrong place (in the
"statements" block and not in the "identifiers" block) ?

Cheers,

Ettore

On Thu, 19 Jul 2018 at 04:05, Thad Guidry  wrote:

> Hello esteemed team !
>
> Someone noted on Property Talk about
> https://www.wikidata.org/wiki/Property_talk:P1748
>
> that an external id should be used ?
>
> I added just a few codes to test what is happening, and the CODE on the
> left does populate, like this one for Death
> https://www.wikidata.org/wiki/Q1931388 it shows my newly added C28554 on
> the left under External properties, but yeah, it is not a clickable link.
>
> The Formatter URL does indeed work when populated with that C28554
> code...so...
>
> What could be the issue with the CODE not displaying with the Formatter
> URL and being clickable ?
>
> -Thad
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikiata and the LOD cloud

2018-05-06 Thread Ettore RIZZA
@Antonin : You're right, I now remember Magnus Knuth's message on this list
about GlobalFactSync
<https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSync>, a
lite version of CrossWikiFact, if I understood correctly. I also remember
that his message did not trigger many reactions...

2018-05-06 10:46 GMT+02:00 Antonin Delpeuch (lists) <
li...@antonin.delpeuch.eu>:

> On 06/05/2018 10:37, Ettore RIZZA wrote:
> > More simply, there's still a long way to go until Wikidata imports
> > all the data contained in Wikipedia infoboxes (or equivalent data
> > from other sources), let alone the rest.
> >
> >
> > This surprises me. Are there any statistics somewhere on the rate of
> > Wikipedia's infoboxes fully parsed ?
>
>
> That was more or less the goal of the CrossWikiFact project, which was
> unfortunately not very widely supported:
> https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/CrossWikiFact
>
> It's still not clear to me why this got so little support - it looked
> like a good opportunity to collaborate with DBpedia.
>
> Antonin
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikiata and the LOD cloud

2018-05-06 Thread Ettore RIZZA
>
> More simply, there's still a long way to go until Wikidata imports all the
> data contained in Wikipedia infoboxes (or equivalent data from other
> sources), let alone the rest.


This surprises me. Are there any statistics somewhere on the rate of
Wikipedia's infoboxes fully parsed ?

2018-05-05 19:04 GMT+02:00 Federico Leva (Nemo) :

> Andy Mabbett, 05/05/2018 17:33:
>
>> Both Wikidata and DBpedia surely can, and should, coexist because we'll
>>> never be able to host in Wikidata the entirety of the Wikipedias.
>>>
>> Can you give an example of something that can be represented in
>> DBpedia, but not Wikidata?
>>
>
> More simply, there's still a long way to go until Wikidata imports all the
> data contained in Wikipedia infoboxes (or equivalent data from other
> sources), let alone the rest.
>
> So, as Gerard mentions, DBpedia has something more/different to offer.
> (The same is true for the various extractions of structured data from
> Wiktionary vs. Wiktionary's own unstructured data.)
>
> That said, the LOD cloud is about links, as far as I understand. Wikidata
> should be very interesting in it.
>
> Federico
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikiata and the LOD cloud

2018-05-05 Thread Ettore RIZZA
>
> The semantics of Wikidata qualifiers have not been defined and won't
> be enforced. It's left up to users to invent their own meanings. (In this
> way, Wikidata is still a lot like the prose in Wikipedia.)
> We need more "curated" projects like DBpedia



Mmh, I would rather have thought that the system of qualifiers, even if
imperfect, was a great enhancement compared to the DBpedia model - which is
a bit of a mess.

Let's take the Winston Churchill item:
Wikidata tells us, for example, that he served as British Prime Minister
from 1951 to 1955, replacing Clement Attlee and being replaced at this
position by Anthony Eden. In DBpedia, which does not use
reification, we have just a list of offices, a list of successors, a list
of predecessors, a list of dates, and no way to figure out who replaced
whom, in what position, and when.
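
For the record, a sketch of how those qualifiers can be read back through
the query service (Q8016 being the Churchill item, P39 "position held" with
its start/end and replaces/replaced-by qualifiers):

SELECT ?positionLabel ?start ?end ?replacesLabel ?replacedByLabel WHERE {
  wd:Q8016 p:P39 ?statement .                    # "position held" statements
  ?statement ps:P39 ?position .
  OPTIONAL { ?statement pq:P580  ?start . }      # start time
  OPTIONAL { ?statement pq:P582  ?end . }        # end time
  OPTIONAL { ?statement pq:P1365 ?replaces . }   # replaces
  OPTIONAL { ?statement pq:P1366 ?replacedBy . } # replaced by
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}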

The handcrafted ontology of DBpedia is certainly more consistent, but it's
also much poorer. Rather than impoverishing Wikidata's class system, would
it not be better to find a way to avoid horrors like "actor is a subclass
of person"? I would be interested to know whether there are researchers
working on the subject.

Regarding the relative sizes of DBpedia and Wikidata, I thought that
Wikidata is by nature much larger. DBpedia cannot contain more entities
than there are in the English Wikipedia (about 5 million), with its very
strict notability criteria, while Wikidata admits many more things. Am I
wrong? (I consider, of course, that DBpedia and its other language versions
are different knowledge bases, as is the case in the LOD cloud.)

2018-05-05 16:52 GMT+02:00 David Abián :

> I don't mean a technical lack of expressiveness, but the impossibility,
> and lack of intention, for Wikipedia to become a read-only interface of
> Wikidata someday.
>
>
> El 05/05/18 a las 16:33, Andy Mabbett escribió:
> > On 5 May 2018 at 14:39, David Abián  wrote:
> >
> >> Both Wikidata and DBpedia surely can, and should, coexist because we'll
> >> never be able to host in Wikidata the entirety of the Wikipedias.
> >
> > Can you give an example of something that can be represented in
> > DBpedia, but not Wikidata?
> >
>
> --
> David Abián
> Wikimedia España
> https://wikimedia.es/
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikiata and the LOD cloud

2018-04-30 Thread Ettore RIZZA
>
> The suspense is killing me :D


Me too ! :D

Thanks Lydia, and Lucas of course; looking forward to seeing a big Wikidata
bubble in the middle of this cloud.

Cheers,

Ettore Rizza

2018-04-30 22:13 GMT+02:00 Lydia Pintscher <lydia.pintsc...@wikimedia.de>:

> On Mon, Apr 30, 2018 at 9:19 PM Ettore RIZZA <ettoreri...@gmail.com>
> wrote:
>
> > Hi all,
>
> > The new version of the "Linked Open data Cloud" graph  is out ... and
> still no Wikidata in it. According to this Twitter discussion, this would
> be due to a lack of metadata on Wikidata. No way to fix that easily? The
> LOD cloud is cited in many scientific papers, it is not a simple gadget.
>
> When I last talked to them about getting Wikidata included it wasn't
> possible because the website handling the datasets was changed and no
> longer worked for it. Seems they've changed that now. Lucas is in touch to
> figure out what's needed now. Let's hope we can finally get this solved now
> and see where Wikidata ends up in the cloud. The suspense is killing me :D
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
> der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
> Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Wikiata and the LOD cloud

2018-04-30 Thread Ettore RIZZA
Hi all,

The new version of the "Linked Open data Cloud" graph
<http://lod-cloud.net/>  is out ... and still no Wikidata in it. According
to this Twitter discussion
<https://twitter.com/AmrapaliZ/status/990927835400474626>, this would be
due to a lack of metadata on Wikidata. No way to fix that easily? The LOD
cloud is cited in many scientific papers, it is not a simple gadget.

Cheers,

Ettore Rizza
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Historical (RDF) dumps

2018-03-30 Thread Ettore RIZZA
Here is a better link.
<https://archive.org/details/wikimediadownloads?and%5B%5D=Wikidata+TTL===2>

2018-03-30 18:58 GMT+02:00 Ettore RIZZA <ettoreri...@gmail.com>:

> Hi Aidan,
>
> I think all the dumps are on archive.org
> <https://archive.org/details/wikimedia-other?and%5B%5D=wikidata%20dumps%20-title:(entity%20dumps)>
>  (never
> check if they are complete)
>
> Cheers,
>
> Ettore Rizza
>
> 2018-03-30 18:53 GMT+02:00 Aidan Hogan <aid...@gmail.com>:
>
>> Hi all,
>>
>> With a couple of students we are working on various topics relating to
>> the dynamics of RDF and Wikidata. The public dumps in RDF cover the past
>> couple of months:
>>
>> https://dumps.wikimedia.org/wikidatawiki/entities/
>>
>> I'm wondering is there a way to get access to older dumps or perhaps
>> generate them from available data? We've been collecting dumps but it seems
>> we have a gap for a dump on 2017/07/04 right in the middle of our
>> collection. :) (If anyone has a copy of the truthy data for that particular
>> month, I would be very grateful if they can reach out.)
>>
>> In general, I think it would be fantastic to have a way to access all
>> historical dumps. In particular, datasets might be used in papers and for
>> reproducibility purposes it would be lift a burden from the authors to be
>> able to link (rather than having to host) the data used. I am not sure if
>> such an archive is feasible or not though.
>>
>> Thanks,
>> Aidan
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Historical (RDF) dumps

2018-03-30 Thread Ettore RIZZA
Hi Aidan,

I think all the dumps are on archive.org
<https://archive.org/details/wikimedia-other?and%5B%5D=wikidata%20dumps%20-title:(entity%20dumps)>
(never
checked if they are complete)

Cheers,

Ettore Rizza

2018-03-30 18:53 GMT+02:00 Aidan Hogan <aid...@gmail.com>:

> Hi all,
>
> With a couple of students we are working on various topics relating to the
> dynamics of RDF and Wikidata. The public dumps in RDF cover the past couple
> of months:
>
> https://dumps.wikimedia.org/wikidatawiki/entities/
>
> I'm wondering is there a way to get access to older dumps or perhaps
> generate them from available data? We've been collecting dumps but it seems
> we have a gap for a dump on 2017/07/04 right in the middle of our
> collection. :) (If anyone has a copy of the truthy data for that particular
> month, I would be very grateful if they can reach out.)
>
> In general, I think it would be fantastic to have a way to access all
> historical dumps. In particular, datasets might be used in papers and for
> reproducibility purposes it would be lift a burden from the authors to be
> able to link (rather than having to host) the data used. I am not sure if
> such an archive is feasible or not though.
>
> Thanks,
> Aidan
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] About OCLC and DBpedia Links

2018-03-06 Thread Ettore RIZZA
>
> This flow involves human intervention so it is not instant. It can take
> years.


Arf, yes, this is unfortunately quite a plain and rational argument. Thank
you very much for your answer.

2018-03-06 20:26 GMT+01:00 Ettore RIZZA <ettoreri...@gmail.com>:

> If Dbpedia has “same as” then Wikidata doesn’t have to duplicate that
>> information you can ask dbpedia what is same as Q7724
>
>
>
> Sebastian Hellman beats me to my question. Perhaps our points of view are
> different. From mine, which is that of a data consumer and a network
> enthusiast, the difference between a simple link and two-way links is huge.
> I mean, links are cheap and linked data is just about links. Instead of a
> heavy, complex and often empty federated SPAQL query, it would be enough
> to ask Wikidata to get the information of DBpedia (I think Wikidata is
> intended to contain one day all the DBpedia entities). Just as one will be
> able to query Wikidata one day to know the VIAF ID of any writer. I believe
> that Wikidata is destined to become a data hub of this kind, but maybe I'm
> wrong.
>
> Best regards,
>
> Ettore Rizza
>
> 2018-03-06 20:11 GMT+01:00 Sebastian Hellmann <hellm...@informatik.uni-
> leipzig.de>:
>
>> Hm, now I am also curious and would like to ask the same question as
>> Ettore. What is the policy here?
>>
>> Viaf has schema.org backlinks, see https://viaf.org/viaf/85312226/rdf.xml
>>
>> http://www.wikidata.org/entity/Q80;
>> <http://www.wikidata.org/entity/Q80>/>
>> Then it's ok to duplicate? because it is not owl:sameAs?
>>
>> All the best,
>> Sebastain
>>
>>
>> On 06.03.2018 20:01, Magnus Sälgö wrote:
>>
>> If Dbpedia has “same as” then Wikidata doesn’t have to duplicate that
>> information you can ask dbpedia what is same as Q7724
>>
>> Regards
>> Magnus Sälgö
>> Stockholm, Sweden
>>
>> 6 mars 2018 kl. 19:49 skrev Ettore RIZZA <ettoreri...@gmail.com>:
>>
>> First of all, thank you all for your answers.
>>
>> @Magnus and Thad: it's a bit what I suspected. Since the URL to WorldCat
>> can be rebuilt from the Library of congress authority ID, I guess someone
>> thought it would be a duplicate.
>>
>> But 1) I'm not sure that there is a 1 to 1 mapping between all Worldcat
>> Identities and the Library of Congress 2) It would be rather strange that a
>> Library of Congress ID would also serve as an ID for a "competitor" (ie
>> OCLC, which maintains Worldcat and VIAF) 3) One would then wonder why
>> Wikipedia provides both links to the Library of Congress Authority ID and
>> Worldcat Identities.
>>
>> With respect for the fact that Wikidata already contains links to VIAF
>> and that VIAF contains links to Worldcat Identities, this transitivity
>> reasoning could apply to many other Authority IDs, I think.
>>
>> @Sebastian: Il would be great! I'll follow this project closely, just as
>> I'm already following your papers
>> <https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcontent.iospress.com%2Farticles%2Fsemantic-web%2Fsw277=02%7C01%7C%7Cb20d7c5a98504fcaa29e08d583930f50%7C84df9e7fe9f640afb435%7C1%7C0%7C636559589991689411=R1LeZSVSLmdZ%2FiJh6evIwrukAgtQQhzz7OxrJNL%2FXVg%3D=0>.
>> And it is precisely because I know that there is a desire for
>> "rapprochement" on both sides that I asked why there is absolutely nothing
>> in Wikidata that links to DBpedia (or Yago), whereas DBpedia contains a lot
>> of owl: sameAs to Wikidata. All this must have been discussed somewhere I
>> suppose. Still, I do not even find a property proposal for "DBpedia link".
>>
>> 2018-03-06 18:59 GMT+01:00 Sebastian Hellmann <
>> hellm...@informatik.uni-leipzig.de>:
>>
>>> Hi Ettore,
>>>
>>> we just released a very early prototype of the new DBpedia:
>>>
>>> http://88.99.242.78/hdt/en_wiki_de_sv_nl_fr-replaced.nt.bz2
>>> <https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2F88.99.242.78%2Fhdt%2Fen_wiki_de_sv_nl_fr-replaced.nt.bz2=02%7C01%7C%7Cb20d7c5a98504fcaa29e08d583930f50%7C84df9e7fe9f640afb435%7C1%7C0%7C636559589991689411=oml9kGY8ikCsVFH0jmSFx3txDjQGyT6lj%2FSxangnIZM%3D=0>
>>>
>>> I attached the first 1000 triples. The data is a merge of Wikidata + 5
>>> DBpedias from the 5 largest Wikipedia versions. Overall, there are many
>>> issues, but we have a test-driven data engineering process combined with
>>> Scrum and biweekly releases, next one is on March 15th. The new IDs are
>>> also stable by design.

Re: [Wikidata] About OCLC and DBpedia Links

2018-03-06 Thread Ettore RIZZA
>
> If Dbpedia has “same as” then Wikidata doesn’t have to duplicate that
> information you can ask dbpedia what is same as Q7724



Sebastian Hellmann beats me to my question. Perhaps our points of view are
different. From mine, which is that of a data consumer and a network
enthusiast, the difference between a simple link and two-way links is huge.
I mean, links are cheap and linked data is just about links. Instead of a
heavy, complex and often empty federated SPARQL query, it would be enough to
ask Wikidata to get the information from DBpedia (I think Wikidata is
intended to one day contain all the DBpedia entities). Just as one will be
able to query Wikidata one day to know the VIAF ID of any writer. I believe
that Wikidata is destined to become a data hub of this kind, but maybe I'm
wrong.
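
To be clear about what I call heavy, today the round trip has to go through
federation, something like this sketch (assuming the public DBpedia
endpoint is reachable from the query service):

SELECT ?dbpediaResource WHERE {
  SERVICE <https://dbpedia.org/sparql> {
    ?dbpediaResource owl:sameAs wd:Q7724 .   # the Q7724 example from Magnus
  }
}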

Best regards,

Ettore Rizza

2018-03-06 20:11 GMT+01:00 Sebastian Hellmann <
hellm...@informatik.uni-leipzig.de>:

> Hm, now I am also curious and would like to ask the same question as
> Ettore. What is the policy here?
>
> Viaf has schema.org backlinks, see https://viaf.org/viaf/85312226/rdf.xml
>
> http://www.wikidata.org/entity/Q80;
> <http://www.wikidata.org/entity/Q80>/>
> Then it's ok to duplicate? because it is not owl:sameAs?
>
> All the best,
> Sebastain
>
>
> On 06.03.2018 20:01, Magnus Sälgö wrote:
>
> If Dbpedia has “same as” then Wikidata doesn’t have to duplicate that
> information you can ask dbpedia what is same as Q7724
>
> Regards
> Magnus Sälgö
> Stockholm, Sweden
>
> 6 mars 2018 kl. 19:49 skrev Ettore RIZZA <ettoreri...@gmail.com>:
>
> First of all, thank you all for your answers.
>
> @Magnus and Thad: it's a bit what I suspected. Since the URL to WorldCat
> can be rebuilt from the Library of congress authority ID, I guess someone
> thought it would be a duplicate.
>
> But 1) I'm not sure that there is a 1 to 1 mapping between all Worldcat
> Identities and the Library of Congress 2) It would be rather strange that a
> Library of Congress ID would also serve as an ID for a "competitor" (ie
> OCLC, which maintains Worldcat and VIAF) 3) One would then wonder why
> Wikipedia provides both links to the Library of Congress Authority ID and
> Worldcat Identities.
>
> With respect for the fact that Wikidata already contains links to VIAF and
> that VIAF contains links to Worldcat Identities, this transitivity
> reasoning could apply to many other Authority IDs, I think.
>
> @Sebastian: Il would be great! I'll follow this project closely, just as
> I'm already following your papers
> <https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcontent.iospress.com%2Farticles%2Fsemantic-web%2Fsw277=02%7C01%7C%7Cb20d7c5a98504fcaa29e08d583930f50%7C84df9e7fe9f640afb435%7C1%7C0%7C636559589991689411=R1LeZSVSLmdZ%2FiJh6evIwrukAgtQQhzz7OxrJNL%2FXVg%3D=0>.
> And it is precisely because I know that there is a desire for
> "rapprochement" on both sides that I asked why there is absolutely nothing
> in Wikidata that links to DBpedia (or Yago), whereas DBpedia contains a lot
> of owl: sameAs to Wikidata. All this must have been discussed somewhere I
> suppose. Still, I do not even find a property proposal for "DBpedia link".
>
> 2018-03-06 18:59 GMT+01:00 Sebastian Hellmann <hellm...@informatik.uni-
> leipzig.de>:
>
>> Hi Ettore,
>>
>> we just released a very early prototype of the new DBpedia:
>>
>> http://88.99.242.78/hdt/en_wiki_de_sv_nl_fr-replaced.nt.bz2
>> <https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2F88.99.242.78%2Fhdt%2Fen_wiki_de_sv_nl_fr-replaced.nt.bz2=02%7C01%7C%7Cb20d7c5a98504fcaa29e08d583930f50%7C84df9e7fe9f640afb435%7C1%7C0%7C636559589991689411=oml9kGY8ikCsVFH0jmSFx3txDjQGyT6lj%2FSxangnIZM%3D=0>
>>
>> I attached the first 1000 triples. The data is a merge of Wikidata + 5
>> DBpedias from the 5 largest Wikipedia versions. Overall, there are many
>> issues, but we have a test-driven data engineering process combined with
>> Scrum and biweekly releases, next one is on March 15th. The new IDs are
>> also stable by design.
>>
>> We discussed how to effectively reuse all technologies we have for
>> Wikidata and also Wikipedia and are applying with this project at the
>> moment: https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/Globa
>> lFactSync
>> <https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmeta.wikimedia.org%2Fwiki%2FGrants%3AProject%2FDBpedia%2FGlobalFactSync=02%7C01%7C%7Cb20d7c5a98504fcaa29e08d583930f50%7C84df9e7fe9f640afb435%7C1%7C0%7C636559589991689411=cOIU%2FCe93h%2Ft4b1PdIDhMJWL%2FZB2XKnere%2BulFSZAFE%3D=0>
>> (Endorsements on the mai

Re: [Wikidata] About OCLC and DBpedia Links

2018-03-06 Thread Ettore RIZZA
First of all, thank you all for your answers.

@Magnus and Thad: it's a bit what I suspected. Since the URL to WorldCat
can be rebuilt from the Library of congress authority ID, I guess someone
thought it would be a duplicate.

But 1) I'm not sure that there is a 1 to 1 mapping between all Worldcat
Identities and the Library of Congress 2) It would be rather strange that a
Library of Congress ID would also serve as an ID for a "competitor" (ie
OCLC, which maintains Worldcat and VIAF) 3) One would then wonder why
Wikipedia provides both links to the Library of Congress Authority ID and
Worldcat Identities.

With respect for the fact that Wikidata already contains links to VIAF and
that VIAF contains links to Worldcat Identities, this transitivity
reasoning could apply to many other Authority IDs, I think.

@Sebastian: Il would be great! I'll follow this project closely, just as
I'm already following your papers
<https://content.iospress.com/articles/semantic-web/sw277>. And it is
precisely because I know that there is a desire for "rapprochement" on both
sides that I asked why there is absolutely nothing in Wikidata that links
to DBpedia (or Yago), whereas DBpedia contains a lot of owl: sameAs to
Wikidata. All this must have been discussed somewhere I suppose. Still, I
do not even find a property proposal for "DBpedia link".

2018-03-06 18:59 GMT+01:00 Sebastian Hellmann <
hellm...@informatik.uni-leipzig.de>:

> Hi Ettore,
>
> we just released a very early prototype of the new DBpedia:
>
> http://88.99.242.78/hdt/en_wiki_de_sv_nl_fr-replaced.nt.bz2
>
> I attached the first 1000 triples. The data is a merge of Wikidata + 5
> DBpedias from the 5 largest Wikipedia versions. Overall, there are many
> issues, but we have a test-driven data engineering process combined with
> Scrum and biweekly releases, next one is on March 15th. The new IDs are
> also stable by design.
>
> We discussed how to effectively reuse all technologies we have for
> Wikidata and also Wikipedia and are applying with this project at the
> moment: https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/
> GlobalFactSync
> (Endorsements on the main page and comments on the talk page are welcome).
>
> We really hope that the project gets accepted, so we can deploy the
> technologies behind DBpedia to the Wikiverse, e.g. we found over 900k
> triples/statements with references in the English Wikipedia's Infoboxes
> alone.
>
> We still have to do documentation and hosting of the new releases, but
> then it would indeed be a good time to add the links to DBpedia, if nobody
> objects. Also some people mentioned that we could load the DBpedia Ontology
> into Wikidata to provide an alternate class hierarchy. In DBpedia we loaded
> 5 or 6 classification schemes (Yago, Umbel, etc.), which are useful for
> different kind of queries.
>
>
> All the best,
> Sebastian
>
>
>
>
> On 06.03.2018 18:14, Ettore RIZZA wrote:
>
> Dear all,
>
> I asked myself a series of questions about the links between Wikidata and
> other knowledge/data bases, namely those of OCLC and DBpedia. For example:
>
> - Why Wikidata has no property "Worldcat Identities
> <http://worldcat.org/identities/>" while the English edition of Wikipedia
> systematically mentions this identity (when it exists) in its section
> "Autorithy control"  ?
>
> - Why do VIAF links to all editions of Wikipedia, but not (simply) to
> Wikidata ?
>
> - Why is there no link to DBpedia when the opposite is true ?
>
> These questions may seem very different from each other, but they
> ultimately concern a common subject and are all very basic. I suspect they
> had to be discussed somewhere, maybe at the dawn of Wikidata. However, I
> find nothing in the archives of this Newsletter, nor in the discussions on
> Wikidata.
>
> Could someone point me to some documentation on these issues ?
>
> Cheers,
>
> Ettore Rizza
>
>
> ___
> Wikidata mailing 
> listWikidata@lists.wikimedia.orghttps://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> --
> All the best,
> Sebastian Hellmann
>
> Director of Knowledge Integration and Linked Data Technologies (KILT)
> Competence Center
> at the Institute for Applied Informatics (InfAI) at Leipzig University
> Executive Director of the DBpedia Association
> Projects: http://dbpedia.org, http://nlp2rdf.org,
> http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
> <http://www.w3.org/community/ld4lt>
> Homepage: http://aksw.org/SebastianHellmann
> Research Group: http://aksw.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] About OCLC and DBpedia Links

2018-03-06 Thread Ettore RIZZA
Dear all,

I asked myself a series of questions about the links between Wikidata and
other knowledge/data bases, namely those of OCLC and DBpedia. For example:

- Why does Wikidata have no property "Worldcat Identities
<http://worldcat.org/identities/>", while the English edition of Wikipedia
systematically mentions this identity (when it exists) in its "Authority
control" section?

- Why does VIAF link to all editions of Wikipedia, but not (simply) to
Wikidata?

- Why is there no link to DBpedia when the opposite is true?

These questions may seem very different from each other, but they
ultimately concern a common subject and are all very basic. I suspect they
must have been discussed somewhere, maybe at the dawn of Wikidata. However,
I find nothing in the archives of this mailing list, nor in the discussions
on Wikidata.

Could someone point me to some documentation on these issues?

Cheers,

Ettore Rizza
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] GlobalFactSync

2018-02-05 Thread Ettore RIZZA
Hi,

Our plan here is to map all Wikidata properties to the DBpedia Ontology and
> then have the info to compare coverage of Wikidata with all infoboxes
> across languages.


This is a really exciting project that would improve both Wikidata and
DBpedia. I would be interested to know more, especially on what has already
been done in terms of mapping and what remains to do.

I see, for example, that DBpedia has a list of missing properties and
classes <http://mappings.dbpedia.org/server/ontology/wikidata/missing/>,
but I don't know if it's up to date.
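
In the meantime, individual values can already be compared by federating
from the Wikidata query service to the public DBpedia endpoint, which
publishes owl:sameAs links back to Wikidata items. A minimal sketch,
assuming https://dbpedia.org/sparql is still on the federation allow-list
and that dbo:birthDate is the relevant DBpedia property:

PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?wdDate ?dbpResource ?dbpDate WHERE {
  wd:Q937 wdt:P569 ?wdDate .              # date of birth in Wikidata
  SERVICE <https://dbpedia.org/sparql> {
    ?dbpResource owl:sameAs wd:Q937 ;     # DBpedia resource linked to Q937
                 dbo:birthDate ?dbpDate . # date of birth in DBpedia
  }
}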

Best regards,

Ettore Rizza

2018-01-15 19:57 GMT+01:00 Magnus Knuth <kn...@informatik.uni-leipzig.de>:

> Dear all,
>
> last year, we applied for a Wikimedia grant to feed qualified data from
> Wikipedia infoboxes (i.e. missing statements with references) via the
> DBpedia software into Wikidata. The evaluation was already quite good, but
> some parts were still missing and we would like to ask for your help and
> feedback for the next round. The new application is here:
> https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSync
>
> The main purpose of the grant is:
>
> - Wikipedia infoboxes are quite rich, are manually curated and have
> references. DBpedia is already extracting that data quite well (i.e. there
> is no other software that does it better). However, extracting references
> is not a priority on our agenda. They would be very useful to Wikidata, but
> there are no user requests for this from DBpedia users.
>
> - DBpedia also has all the infos of all infoboxes of all Wikipedia
> editions (>10k pages), so we also know quite well, where Wikidata is used
> already and where information is available in Wikidata or one language
> version and missing in another.
>
> - side-goal: bring the Wikidata, Wikipedia and DBpedia communities closer
> together
>
> Here is a diff between the old an new proposal:
>
> - extraction of infobox references will still be a goal of the reworked
> proposal
>
> - we have been working on the fusion and data comparison engine (the part
> of the budget that came from us) for a while now and there are first
> results:
>
> 6823 birthDate_gain_wiki.nt
> 3549 deathDate_gain_wiki.nt
>   362541 populationTotal_gain_wiki.nt
>   372913 total
>
> We only took three properties for now and showed the gain where no
> Wikidata statement was available. birthDate/deathDate is already quite
> good. Details here: https://drive.google.com/file/
> d/1j5GojhzFJxLYTXerLJYz3Ih-K6UtpnG_/view?usp=sharing
>
> Our plan here is to map all Wikidata properties to the DBpedia Ontology
> and then have the info to compare coverage of Wikidata with all infoboxes
> across languages.
>
> - we will remove the text extraction part from the old proposal (which is
> here for you reference: https://meta.wikimedia.org/
> wiki/Grants:Project/DBpedia/CrossWikiFact). This will still be a focus
> during our work in 2018, together with Diffbot and the new DBpedia NLP
> department, but we think that it distracted from the core of the proposal.
> Results from the Wikipedia article text extraction can be added later once
> they are available and discussed separately.
>
> - We proposed to make an extra website that helps to synchronize all
> Wikipedias and Wikidata with DBpedia as its backend. While the external
> website is not an ideal solution, we are lacking alternatives. The Primary
> Sources Tool is mainly for importing data into Wikidata, not so much
> synchronization. The MediaWiki instances of the Wikipedias do not seem to
> have any good interfaces to provide suggestions and pinpoint missing info.
> Especially to this part, we would like to ask for your help and
> suggestions, either per mail to the list or on the talk page:
> https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSync
>
> We are looking forward to a fruitful collaboration with you and we thank
> you for your feedback!
>
> All the best
> Magnus
>
> --
> Magnus Knuth
>
> Universität Leipzig
> Institut für Informatik
> Abt. Betriebliche Informationssysteme, AKSW/KILT
> Augustusplatz 10
> 04109 Leipzig DE
>
> mail: kn...@informatik.uni-leipzig.de
> tel: +49 177 3277537
> webID: http://magnus.13mm.de/
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2017-11-03 Thread Ettore RIZZA
Thank you very much, Jasper !

2017-11-03 10:15 GMT+01:00 Jasper Koehorst <jasperkoeho...@gmail.com>:

> I am uploading the index file temporarily to:
>
> http://fungen.wur.nl/~jasperk/WikiData/
>
> Jasper
>
>
> On 3 Nov 2017, at 10:05, Ettore RIZZA <ettoreri...@gmail.com> wrote:
>
> Thank you for this feedback, Laura.
>
> Is the hdt index you got available somewhere on the cloud?
>
> Cheers
>
> 2017-11-03 9:56 GMT+01:00 Osma Suominen <osma.suomi...@helsinki.fi>:
>
>> Hi Laura,
>>
>> Thank you for sharing your experience! I think your example really shows
>> the power - and limitations - of HDT technology for querying very large RDF
>> data sets. While I don't currently have any use case for a local, queryable
>> Wikidata dump, I can easily see that it could be very useful for doing e.g.
>> resource-intensive, analytic queries. Having access to a recent hdt+index
>> dump of Wikidata would make it very easy to start doing that. So I second
>> your plea.
>>
>> -Osma
>>
>>
>> Laura Morales kirjoitti 03.11.2017 klo 09:48:
>>
>>> Hello list,
>>>
>>> a very kind person from this list has generated the .hdt.index file for
>>> me, using the 1-year old wikidata HDT file available at the rdfhdt website.
>>> So I was finally able to setup a working local endpoint using HDT+Fuseki.
>>> Set up was easy, launch time (for Fuseki) also was quick (a few seconds),
>>> the only change I made was to replace -Xmx1024m to -Xmx4g in the Fuseki
>>> startup script (btw I'm not very proficient in Java, so I hope this is the
>>> correct way). I've ran some queries too. Simple select or traversal queries
>>> seems fast to me (I haven't measured them but the response is almost
>>> immediate), other queries such as "select distinct ?class where { [] a
>>> ?class }" takes several seconds or a few minutes to complete, which kinda
>>> tells me the HDT indexes don't work well on all queries. But otherwise for
>>> simple queries it works perfectly! At least I'm able to query the dataset!
>>> In conclusion, I think this more or less gives some positive feedback
>>> for using HDT on a "commodity computer", which means it can be very useful
>>> for people like me who want to use the dataset locally but who can't setup
>>> a full-blown server. If others want to try as well, they can offer more
>>> (hopefully positive) feedback.
>>> For all of this, I heartwarmingly plea any wikidata dev to please
>>> consider scheduling a HDT dump (.hdt + .hdt.index) along with the other
>>> regular dumps that it creates weekly.
>>>
>>> Thank you!!
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suomi...@helsinki.fi
>> http://www.nationallibrary.fi
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2017-11-03 Thread Ettore RIZZA
Thank you for this feedback, Laura.

Is the hdt index you got available somewhere on the cloud?

Cheers

2017-11-03 9:56 GMT+01:00 Osma Suominen :

> Hi Laura,
>
> Thank you for sharing your experience! I think your example really shows
> the power - and limitations - of HDT technology for querying very large RDF
> data sets. While I don't currently have any use case for a local, queryable
> Wikidata dump, I can easily see that it could be very useful for doing e.g.
> resource-intensive, analytic queries. Having access to a recent hdt+index
> dump of Wikidata would make it very easy to start doing that. So I second
> your plea.
>
> -Osma
>
>
> Laura Morales kirjoitti 03.11.2017 klo 09:48:
>
>> Hello list,
>>
>> a very kind person from this list has generated the .hdt.index file for
>> me, using the 1-year old wikidata HDT file available at the rdfhdt website.
>> So I was finally able to setup a working local endpoint using HDT+Fuseki.
>> Set up was easy, launch time (for Fuseki) also was quick (a few seconds),
>> the only change I made was to replace -Xmx1024m to -Xmx4g in the Fuseki
>> startup script (btw I'm not very proficient in Java, so I hope this is the
>> correct way). I've ran some queries too. Simple select or traversal queries
>> seems fast to me (I haven't measured them but the response is almost
>> immediate), other queries such as "select distinct ?class where { [] a
>> ?class }" takes several seconds or a few minutes to complete, which kinda
>> tells me the HDT indexes don't work well on all queries. But otherwise for
>> simple queries it works perfectly! At least I'm able to query the dataset!
>> In conclusion, I think this more or less gives some positive feedback for
>> using HDT on a "commodity computer", which means it can be very useful for
>> people like me who want to use the dataset locally but who can't setup a
>> full-blown server. If others want to try as well, they can offer more
>> (hopefully positive) feedback.
>> For all of this, I heartwarmingly plea any wikidata dev to please
>> consider scheduling a HDT dump (.hdt + .hdt.index) along with the other
>> regular dumps that it creates weekly.
>>
>> Thank you!!
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Ettore RIZZA
While I'm on the subject, I would like to draw attention to the Neckar
project <http://event.ifi.uni-heidelberg.de/?page_id=532>, which aims
precisely to classify Wikidata entities into people, places and
organizations. Frequently updated JSON dumps are available.

2017-10-16 16:08 GMT+02:00 Ettore RIZZA <ettoreri...@gmail.com>:

> @Antonin : Thanks for this counting method, it seems very effective (I
> already knew that there were 3.6 M of humans (Q5) in Wikidata).
>
> https://query.wikidata.org/#%23compter%20le%20nombre%20d%
> 27%C3%A9l%C3%A9ments%20appartenant%20%C3%A0%20la%20cat%C3%A9gorie%0A%
> 23organisation%20ou%20%C3%A0%20ses%20enfants%0ASELECT%
> 20DISTINCT%20%28COUNT%28DISTINCT%20%3Fitem%29%20AS%
> 20%3Fcount%29%20WHERE%20%7B%20%3Fitem%20%28wdt%3AP31%
> 2Fwdt%3AP279%2a%29%20wd%3AQ5.%20%7D
>
> 2017-10-16 15:34 GMT+02:00 Antonin Delpeuch (lists) <
> li...@antonin.delpeuch.eu>:
>
>> And… my own count was wrong too, because I forgot to add DISTINCT in my
>> query (if there are multiple paths from the class to "organization
>> (Q43229)", items will appear multiple times).
>>
>> So, I get 1 168 084 now.
>> http://tinyurl.com/yaeqlsnl
>>
>> It's easy to get these things wrong!
>>
>> Antonin
>>
>> On 16/10/2017 14:16, Antonin Delpeuch (lists) wrote:
>> > Thanks Ettore for spotting that!
>> >
>> > Wikidata types (P31) only make sense when you consider the "subclass of"
>> > (P279) property that we use to build the ontology (except in a few cases
>> > where the community has decided not to use any subclass for a particular
>> > type).
>> >
>> > So, to retrieve all items of a certain type in SPARQL, you need to use
>> > something like this:
>> >
>> > ?item wdt:P31/wdt:P279* ?type
>> >
>> > You can also have other variants to accept non-truthy statements.
>> >
>> > Just with this truthy version, I currently get 1 208 227 items. But note
>> > that there are still a lot of items where P31 is not provided, or
>> > subclasses which have not been connected to "organization (Q43229)"…
>> >
>> > So in general, it's very hard to have any "guarantees that there are no
>> > duplicates", just because you don't have any guarantees that the
>> > information currently in Wikidata is complete or correct.
>> >
>> > I would recommend trying to import something a bit smaller to get
>> > acquainted with how Wikidata works and what the matching process looks
>> > like in practice. And beyond a one-off import, as Ettore said it is
>> > important to think how the data will be maintained in the future…
>> >
>> > Antonin
>> >
>> > On 16/10/2017 13:46, Ettore RIZZA wrote:
>> >> - Wikidata has 40k organisations:
>> >>
>> >> https://query.wikidata.org/#SELECT
>> >> <https://query.wikidata.org/#SELECT> %3Fitem %3FitemLabel %0AWHERE
>> >> %0A{%0A %3Fitem wdt%3AP31 wd%3AQ43229.%0A SERVICE wikibase%3Alabel
>> {
>> >> bd%3AserviceParam wikibase%3Alanguage "[AUTO_LANGUAGE]%2Cen". }%0A}
>> >>
>> >>
>> >> Hi,
>> >>
>> >> I think Wikidata contains many more organizations than that. If we
>> >> choose the "instance of Business enterprise", we get 135570 results.
>> And
>> >> I imagine there are many other categories that bring together
>> commercial
>> >> companies.
>> >>
>> >>
>> >> https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%
>> 20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ4830453.%
>> 0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AservicePa
>> ram%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D
>> >>
>> >> On the substance, the project to add all companies of a country would
>> >> make Wikidata a kind of totally free clone of Open Corporates
>> >> <https://opencorporates.com/>. I would of course be delighted to see
>> >> that, but is it not a challenge to maintain such a database? Companies
>> >> are like humans, it appears and disappears every day.
>> >>
>> >>
>> >>
>> >> 2017-10-16 13:41 GMT+02:00 Sebastian Hellmann
>> >> <hellm...@informatik.uni-leipzig.de
>> >> <mailto:hellm...@informatik.uni-leipzig.de>>:
>> >>
>> >> Hi all,
>> >>
>> >> the technical challenges ar

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Ettore RIZZA
@Antonin: Thanks for this counting method, it seems very effective (I
already knew that there were 3.6 million humans (Q5) in Wikidata).

https://query.wikidata.org/#%23compter%20le%20nombre%20d%27%C3%A9l%C3%A9ments%20appartenant%20%C3%A0%20la%20cat%C3%A9gorie%0A%23organisation%20ou%20%C3%A0%20ses%20enfants%0ASELECT%20DISTINCT%20%28COUNT%28DISTINCT%20%3Fitem%29%20AS%20%3Fcount%29%20WHERE%20%7B%20%3Fitem%20%28wdt%3AP31%2Fwdt%3AP279%2a%29%20wd%3AQ5.%20%7D
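
Decoded, the pattern behind these long URLs is short; here it is as plain
SPARQL for readability (a minimal sketch: swap wd:Q5 for wd:Q43229 to
reproduce Antonin's count of organizations and their subclasses):

# Count items that are instances of the class or of any of its subclasses
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P31/wdt:P279* wd:Q5 .
}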

2017-10-16 15:34 GMT+02:00 Antonin Delpeuch (lists) <
li...@antonin.delpeuch.eu>:

> And… my own count was wrong too, because I forgot to add DISTINCT in my
> query (if there are multiple paths from the class to "organization
> (Q43229)", items will appear multiple times).
>
> So, I get 1 168 084 now.
> http://tinyurl.com/yaeqlsnl
>
> It's easy to get these things wrong!
>
> Antonin
>
> On 16/10/2017 14:16, Antonin Delpeuch (lists) wrote:
> > Thanks Ettore for spotting that!
> >
> > Wikidata types (P31) only make sense when you consider the "subclass of"
> > (P279) property that we use to build the ontology (except in a few cases
> > where the community has decided not to use any subclass for a particular
> > type).
> >
> > So, to retrieve all items of a certain type in SPARQL, you need to use
> > something like this:
> >
> > ?item wdt:P31/wdt:P279* ?type
> >
> > You can also have other variants to accept non-truthy statements.
> >
> > Just with this truthy version, I currently get 1 208 227 items. But note
> > that there are still a lot of items where P31 is not provided, or
> > subclasses which have not been connected to "organization (Q43229)"…
> >
> > So in general, it's very hard to have any "guarantees that there are no
> > duplicates", just because you don't have any guarantees that the
> > information currently in Wikidata is complete or correct.
> >
> > I would recommend trying to import something a bit smaller to get
> > acquainted with how Wikidata works and what the matching process looks
> > like in practice. And beyond a one-off import, as Ettore said it is
> > important to think how the data will be maintained in the future…
> >
> > Antonin
> >
> > On 16/10/2017 13:46, Ettore RIZZA wrote:
> >> - Wikidata has 40k organisations:
> >>
> >> https://query.wikidata.org/#SELECT
> >> <https://query.wikidata.org/#SELECT> %3Fitem %3FitemLabel %0AWHERE
> >> %0A{%0A %3Fitem wdt%3AP31 wd%3AQ43229.%0A SERVICE wikibase%3Alabel {
> >> bd%3AserviceParam wikibase%3Alanguage "[AUTO_LANGUAGE]%2Cen". }%0A}
> >>
> >>
> >> Hi,
> >>
> >> I think Wikidata contains many more organizations than that. If we
> >> choose the "instance of Business enterprise", we get 135570 results. And
> >> I imagine there are many other categories that bring together commercial
> >> companies.
> >>
> >>
> >> https://query.wikidata.org/#SELECT%20%3Fitem%20%
> 3FitemLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%
> 3AQ4830453.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%
> 3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_
> LANGUAGE%5D%2Cen%22.%20%7D%0A%7D
> >>
> >> On the substance, the project to add all companies of a country would
> >> make Wikidata a kind of totally free clone of Open Corporates
> >> <https://opencorporates.com/>. I would of course be delighted to see
> >> that, but is it not a challenge to maintain such a database? Companies
> >> are like humans, it appears and disappears every day.
> >>
> >>
> >>
> >> 2017-10-16 13:41 GMT+02:00 Sebastian Hellmann
> >> <hellm...@informatik.uni-leipzig.de
> >> <mailto:hellm...@informatik.uni-leipzig.de>>:
> >>
> >> Hi all,
> >>
> >> the technical challenges are not so difficult.
> >>
> >> - 2.2 million are the exact number of German organisations, i.e.
> >> associations and companies. They are also unique.
> >>
> >> - Wikidata has 40k organisations:
> >>
> >> https://query.wikidata.org/#SELECT
> >> <https://query.wikidata.org/#SELECT> %3Fitem %3FitemLabel %0AWHERE
> >> %0A{%0A %3Fitem wdt%3AP31 wd%3AQ43229.%0A SERVICE wikibase%3Alabel {
> >> bd%3AserviceParam wikibase%3Alanguage "[AUTO_LANGUAGE]%2Cen". }%0A}
> >>
> >> so there would be a maximum of 40k duplicates These are easy to find
> >> and deduplicate
> >>
> >

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Ettore RIZZA
>
> - Wikidata has 40k organisations:

https://query.wikidata.org/#SELECT %3Fitem %3FitemLabel %0AWHERE %0A{%0A
> %3Fitem wdt%3AP31 wd%3AQ43229.%0A SERVICE wikibase%3Alabel {
> bd%3AserviceParam wikibase%3Alanguage "[AUTO_LANGUAGE]%2Cen". }%0A}


Hi,

I think Wikidata contains many more organizations than that. If we query
for instances of "business enterprise" alone, we get 135,570 results. And I
imagine there are many other classes that bring together commercial
companies.


https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ4830453.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D

On the substance, the project to add all companies of a country would make
Wikidata a kind of totally free clone of Open Corporates
<https://opencorporates.com/>. I would of course be delighted to see that,
but is it not a challenge to maintain such a database? Companies are like
humans: they appear and disappear every day.
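
To get a rough idea of the deduplication work involved, one could start by
listing the German companies already in Wikidata and matching them against
the register. A minimal sketch, assuming the items have country (P17) set
to Germany, which is certainly not true for all of them:

SELECT ?company ?companyLabel ?hq WHERE {
  ?company wdt:P31/wdt:P279* wd:Q4830453 ;   # business and its subclasses
           wdt:P17 wd:Q183 .                 # country = Germany
  OPTIONAL { ?company wdt:P159 ?hq . }       # headquarters location, if any
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en". }
}
LIMIT 100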



2017-10-16 13:41 GMT+02:00 Sebastian Hellmann <
hellm...@informatik.uni-leipzig.de>:

> Hi all,
>
> the technical challenges are not so difficult.
>
> - 2.2 million are the exact number of German organisations, i.e.
> associations and companies. They are also unique.
>
> - Wikidata has 40k organisations:
>
> https://query.wikidata.org/#SELECT %3Fitem %3FitemLabel %0AWHERE %0A{%0A
> %3Fitem wdt%3AP31 wd%3AQ43229.%0A SERVICE wikibase%3Alabel {
> bd%3AserviceParam wikibase%3Alanguage "[AUTO_LANGUAGE]%2Cen". }%0A}
>
> so there would be a maximum of 40k duplicates These are easy to find and
> deduplicate
>
> - The crawl can be done easily, a colleague has done so before.
>
>
> The issues here are:
>
> - Do you want to upload the data in Wikidata? It would be a real big
> extension. Can I go ahead
>
> - If the data were available externally as structured data under open
> license, I would probably not suggest loading it into wikidata, as the data
> can be retrieved from the official source directly, however, here this data
> will not be published in a decent format.
>
> I thought that the way data is copied from coyrighted sources, i.e. only
> facts is ok for wikidata. This done in a lot of places, I guess. Same for
> Wikipedia, i.e. News articles and copyrighted books are referenced. So
> Wikimedia or the Wikimedia community are experts on this.
>
> All the best,
>
> Sebastian
>
> On 16.10.2017 10:18, Neubert, Joachim wrote:
>
> Hi Sebastian,
>
>
>
> This is huge! It will cover almost all currently existing German
> companies. Many of these will have similar names, so preparing for
> disambiguation is a concern.
>
>
>
> A good way for such an approach would be proposing a property for an
> external identifier, loading the data into Mix-n-match, creating links for
> companies already in Wikidata, and adding the rest (or perhaps only parts
> of them - I’m not sure if having all of them in Wikidata makes sense, but
> that’s another discussion), preferably with location and/or sector of trade
> in the description field.
>
>
>
> I’ve tried to figure out what could be used as key for a external
> identifier property. However, it looks like the registry does not offer any
> (persistent) URL to its entries. So for looking up a company, apparently
> there are two options:
>
>
>
> -  conducting an extended search for the exact string “A
> Dienstleistungsgesellschaft mbH“
>
> -  copying the register number “32853” plus selecting the court
> (Leipzig) from the according dropdown list and search that
>
>
>
> Both ways are not very intuitive, even if we can provide a link to the
> search form. This would make a weak connection to the source of
> information. Much more important, it makes disambiguation in Mix-n-match
> difficult. This applies for the preparation of your initial load (you would
> not want to create duplicates). But much more so for everybody else who
> wants to match his or her data later on. Being forced to search for entries
> manually in a cumbersome way for disambiguation of a new, possibly large
> and rich dataset is, in my eyes, not something we want to impose on future
> contributors. And often, the free information they find in the registry
> (formal name, register number, legal form, address) will not easily match
> with the information they have (common name, location, perhaps founding
> date, and most important sector of trade), so disambiguation may still be
> difficult.
>
>
>
> Have you checked which parts of the accessible information as below can be
> crawled and added legally to external databases such as Wikidata?
>
>
>
> Cheers, Joachim
>
>
>
> --
>
> Joachim Neubert
>
>
>
> ZBW – German National Library of Economics
>
> Leibniz Information Centre for Economics
>
> Neuer Jungfernstieg 21
> 20354 Hamburg
>
> Phone +49-42834-462
>
>
>
>
>
>
>
> *Von:* Wikidata [mailto:wikidata-boun...@lists.wikimedia.org
> 

Re: [Wikidata] Linking to place with Wikipedia page but no Wikidata link

2017-09-07 Thread Ettore RIZZA
The Springer paywall is no longer a problem for open science since there is
a certain Russian website, but in this case I see that we can find the full
article on ResearchGate:
https://www.researchgate.net/profile/Alessandro_Piscopo/publication/319272942_What_makes_a_good_collaborative_knowledge_graph_Group_composition_and_quality_in_Wikidata/links/599fd3d2a6fdccf594266835/What-makes-a-good-collaborative-knowledge-graph-Group-composition-and-quality-in-Wikidata.pdf

2017-09-07 8:46 GMT+02:00 Gerard Meijssen <gerard.meijs...@gmail.com>:

> Hoi,
> Sorry but with only conclusions it is just that.. hidden behind a paywall.
> Consequently it does not make a difference; our community cannot comment.
> Please choose a different venue for publications.
> Thanks,
>  GerardM
>
> On 7 September 2017 at 08:37, Ettore RIZZA <ettoreri...@gmail.com> wrote:
>
>> Well, here is a fresh paper that seems to have been written to answer the
>> questions I had after this discussion.
>>
>> " We performed a regression analysis to investigate how the contribution
>> of different types of users, i.e. bots and human editors, registered or
>> anonymous, influences outcome quality in Wikidata. Moreover, we looked at
>> the effects of tenure and interest diversity among registered users. Our
>> findings show that a balanced contribution of bots and human editors
>> positively influence outcome quality, whereas higher numbers of anonymous
>> edits may hinder performance. Tenure and interest diversity within groups
>> also lead to higher quality. "
>>
>> https://link.springer.com/chapter/10.1007/978-3-319-67217-5_19
>>
>> 2017-09-02 15:53 GMT+02:00 Jane Darnell <jane...@gmail.com>:
>>
>>> Thanks! That really made me laugh and I needed that. The wonderful story
>>> of Wikidata's history set within the wonderful story of Wikipedia's history
>>> anno 2014 is truly amazing. Using that information to describe Wikidata
>>> today is like trying to imagine the "bot wars" that have recently become a
>>> viral hit on various social media websites. You could say Wikidata was born
>>> out of a need to end "bot wars" between updating interwikilink bots. After
>>> that "bot war" ended though, it looks like we created a new "bot war" where
>>> Wikipedians became afraid of this new project because they might get bitten
>>> by a bot.
>>> https://blog.wikimedia.org/2017/08/30/wikipedia-bot-pocalypse/
>>>
>>> On Sat, Sep 2, 2017 at 3:05 PM, Ettore RIZZA <ettoreri...@gmail.com>
>>> wrote:
>>>
>>>> Hi Jane,
>>>>
>>>> I'm really sorry if my naïve comment made you sad. :/ To be clearer, I
>>>> never wanted to minimize the contribution of the volunteers! It's just that
>>>> I still don't know the internal mechanics of Wikidata. I recently read in a
>>>> paper, already a bit old*, that 90% of editions were made by bots. I just
>>>> thought that the mapping between the Wikipedia editions and Wikidata was
>>>> part of these 90% automated tasks, after which the volunteers had to add
>>>> the missing 10%, correct and enrich the automatic operations, etc. I'm
>>>> sorry if I misunderstood.
>>>>
>>>> ** " Wikidata has grown significantly since its launch in October
>>>> 2012; see the table here for key facts about its current content. It has
>>>> also become the most edited Wikimedia project, with 150– 500 edits per
>>>> minute, or a half million per day, about three times as many as the English
>>>> Wikipedia. Approximately 90% of these edits are made by bots contributors
>>>> create for automating tasks, yet almost one million edits per month are
>>>> still made by humans."* (VRANDEČIĆ, Denny et KRÖTZSCH, Markus.
>>>> Wikidata: a free collaborative knowledgebase. *Communications of the
>>>> ACM*, 2014, vol. 57, no 10, p. 78-85.)
>>>>
>>>> 2017-09-02 14:42 GMT+02:00 Ed Summers <e...@pobox.com>:
>>>>
>>>>>
>>>>> > On Sep 2, 2017, at 7:47 AM, Jane Darnell <jane...@gmail.com> wrote:
>>>>> >
>>>>> > Your note really made me feel so sad. I try to motivate my
>>>>> Wikipedian friends into doing more on Wikidata and each time they react 
>>>>> the
>>>>> way you did, with a sentence like "I imagined that the mapping between
>>>>> Wikipedia and Wikidata was ultra-automated." I guess there is s

Re: [Wikidata] Linking to place with Wikipedia page but no Wikidata link

2017-09-07 Thread Ettore RIZZA
Well, here is a fresh paper that seems to have been written to answer the
questions I had after this discussion.

" We performed a regression analysis to investigate how the contribution of
different types of users, i.e. bots and human editors, registered or
anonymous, influences outcome quality in Wikidata. Moreover, we looked at
the effects of tenure and interest diversity among registered users. Our
findings show that a balanced contribution of bots and human editors
positively influence outcome quality, whereas higher numbers of anonymous
edits may hinder performance. Tenure and interest diversity within groups
also lead to higher quality. "

https://link.springer.com/chapter/10.1007/978-3-319-67217-5_19

2017-09-02 15:53 GMT+02:00 Jane Darnell <jane...@gmail.com>:

> Thanks! That really made me laugh and I needed that. The wonderful story
> of Wikidata's history set within the wonderful story of Wikipedia's history
> anno 2014 is truly amazing. Using that information to describe Wikidata
> today is like trying to imagine the "bot wars" that have recently become a
> viral hit on various social media websites. You could say Wikidata was born
> out of a need to end "bot wars" between updating interwikilink bots. After
> that "bot war" ended though, it looks like we created a new "bot war" where
> Wikipedians became afraid of this new project because they might get bitten
> by a bot.
> https://blog.wikimedia.org/2017/08/30/wikipedia-bot-pocalypse/
>
> On Sat, Sep 2, 2017 at 3:05 PM, Ettore RIZZA <ettoreri...@gmail.com>
> wrote:
>
>> Hi Jane,
>>
>> I'm really sorry if my naïve comment made you sad. :/ To be clearer, I
>> never wanted to minimize the contribution of the volunteers! It's just that
>> I still don't know the internal mechanics of Wikidata. I recently read in a
>> paper, already a bit old*, that 90% of editions were made by bots. I just
>> thought that the mapping between the Wikipedia editions and Wikidata was
>> part of these 90% automated tasks, after which the volunteers had to add
>> the missing 10%, correct and enrich the automatic operations, etc. I'm
>> sorry if I misunderstood.
>>
>> ** " Wikidata has grown significantly since its launch in October 2012;
>> see the table here for key facts about its current content. It has also
>> become the most edited Wikimedia project, with 150– 500 edits per minute,
>> or a half million per day, about three times as many as the English
>> Wikipedia. Approximately 90% of these edits are made by bots contributors
>> create for automating tasks, yet almost one million edits per month are
>> still made by humans."* (VRANDEČIĆ, Denny et KRÖTZSCH, Markus. Wikidata:
>> a free collaborative knowledgebase. *Communications of the ACM*, 2014,
>> vol. 57, no 10, p. 78-85.)
>>
>> 2017-09-02 14:42 GMT+02:00 Ed Summers <e...@pobox.com>:
>>
>>>
>>> > On Sep 2, 2017, at 7:47 AM, Jane Darnell <jane...@gmail.com> wrote:
>>> >
>>> > Your note really made me feel so sad. I try to motivate my Wikipedian
>>> friends into doing more on Wikidata and each time they react the way you
>>> did, with a sentence like "I imagined that the mapping between Wikipedia
>>> and Wikidata was ultra-automated." I guess there is something about the
>>> "data" word in the same that makes people assume it is technical, or that
>>> being "machine-readable" makes it impossible for humans to read and without
>>> "bot" knowlege, there is no place for "normal contributors" to help out.
>>>
>>> I appreciate this perspective a great deal. I think it's great that you
>>> are motivating users to edit Wikidata--it's really important. Wikidata is
>>> nothing (IMHO) without the human-in-the-loop.
>>>
>>> But as a practical matter wouldn't it be useful if there were stubs in
>>> Wikidata that would help editors identify which entities need attention? Or
>>> would the vastness of it cause a problem?
>>>
>>> I can certainly see an argument for an embargo period to give
>>> counter-vandalism efforts a chance to triage the new pages. But after that
>>> point wouldn't it be useful if a bot monitored the language wikipedias for
>>> new entries and then added them to Wikidata so that people could fill them
>>> out?
>>>
>>> I'm just throwing ideas around here, and am not trying to be critical of
>>> the current state of affairs. You all are doing amazing work.
>>>
>>> //Ed
>>>
>>> __

Re: [Wikidata] Linking to place with Wikipedia page but no Wikidata link

2017-09-02 Thread Ettore RIZZA
Hi Jane,

I'm really sorry if my naïve comment made you sad. :/ To be clearer, I
never wanted to minimize the contribution of the volunteers! It's just that
I still don't know the internal mechanics of Wikidata. I recently read in a
somewhat old paper* that 90% of edits were made by bots. I just thought
that the mapping between the Wikipedia editions and Wikidata was part of
that 90% of automated edits, after which the volunteers had to add the
missing 10%, correct and enrich the automatic operations, etc. I'm sorry if
I misunderstood.

** " Wikidata has grown significantly since its launch in October 2012; see
the table here for key facts about its current content. It has also become
the most edited Wikimedia project, with 150–500 edits per minute, or a
half million per day, about three times as many as the English Wikipedia.
Approximately 90% of these edits are made by bots contributors create for
automating tasks, yet almost one million edits per month are still made by
humans."* (VRANDEČIĆ, Denny et KRÖTZSCH, Markus. Wikidata: a free
collaborative knowledgebase. *Communications of the ACM*, 2014, vol. 57, no
10, p. 78-85.)

2017-09-02 14:42 GMT+02:00 Ed Summers :

>
> > On Sep 2, 2017, at 7:47 AM, Jane Darnell  wrote:
> >
> > Your note really made me feel so sad. I try to motivate my Wikipedian
> friends into doing more on Wikidata and each time they react the way you
> did, with a sentence like "I imagined that the mapping between Wikipedia
> and Wikidata was ultra-automated." I guess there is something about the
> "data" word in the same that makes people assume it is technical, or that
> being "machine-readable" makes it impossible for humans to read and without
> "bot" knowlege, there is no place for "normal contributors" to help out.
>
> I appreciate this perspective a great deal. I think it's great that you
> are motivating users to edit Wikidata--it's really important. Wikidata is
> nothing (IMHO) without the human-in-the-loop.
>
> But as a practical matter wouldn't it be useful if there were stubs in
> Wikidata that would help editors identify which entities need attention? Or
> would the vastness of it cause a problem?
>
> I can certainly see an argument for an embargo period to give
> counter-vandalism efforts a chance to triage the new pages. But after that
> point wouldn't it be useful if a bot monitored the language wikipedias for
> new entries and then added them to Wikidata so that people could fill them
> out?
>
> I'm just throwing ideas around here, and am not trying to be critical of
> the current state of affairs. You all are doing amazing work.
>
> //Ed
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Linking to place with Wikipedia page but no Wikidata link

2017-09-01 Thread Ettore RIZZA
Thank you for your answer, Jane. I had not thought about the fact that some
professions could be better represented than others. I imagined that the
mapping between Wikipedia and Wikidata was ultra-automated. It's very
interesting.

2017-09-01 19:34 GMT+02:00 Osma Suominen :

> Thank you Jane and everyone else for your speedy responses. Postponing the
> creation of Wikidata entities for newly created Wikipedia articles that may
> turn out to be short-lived makes total sense. So we will simply create the
> corresponding Wikidata entities manually in cases like this.
>
> -Osma
>
>
> Jane Darnell kirjoitti 01.09.2017 klo 16:36:
>
>> Checking the history of that page shows it was recently created. Not sure
>> how the Finns do this but like the Dutch they probably have a bot that
>> creates Wikidata items after a month or so has passed (this avoids creating
>> items for things that get deleted through the "speedy delete" process). You
>> can create the item yourself, or wait another month I guess.
>> https://fi.wikipedia.org/w/index.php?title=Teuro=history
>>
>> On Fri, Sep 1, 2017 at 3:32 PM, Osma Suominen > > wrote:
>>
>> Hi,
>>
>> This may be a total newbie question, sorry about that!
>>
>> While linking YSO places to Wikidata we have stumbled on a few cases
>> where there is a Wikipedia article about the place we want to link,
>> but that page has no Wikidata link visible. And it seems that
>> Wikidata itself does not contain that entity.
>>
>> An example is the village Teuro in Tammela, Finland. It has a page
>> on the Finnish Wikipedia:
>> https://fi.wikipedia.org/wiki/Teuro
>> 
>>
>> But that page has no Wikidata link. A search for "Teuro" in Wikidata
>> gives a few hits, but none of them represent the village.
>>
>> What's the correct way to correct this? I found this guide:
>> https://www.wikidata.org/wiki/Help:Linking_Wikipedia_pages
>> 
>>
>> But I'm not 100% it addresses this exact situation. How did this
>> happen in the first place? My naïve understanding was that every
>> normal article in Wikipedia would have a corresponding Wikidata
>> entity, but apparently that's not entirely true!
>>
>> -Osma
>>
>>
>> -- Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529 
>> osma.suomi...@helsinki.fi 
>> http://www.nationallibrary.fi
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org 
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>> 
>>
>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Linking to place with Wikipedia page but no Wikidata link

2017-09-01 Thread Ettore RIZZA
Hi all,

I am glad that Osma Suominen asked this "newbie" question, since I was
wondering about the same thing. I am currently looking for comparative
statistics on Wikidata, the different editions of Wikipedia and those of
DBpedia.

The question I am trying to answer is simply: what is the probability that a
place or a person name that is not mentioned in Wikidata can be found
somewhere in Wikipedia or in DBpedia? Any pointers on that question would be
appreciated.
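
As a very rough first element of an answer, the query service can at least
count how many Wikidata items carry a sitelink to a given Wikipedia
edition, to be compared with that edition's own article count. A minimal
sketch for the Finnish Wikipedia (assuming the schema:isPartOf value below
is the right one for fi.wikipedia):

SELECT (COUNT(?article) AS ?linkedArticles) WHERE {
  ?article schema:about ?item ;
           schema:isPartOf <https://fi.wikipedia.org/> .
}
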
Ettore Rizza


2017-09-01 17:03 GMT+02:00 Yaroslav Blanter <ymb...@gmail.com>:

> It is not a language, this depends on the bot owners. I would not be
> surprised if there projects they never visit.
>
> I am a Russian Wikivoyage admin, and we make sure all newly created items
> have a Wikidata link, but I think if we for whatever reason fail to create
> an item manually, it never gets bot created.
>
> Cheers
> Yaroslav
>
> On Fri, Sep 1, 2017 at 4:59 PM, Ed Summers <e...@pobox.com> wrote:
>
>> So each language wikipedia does this on an ad-hoc basis?
>>
>> > On Sep 1, 2017, at 9:36 AM, Jane Darnell <jane...@gmail.com> wrote:
>> >
>> > Checking the history of that page shows it was recently created. Not
>> sure how the Finns do this but like the Dutch they probably have a bot that
>> creates Wikidata items after a month or so has passed (this avoids creating
>> items for things that get deleted through the "speedy delete" process). You
>> can create the item yourself, or wait another month I guess.
>> > https://fi.wikipedia.org/w/index.php?title=Teuro=history
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata