Re: [Wikidata] People who died in 2015 who were Dutch

2016-08-31 Thread Gerard Meijssen
Hoi,
Did they have a date of death in Wikidata as well?
Thanks,
 GerardM

On 31 August 2016 at 11:53, Dimitris Kontokostas wrote:

> Based on the other open related thread [1], there are references for the
> deathDate of 1950 people [2].
> I manually checked 5 random pages, and all had an "imported from
> Wikipedia" reference, so maybe this is a good start.
>
> (cc'ing wiki-cite after Dario's suggestion on the other thread)
>
> Best,
> Dimitris
>
> [1] https://lists.wikimedia.org/pipermail/wikidata/2016-August/009447.html
> [2] curl http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.tql.bz2 | bzcat | grep "deathDate"
>
>
> On Thu, Jun 4, 2015 at 3:00 PM, Markus Krötzsch <
> mar...@semantic-mediawiki.org> wrote:
>
>> On 04.06.2015 12:17, Dimitris Kontokostas wrote:
>> ...
>>
>>>
>>> Another question: can DBpedia extract references from Wikipedia
>>> articles too? If this would be possible, it might be feasible to
>>> guess and suggest a reference (or a list of references). Especially
>>> with things like date of death, one would expect that references
>>> have a publication date very close to (but strictly after) the
>>> event, which could narrow down the choices very much.
>>>
>>>
>>> We don't extract them for now, although I think we could relatively
>>> easily. The problem in this case would be that we cannot associate
>>> references with facts. The DBpedia Information Extraction Framework is
>>> quite modular and can be easily extended with new extractors, but it is
>>> hard to make these extractors "talk to each other".
>>> So we could easily get something like the following
>>> dbp:A dbo:birthDate "..."
>>> dbp:A dbo:deathDate "..."
>>> dbp:A dbo:reference dbp:r1 # and maybe " dbp:r1 something else"
>>> depending on the modeling
>>> dbp:A dbo:reference dbp:r2
>>>
>>> but not sure if this solves your problem
>>>
>>
>> Yes, I understand that you can hardly get the association between
>> extracted facts and references. My suggestion was to extract both
>> independently and then to query for references that have a publication date
>> close to a person's death so as to suggest them to users as a possible
>> reference for the death-date fact. This would still require a manual check,
>> since we cannot know if the guessed reference belongs to the date of death,
>> but if it has a high precision it would be a worthwhile way of spending
>> volunteer time to obtain confirmed references.
>>
>> At the same time, it might be one of the fastest ways to get sourced date
>> of death into Wikidata, since news articles will usually appear before the
>> major authority files are updated (so even if we get donations from them,
>> some lag would remain). With such an extraction framework, one could
>> establish a pipeline from Wikipedia to Wikidata.
>>
>> In the long run, references from authority files will become more
>> valuable than news articles, because they are more long-lived.
>>
>> Best wishes,
>>
>> Markus
>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
>
> --
> Kontokostas Dimitris


Re: [Wikidata] People who died in 2015 who were Dutch

2016-08-31 Thread Dimitris Kontokostas
Based on the other open related thread [1], there are references for the
deathDate of 1950 people [2].
I manually checked 5 random pages, and all had an "imported from
Wikipedia" reference, so maybe this is a good start.

(cc'ing wiki-cite after Dario's suggestion on the other thread)

Best,
Dimitris

[1] https://lists.wikimedia.org/pipermail/wikidata/2016-August/009447.html
[2] curl http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.tql.bz2 | bzcat | grep "deathDate"
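
For anyone who prefers not to pipe through the shell, the count behind [2] can be approximated in Python. This is only a sketch under assumptions: the dump has already been downloaded locally (the URL sits under /temporary/ and may go away), the file is line-oriented with the subject as the first whitespace-separated token, and the function name and the sample lines are made up for illustration.

```python
import bz2
import io


def subjects_with_predicate(stream, needle="deathDate"):
    """Collect the subjects of all lines that mention `needle`.

    `stream` may be a path or a binary file object holding bz2 data.
    Like `grep`, this matches the needle anywhere on the line.
    """
    subjects = set()
    with bz2.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if needle in line:
                parts = line.split(None, 1)
                if parts:
                    # Assumption: the first token is the subject IRI.
                    subjects.add(parts[0])
    return subjects


# Hypothetical sample lines standing in for the real dump.
SAMPLE = (
    '<http://dbpedia.org/resource/A> <http://dbpedia.org/ontology/deathDate> '
    '"2015-01-02" <http://example.org/ref1> .\n'
    '<http://dbpedia.org/resource/A> <http://dbpedia.org/ontology/birthDate> '
    '"1930-05-06" <http://example.org/ref2> .\n'
    '<http://dbpedia.org/resource/B> <http://dbpedia.org/ontology/deathDate> '
    '"2015-03-04" <http://example.org/ref3> .\n'
)

if __name__ == "__main__":
    compressed = io.BytesIO(bz2.compress(SAMPLE.encode("utf-8")))
    print(len(subjects_with_predicate(compressed)))
```

For the real file, pass the local path of the downloaded .bz2 instead of the in-memory sample.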


On Thu, Jun 4, 2015 at 3:00 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 04.06.2015 12:17, Dimitris Kontokostas wrote:
> ...
>
>>
>> Another question: can DBpedia extract references from Wikipedia
>> articles too? If this would be possible, it might be feasible to
>> guess and suggest a reference (or a list of references). Especially
>> with things like date of death, one would expect that references
>> have a publication date very close to (but strictly after) the
>> event, which could narrow down the choices very much.
>>
>>
>> We don't extract them for now, although I think we could relatively
>> easily. The problem in this case would be that we cannot associate
>> references with facts. The DBpedia Information Extraction Framework is
>> quite modular and can be easily extended with new extractors, but it is
>> hard to make these extractors "talk to each other".
>> So we could easily get something like the following
>> dbp:A dbo:birthDate "..."
>> dbp:A dbo:deathDate "..."
>> dbp:A dbo:reference dbp:r1 # and maybe " dbp:r1 something else"
>> depending on the modeling
>> dbp:A dbo:reference dbp:r2
>>
>> but not sure if this solves your problem
>>
>
> Yes, I understand that you can hardly get the association between
> extracted facts and references. My suggestion was to extract both
> independently and then to query for references that have a publication date
> close to a person's death so as to suggest them to users as a possible
> reference for the death-date fact. This would still require a manual check,
> since we cannot know if the guessed reference belongs to the date of death,
> but if it has a high precision it would be a worthwhile way of spending
> volunteer time to obtain confirmed references.
>
> At the same time, it might be one of the fastest ways to get sourced date
> of death into Wikidata, since news articles will usually appear before the
> major authority files are updated (so even if we get donations from them,
> some lag would remain). With such an extraction framework, one could
> establish a pipeline from Wikipedia to Wikidata.
>
> In the long run, references from authority files will become more valuable
> than news articles, because they are more long-lived.
>
> Best wishes,
>
> Markus
>
>



-- 
Kontokostas Dimitris


Re: [Wikidata] People who died in 2015 who were Dutch

2016-01-27 Thread Dimitris Kontokostas
Coming back to an old thread: we now extract references from Wikipedia, and
they are available in the 2015-10 beta release:

citation_data_en.ttl.bz2
citation_links_en.ttl.bz2


Any feedback is more than welcome.


Best,

Dimitris


On Thu, Jun 4, 2015 at 3:00 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 04.06.2015 12:17, Dimitris Kontokostas wrote:
> ...
>
>>
>> Another question: can DBpedia extract references from Wikipedia
>> articles too? If this would be possible, it might be feasible to
>> guess and suggest a reference (or a list of references). Especially
>> with things like date of death, one would expect that references
>> have a publication date very close to (but strictly after) the
>> event, which could narrow down the choices very much.
>>
>>
>> We don't extract them for now, although I think we could relatively
>> easily. The problem in this case would be that we cannot associate
>> references with facts. The DBpedia Information Extraction Framework is
>> quite modular and can be easily extended with new extractors, but it is
>> hard to make these extractors "talk to each other".
>> So we could easily get something like the following
>> dbp:A dbo:birthDate "..."
>> dbp:A dbo:deathDate "..."
>> dbp:A dbo:reference dbp:r1 # and maybe " dbp:r1 something else"
>> depending on the modeling
>> dbp:A dbo:reference dbp:r2
>>
>> but not sure if this solves your problem
>>
>
> Yes, I understand that you can hardly get the association between
> extracted facts and references. My suggestion was to extract both
> independently and then to query for references that have a publication date
> close to a person's death so as to suggest them to users as a possible
> reference for the death-date fact. This would still require a manual check,
> since we cannot know if the guessed reference belongs to the date of death,
> but if it has a high precision it would be a worthwhile way of spending
> volunteer time to obtain confirmed references.
>
> At the same time, it might be one of the fastest ways to get sourced date
> of death into Wikidata, since news articles will usually appear before the
> major authority files are updated (so even if we get donations from them,
> some lag would remain). With such an extraction framework, one could
> establish a pipeline from Wikipedia to Wikidata.
>
> In the long run, references from authority files will become more valuable
> than news articles, because they are more long-lived.
>
> Best wishes,
>
> Markus
>
>
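
Markus's suggestion quoted above (propose references whose publication date is shortly, but strictly, after the date of death) could be prototyped roughly as follows. This is a sketch under assumptions, not something the extraction framework provides: it assumes each extracted reference already carries a parsed publication date, and the 30-day window, the function name, and the example URLs are all invented for illustration. As noted in the thread, the result is only a list of candidates that still needs a manual check.

```python
from datetime import date, timedelta


def candidate_references(death_date, references, window_days=30):
    """Return URLs of references published strictly after `death_date`
    and no later than `window_days` days after it, earliest first.

    `references` is an iterable of (url, publication_date) pairs.
    """
    window = timedelta(days=window_days)
    hits = [
        (published, url)
        for url, published in references
        if death_date < published <= death_date + window
    ]
    return [url for published, url in sorted(hits)]


if __name__ == "__main__":
    # Hypothetical references for one person (not real data).
    refs = [
        ("http://example.org/obituary", date(2015, 3, 3)),
        ("http://example.org/old-interview", date(2010, 5, 1)),
    ]
    print(candidate_references(date(2015, 3, 1), refs))
```

A narrow window keeps precision high at the cost of missing late obituaries; the right value would have to be tuned on real data.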



-- 
Kontokostas Dimitris


Re: [Wikidata] People who died in 2015 who were Dutch

2015-06-04 Thread Markus Krötzsch

On 04.06.2015 10:49, Gerard Meijssen wrote:

Hoi,
Markus with all due respect, we have a LOT of data in Wikidata that is
plain wrong. When we add the missing data from DBpedia it is of a higher
quality than what we have. Insisting that it first needs to be validated
is foolish. It is not done for any of the work we do. All our bots make
use of Wikipedia and in this DBpedia is no different.

I do agree that it makes sense to verify the data that is different. But
even so. When Wikidata says 1929 and DBpedia says 7-June-1929 our
practice has been to remove the 1929 for the more precise data.

Let us be pragmatic and improve our data and start with what is missing.


That's exactly what I am saying. I think your misconception is that what
you suggest does not happen because of some opposition from the Wikidata
community. In reality, it simply does not happen because nobody has done
it yet, neither from the DBpedia nor from the Wikidata community. It does
not help very much to post arguments about how useful this would be. At
least you don't need to convince me. What is needed are deeds, not talk.


The folks working on the primary sources tool are trying to provide a 
standard process for almost arbitrary data imports. It was just my first 
thought for turning your complaint into something that could work as a 
solution -- if you have a better idea which tool to use, feel free to 
post it.


Regards,

Markus



Thanks,
 GerardM

On 4 June 2015 at 10:31, Markus Krötzsch mar...@semantic-mediawiki.org
wrote:

Hi Dimitris,

Interesting situation. If you have contradictory data from several
templates, then the challenge will be to find out which information
is correct for importing it to Wikidata. Could your dataset maybe
become an input to the primary sources tool [1]? Then Wikidata users
could help to clean the dataset and try to find references (as you
know, references are quite important for Wikidata, but it would
really be asking too much of DBpedia to provide these).

This could be a viable strategy to merge DBpedia data into Wikidata.
This email was only about person-related data, but one could do this
for any kind of dataset where the information in DBpedia is of
relatively high quality. I don't know exactly what the primary
sources tool needs as input (it is still beta), but I think it
mainly requires that a decent quality set of candidate statements is
extracted and provided in some suitable format.

As a first step, it might make sense to do a scan to see how many
date-of-death (or whatever) statements in DBpedia are not yet found
in Wikidata. If it is a small dataset (e.g., only a subset of the
people who have died in the last year), then maybe one could also
add and verify it in another way, not going through primary sources.
But especially for recent deaths, there might be a great variety of
sources (esp. newspaper articles) that are not easy to find without
user support.

Regards,

Markus

[1] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool



On 04.06.2015 09:56, Dimitris Kontokostas wrote:



On Thu, Jun 4, 2015 at 1:18 AM, Markus Krötzsch
mar...@semantic-mediawiki.org

wrote:

 On 03.06.2015 22:44, Gerard Meijssen wrote:

 Hoi,
 The Dutch indicated their willingness to add the dead to Wikidata ... I
 add quite a few dead from other countries and because of Jura1
 Brazilians who died in 2015 have an added significance.

 Given that we CAN produce lists like this, it makes sense to reconsider
 the offer by the fine people from DBpedia and have the information they
 harvest from Wikipedia added automatically to Wikidata.. One reason I
 pointed out on my recent blogpost..


 DBpedia is getting this information from the contents of the
 template Persondata as used on Wikipedia [1]. The enwiki community
 just recently decided to maintain this data on Wikidata instead. I
 guess this means that (English) DBpedia will not contain this data
 in the future, unless they import it from Wikidata (they are
 tracking the issue at [2]).


Note that DBpedia gets person data information both from the persondata
template and from the infobox templates using the mappings wiki.
We also noted that the data between the two is often out of sync
(and usually the person data is stale/wrong

Re: [Wikidata] People who died in 2015 who were Dutch

2015-06-04 Thread Gerard Meijssen
Hoi,
Markus with all due respect, we have a LOT of data in Wikidata that is
plain wrong. When we add the missing data from DBpedia it is of a higher
quality than what we have. Insisting that it first needs to be validated is
foolish. It is not done for any of the work we do. All our bots make use of
Wikipedia and in this DBpedia is no different.

I do agree that it makes sense to verify the data that is different. But
even so. When Wikidata says 1929 and DBpedia says 7-June-1929 our practice
has been to remove the 1929 for the more precise data.

Let us be pragmatic and improve our data and start with what is missing.
Thanks,
GerardM

On 4 June 2015 at 10:31, Markus Krötzsch mar...@semantic-mediawiki.org
wrote:

 Hi Dimitris,

 Interesting situation. If you have contradictory data from several
 templates, then the challenge will be to find out which information is
 correct for importing it to Wikidata. Could your dataset maybe become an
 input to the primary sources tool [1]? Then Wikidata users could help to
 clean the dataset and try to find references (as you know, references are
 quite important for Wikidata, but it would really be asking too much of
 DBpedia to provide these).

 This could be a viable strategy to merge DBpedia data into Wikidata. This
 email was only about person-related data, but one could do this for any
 kind of dataset where the information in DBpedia is of relatively high
 quality. I don't know exactly what the primary sources tool needs as input
 (it is still beta), but I think it mainly requires that a decent quality
 set of candidate statements is extracted and provided in some suitable
 format.

 As a first step, it might make sense to do a scan to see how many
 date-of-death (or whatever) statements in DBpedia are not yet found in
 Wikidata. If it is a small dataset (e.g., only a subset of the people who
 have died in the last year), then maybe one could also add and verify it in
 another way, not going through primary sources. But especially for recent
 deaths, there might be a great variety of sources (esp. newspaper articles)
 that are not easy to find without user support.
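
The scan proposed above (which date-of-death statements in DBpedia are not yet in Wikidata) might look something like the sketch below. It rests on assumptions: both datasets have already been reduced to a mapping from a shared person identifier to an ISO-style date string, and the function name and bucket names are invented here. It also separates the "1929 versus 7-June-1929" case mentioned in this thread, where DBpedia only refines a coarser Wikidata value, from real conflicts.

```python
def compare_death_dates(dbpedia, wikidata):
    """Sort DBpedia death dates into buckets relative to Wikidata.

    Both arguments map a person identifier to a date string such as
    "1929" or "1929-06-07".
    """
    report = {"missing": [], "refines": [], "conflict": []}
    for person, dbp_date in sorted(dbpedia.items()):
        wd_date = wikidata.get(person)
        if wd_date is None:
            # Wikidata has no date of death at all for this person.
            report["missing"].append(person)
        elif wd_date == dbp_date or wd_date.startswith(dbp_date + "-"):
            # Identical, or Wikidata is already the more precise one.
            continue
        elif dbp_date.startswith(wd_date + "-"):
            # DBpedia adds precision to a compatible Wikidata value.
            report["refines"].append(person)
        else:
            report["conflict"].append(person)
    return report


if __name__ == "__main__":
    # Hypothetical identifiers and dates, for illustration only.
    dbp = {"A": "1929-06-07", "B": "2015-01-02", "C": "2015-03-04"}
    wd = {"A": "1929", "C": "2015-03-05"}
    print(compare_death_dates(dbp, wd))
```

Only the "missing" and "refines" buckets would be candidates for a pragmatic import; the "conflict" bucket is where verification effort is best spent.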

 Regards,

 Markus

 [1] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool



 On 04.06.2015 09:56, Dimitris Kontokostas wrote:



 On Thu, Jun 4, 2015 at 1:18 AM, Markus Krötzsch
 mar...@semantic-mediawiki.org

 wrote:

 On 03.06.2015 22:44, Gerard Meijssen wrote:

 Hoi,
 The Dutch indicated their willingness to add the dead to
 Wikidata ... I
 add quite a few dead from other countries and because of Jura1
 Brazilians who died in 2015 have an added significance.

 Given that we CAN produce lists like this, it makes sense to
 reconsider
 the offer by the fine people from DBpedia and have the
 information they
 harvest from Wikipedia added automatically to Wikidata.. One
 reason I
 pointed out on my recent blogpost..


 DBpedia is getting this information from the contents of the
 template Persondata as used on Wikipedia [1]. The enwiki community
 just recently decided to maintain this data on Wikidata instead. I
 guess this means that (English) DBpedia will not contain this data
 in the future, unless they import it from Wikidata (they are
 tracking the issue at [2]).


 Note that DBpedia gets person data information both from the persondata
 template and from the infobox templates using the mappings wiki.
 We also noted that the data between the two is often out of sync
 (and usually the person data is stale/wrong because people don't know
 of its existence).

 E.g., we have 28K items with two birth dates, one from the infobox and
 another from persondata.

 select count(*) where {?s dbpedia-owl:birthDate ?b1 ;
 dbpedia-owl:birthDate ?b2 .
 filter (?b1 != ?b2 && ?b1 < ?b2)}

 http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+count%28*%29+where+%7B%3Fs+dbpedia-owl%3AbirthDate+%3Fb1+%3B+dbpedia-owl%3AbirthDate+%3Fb2+.%0D%0Afilter+%28%3Fb1+%21%3D+%3Fb2+%26%26+%3Fb1+%3C+%3Fb2%29%7D&format=text%2Fhtml&timeout=3&debug=on
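
Since characters such as "&" and "<" are easily lost when a query or link is pasted into mail, it may be safer to build the request URL programmatically. The sketch below assumes the filter reads `?b1 != ?b2 && ?b1 < ?b2`, which is what the percent-encoded link indicates (%26%26 and %3C). It uses the `default-graph-uri`, `query`, and `format` parameters visible in that link; the function name is made up, and the `timeout` and `debug` parameters are left out on purpose. Nothing is sent over the network here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Count subjects that have two different birth dates; the ?b1 < ?b2
# condition keeps each unordered pair from being counted twice.
QUERY = """select count(*) where {?s dbpedia-owl:birthDate ?b1 ;
dbpedia-owl:birthDate ?b2 .
filter (?b1 != ?b2 && ?b1 < ?b2)}"""


def build_sparql_url(query, endpoint="http://dbpedia.org/sparql"):
    """Return a GET URL for `query` with all reserved characters escaped."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "text/html",
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_sparql_url(QUERY)
    print(url)
    # Round-trip check: decoding the URL gives back the original query.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```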

 The persondata template is used in German Wikipedia as well. The
 following release has ~ 2.2M triples coming from the German persondata
 template (which iirc has the same problems as the English one).

 Best,
 Dimitris


 So you see, times are changing quickly ... but overall I hope that
 this is still solving the problem you identified, in fact in a much
 more direct way than one might have hoped for :-).

 DBpedia may still play a role. I don't know how exactly the enwiki
 community is planning to implement the move from Persondata to
 Wikidata. It could be that DBpedia is the only project extracting
 this data. So in a way, your suggestion might be a great idea,
 though not as a long-term data maintenance plan but as a one-time
 help for