Re: [Wikidata] WCQS Beta Downtime beginning Feb 4 18:30 UTC

2021-02-04 Thread Maarten Dammers

Hi Ryan and Guillaume,

Last time I checked, WCQS was short for "Wikimedia Commons Query Service" 
( https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service ), so 
I'm a bit puzzled as to why you posted this on the Wikidata mailing list 
instead of the Wikimedia Commons list. I hope it will be back soon.


Maarten

On 03-02-2021 22:39, Guillaume Lederrey wrote:
We ran some numbers and it looks like the data reload is going to take 
around 2.5 days, during which WCQS will be unavailable. Sorry for this 
interruption of service.


On Wed, 3 Feb 2021, 21:16 Guillaume Lederrey wrote:


On Wed, Feb 3, 2021 at 8:53 PM Ryan Kemper <rkem...@wikimedia.org> wrote:

Hi all,

Our host *wcqs-beta-01.eqiad.wmflabs* is running low on disk
space due to the size of its Blazegraph journal dataset. In order to
free up space we will need to take the service down, delete
the journal and re-import from the latest dump. Service
interruption will begin at *Feb 4 18:30 UTC* and continue
until the data reload is complete.


Just to be clear, this is the host behind
https://wcqs-beta.wmflabs.org/.

We'll send out a notification when the downtime begins and
when it ends as well.

*Note*: This doesn't affect WDQS, only the WCQS beta.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org 
https://lists.wikimedia.org/mailman/listinfo/wikidata



-- 
	*Guillaume Lederrey* (he/him)

Engineering Manager
Wikimedia Foundation 


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] ACM UMAP 2021: Third Call-for-Papers - Updated submission information

2021-01-12 Thread Maarten Dammers

Hi Violeta,

On 12-01-2021 00:01, Violeta Ilik wrote:
I want to say I am surprised but no, I am not. This list is full of 
unwelcoming people who somehow continue to thrive here.


This list is very welcoming to people who want to discuss anything related 
to Wikidata; off-topic, spammy conference calls for papers, on the other 
hand, are not welcome. Please don't attack other members of this list. 
Play the ball, not the player.


Maarten


Unbelievable.

-vi


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] ACM UMAP 2021: Third Call-for-Papers - Updated submission information

2021-01-11 Thread Maarten Dammers

Hi Oana,

On 11-01-2021 08:58, Oana Inel wrote:

--- Apologies for cross-posting ---


Apologies not accepted. This doesn't seem to be on topic for this list. 
Have a look at 
https://ruben.verborgh.org/blog/2014/01/31/apologies-for-cross-posting/


Maarten



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikidata-tech] Blank node deprecation in WDQS & Wikibase RDF model

2020-04-17 Thread Maarten Dammers

Hi David,

Peter brings up some very valid points and I agree with him. I don't 
really like how you present this as a done deal to the community. Now it 
looks like you have a software performance problem, you think you found 
a solution, and without any community consultation you're pushing it 
through.


Maarten

On 17-04-20 16:11, David Causse wrote:

Thanks for the feedback,
just a note to say that I responded via 
https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search


David Causse

On Thu, Apr 16, 2020 at 8:16 PM Peter F. Patel-Schneider <pfpschnei...@gmail.com> wrote:


I am taking the liberty of replying to the list because of the
problems with
supplied justification for this change that are part of the
original message.

I believe that https://phabricator.wikimedia.org/T244341#5889997
is inadequate
for determining that blank nodes are problematic.  First, the fact
that
determining isomorphism in RDF graphs with blank nodes is
non-polynomial is a
red herring.  If the blank nodes participate in only one triple then
isomorphism remains easy.  Second, the query given to remove a
some-value SNAK
is incorrect in general - it will remove all triples with the
blank node as
object.  (Yes, if the blank nodes found are leaves then no extra
triples are
removed.)  A simpler DELETE WHERE will have the seemingly-desired
result.

This is not to say that blank nodes do not cause problems.
According to the
semantics of both RDF and SPARQL, blank nodes are anonymous, so to
repeatedly
access the same blank node in a graph one has to access the stored
graph using
an interface that exposes the retained identity of blank nodes. 
It looks as
if the WDQS is built on a system that has such an interface. As
the WDQS
already uses user-visible features that are not part of SPARQL,
adding (or
maybe even only utilizing) a non-standard interface that is only used
internally would not be a problem.

One problem when using generated URLs to replace blank nodes is
that these
generated URLs have to be guaranteed stable and unique (not just
stable) for
the lifetime of the query service.  Another problem is that yet
another
non-standard function is being introduced, pulling the RDF dump of
Wikidata
yet further from RDF.

So this is a significant change as far as users are concerned that
also has
potential implementation issues.   Why not just use an internal
interface that
exposes a retained identity for blank nodes?

Peter F. Patel-Schneider



On 4/16/20 8:34 AM, David Causse wrote:
> Hi,
>
> This message is relevant for people writing SPARQL queries and
using the
> Wikidata Query Service:
>
> As part of the work of redesigning the WDQS updater[0] we
identified that
> blank nodes[1] are problematic[2] and we plan to deprecate their
usage in
> the wikibase RDF model[3]. To ease the deprecation process we are
> introducing the new function wikibase:isSomeValue() that can be
used in
> place of isBlank() when it was used to filter SomeValue[4].
>
> What does this mean for you: nothing will change for now, we are
only
> interested to know if you encounter any issues with the
> wikibase:isSomeValue() function when used as a replacement of
the isBlank()
> function. More importantly, if you used the isBlank() function
for other
> purposes than identifying SomeValue (unknown values in the UI),
please let
> us know as soon as possible.
>
> The current plan is as follows:
>
> 1. Introduce a new wikibase:isSomeValue() function
> We are at this step. You can already use wikibase:isSomeValue()
in the Query
> Service. Here’s an example query (Humans whose gender we know we
don't know):
> SELECT ?human WHERE {
> ?human wdt:P21 ?gender
> FILTER wikibase:isSomeValue(?gender) .
> }
> You can also search the wikis[8] to find all the pages where the
function
> isBlank is referenced in a SPARQL query.
>
> 2. Generate stable labels for blank nodes in the wikibase RDF output
> Instead of "autogenerated" blank node labels wikidata will now
provide a
> stable label for blank nodes. In other words the wikibase
triples using
> blank nodes such as:
> s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576 _:genid2 ;
> will become
> s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576
> _:1668ace9a6860f7b32569c45fe5a5c0d ;
> This is not a breaking change.
>
> 3. [BREAKING CHANGE] Convert blank nodes to IRIs in the WDQS updater
> At this point some WDQS servers will start returning IRIs such
> as
http://www.wikidata.org/somevalue/1668ace9a6860f7b32569c45fe5a5c0d (the
> exact form of the IRI is stil

Re: [Wikidata] WDQS and SPARQL Endpoint Compatibility

2020-03-31 Thread Maarten Dammers

Hi Egon,

On 31-03-20 09:02, Egon Willighagen wrote:


My bot produces a weekly federation report at

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Federation_report



The WikiPathways SPARQL endpoint URL has changed, and I have requested 
an update (Jan 2020, [0]), but no update or reply yet.


Maarten, that is causing the simple query in this report to fail.


I created https://phabricator.wikimedia.org/T249041 for this. The 
Wikimedia site requests board is one of the more active ones in my 
experience. Let's see how it goes. If it goes well, we can just start 
moderating 
https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input and 
create a site request for everything that gets approved.


Maarten



Egon

0.https://www.wikidata.org/wiki/Wikidata_talk:SPARQL_federation_input#Updated_URL_for_the_WikiPathways_SPARQL_endpoint

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] WDQS and SPARQL Endpoint Compatibility

2020-03-30 Thread Maarten Dammers
Since Stas left last year, unfortunately nobody from the WMF has done 
anything with 
https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input . I don't 
know if the new SPARQL people are even aware of this page.


My bot produces a weekly federation report at 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Federation_report 



Maarten

On 30-03-20 22:41, Lucas Werkmeister wrote:

The current whitelist is documented at
https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/SPARQL_Federation_endpoints
and new additions can be proposed at
https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input.

Cheers,
Lucas

On 30.03.20 20:31, Kingsley Idehen wrote:

All,

I am opening up this thread to discuss the generic support of SPARQL
endpoints by WDQS. Correct me if I am wrong, but right now it can use
SPARQL-FED against a select number of registered endpoints?

As you all know, the LOD Cloud Knowledge Graph is a powerful repository
of loosely-coupled, data, information, and knowledge. One that could
really help humans and software agents in the collective quest to defeat
the COVID19 disease.


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Notability and classing "notable" properties

2019-11-19 Thread Maarten Dammers
Are you looking for 
https://www.wikidata.org/wiki/MediaWiki:Wikibase-SortedProperties ?


On 19-11-19 22:15, Thad Guidry wrote:
When viewing Items on Wikidata that I am researching or quickly having 
to disambiguate, I often end up scrolling down endlessly to see 
important "notable" properties for People.
Some of them we are familiar with, such as "award received" or "notable 
work".


For example, Frank Lloyd Wright.

So my 3 questions:

1. I'm curious if there is already a preference or tool that would 
allow those "popular" or "notable" kinds of properties to be shown 
further up on the Item pages when looking at People and deriving 
Notability for them?


2. More generally, what or who controls the listview-item divs 
inside div.wikibase-statementgrouplistview?
Perhaps one way to look at this is through the concept of "notability", 
where we can definitely see that some properties lend themselves to it, 
like "award received", and others not so much, like "sex or gender". 
For instance, properties that are an instance of

"Wikidata property related to awards, prizes and honours"


3. How do others collect the "notable" properties floating around 
Wikidata?


Thad
https://www.linkedin.com/in/thadguidry/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] searching for Wikidata items

2019-06-05 Thread Maarten Dammers

Hi Tim,

Pywikibot has generators around the API. For example for search you have 
https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.html#pywikibot.pagegenerators.SearchPageGenerator 
. So basically anything you can search for as a user can also be used as 
a generator in Pywikibot.


Say for example all bands that have "Bush" in their name. We have the 
band Bush at https://www.wikidata.org/wiki/Q247949 . With a bit of a 
trick you can see what the search engine knows about a page: 
https://www.wikidata.org/w/index.php?title=Q247949&action=cirrusdump . 
We can use this to limit the results to only items that are an 
instance of (P31) band (Q215380), see 
https://www.wikidata.org/w/index.php?search=bush+-wbhasstatement%3A%22P31%3DQ215380%22&title=Special%3ASearch&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns120=1 
or as API output at 
https://www.wikidata.org/w/api.php?action=query&list=search&srsearch=bush%20-wbhasstatement:%22P31=Q215380%22&format=json


Pywikibot accepts the same search string:
>>> import pywikibot
>>> from pywikibot import pagegenerators
>>> query = 'bush -wbhasstatement:"P31=Q215380"'
>>> repo = pywikibot.Site().data_repository()
>>> searchgen = pagegenerators.SearchPageGenerator(query,site=repo)
>>> for item in searchgen:
... print (item.title())
...
Q1156378
Q16945866
Q16953971
Q247949
Q2928714
Q5001360
Q5001432
Q7720714
Q7757229
>>>

Maarten

On 04-06-19 15:44, Marielle Volz wrote:
Yes, the API is at 
https://www.wikidata.org/w/api.php?action=query&list=search&srsearch=Bush


There's a sandbox where you can play with the various options:
https://www.wikidata.org/wiki/Special:ApiSandbox#action=query&format=json&list=search&srsearch=Bush


On Tue, Jun 4, 2019 at 2:22 PM Tim Finin wrote:


What's the best way to search Wikidata for items whose name or
alias matches a string?  The search available via pywikibot seems
to only find a match if the search string is a prefix of an item's
name or alias, so searching for "Bush" does not return any of the
George Bush items. I don't want to use a SPARQL query with a
regex, since I expect that to be slow.

The search box on the Wikidata pages is closer to what I want.  Is
there a good way to call this via an API?

Ideally, I'd like to be able to specify a language and also a set
of types, but I can do that once I've identified candidates based
on a simple match with a query string.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org 
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikidata-tech] wb_terms redesign

2019-05-04 Thread Maarten Dammers

Hi Alaa,

On 25-04-19 16:38, Alaa Sarhan wrote:
> This is really a defective redesign. It reintroduced numeric IDs that were 
to be removed by T114902. See also T179928. We should consider 
reintroducing a new table to link unprefixed and prefixed entity IDs.


The new schema has been optimized as much as possible to allow maximum 
scalability, as it will contain a massive amount of data that we hope 
will double or even triple in size as soon as possible.


The new schema has been optimized for your use cases and completely breaks 
any tools combining page table data with Wikibase data. If you really 
cared about tool developers, you wouldn't trash the unprefixed ID.


Maarten


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Semantic annotation of red links on Wikipedia

2018-09-29 Thread Maarten Dammers
New property proposed at 
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Wikipedia_suggested_article_name



On 28-09-18 11:27, Lucie-Aimée Kaffee wrote:
The idea of linking red links is very interesting, I believe, 
especially as we have Wikidata items for many of the missing articles.
We discussed the concept of "smart red links" (linking to the 
ArticlePlaceholder pages, as someone pointed out before) a while ago, 
documented at 
https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder/Smart_red_links


I believe it's a very interesting direction to explore, especially for 
Wikipedias with a smaller number of articles and therefore naturally a 
higher number of red links.


On Thu, 27 Sep 2018 at 21:06, Maarten Dammers <maar...@mdammers.nl> wrote:


Hello,

On 27-09-18 01:16, Andy Mabbett wrote:
> On 24 September 2018 at 18:48, Maarten Dammers <maar...@mdammers.nl> wrote:
>
>> Wouldn't it be nice to be able to make a connection between the
red link on
>> Wikipedia and the Wikidata item?
> This facility already exists:
>
>

https://en.wikipedia.org/wiki/Template:Interlanguage_link#Link_to_Reasonator_and_Wikidata
You seem to have done some selective quoting and selective reading. I
addressed this in my original email:

On 24-09-18 19:48, Maarten Dammers wrote:
> Where to store this link? I'm not sure about that. On some Wikipedias
> people have tested with local templates around the red links. That's
> not structured data, clutters up the Wikitext, it doesn't scale and
> the local communities generally don't seem to like the approach.
> That's not the way to go.
James also shared some links related to this.

Maarten




___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata



--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Looking for "data quality check" bots

2018-09-29 Thread Maarten Dammers

Hi Ettore,


On 26-09-18 14:31, Ettore RIZZA wrote:

Dear all,

I'm looking for Wikidata bots that perform accuracy audits. For 
example, comparing the birth dates of persons with the same date 
indicated in databases linked to the item by an external-id.
Let's have a look at the evolution of automated editing. The first step 
is to add missing data from anywhere. Bots importing date of birth are 
an example of this. The next step is to add data from somewhere with a 
source or add sources to existing unsourced or badly sourced statements. 
As far as I can see that's where we are right now, see for example edits 
like 
https://www.wikidata.org/w/index.php?title=Q41264&type=revision&diff=619653838&oldid=616277912 .
Of course the next step would be to be able to compare existing 
sourced statements with external data to find differences. But how would 
the workflow be? Take for example Johannes Vermeer ( 
https://www.wikidata.org/wiki/Q41264 ). Extremely well documented and 
researched, but 
http://www.getty.edu/vow/ULANFullDisplay?find=&role=&nation=&subjectid=500032927 
and https://rkd.nl/nl/explore/artists/80476 combined provide 3 different 
dates of birth and 3 different dates of death. When it comes to these 
kinds of date mismatches, it's generally first come, first served (first 
date added doesn't get replaced). This mismatch could show up in some 
report. I can check it as a human and maybe do some adjustments, but how 
would I sign it off to prevent other people from doing the same thing 
over and over again?


With federated SPARQL queries it becomes much easier to generate reports 
of mismatches. See for example 
https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches .
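
For illustration, a rough sketch of what such a federated comparison could 
look like, wrapped in a small Python script. The SERVICE URL and the 
schema:birthDate predicate below are placeholders for this example, not 
claims about any actual whitelisted endpoint:

import requests

# Sketch: compare Wikidata's date of birth (P569) with a birth date served
# by a federated endpoint, joined via the normalized external URI
# (wdtn:P1006). The SERVICE URL is a placeholder and would have to be
# replaced by one of the whitelisted federation endpoints to actually run.
QUERY = """
PREFIX wdt:    <http://www.wikidata.org/prop/direct/>
PREFIX wdtn:   <http://www.wikidata.org/prop/direct-normalized/>
PREFIX schema: <http://schema.org/>
SELECT ?item ?wikidataDob ?externalDob WHERE {
  ?item wdt:P569 ?wikidataDob ;
        wdtn:P1006 ?externalUri .
  SERVICE <https://example.org/sparql> {
    ?externalUri schema:birthDate ?externalDob .
  }
  FILTER(STR(?wikidataDob) != STR(?externalDob))
}
LIMIT 10
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "mismatch-report-sketch/0.1 (example)"},
)
for row in response.json()["results"]["bindings"]:
    print(row["item"]["value"],
          row["wikidataDob"]["value"], "!=", row["externalDob"]["value"])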


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Semantic annotation of red links on Wikipedia

2018-09-27 Thread Maarten Dammers

Hello,

On 27-09-18 01:16, Andy Mabbett wrote:

On 24 September 2018 at 18:48, Maarten Dammers wrote:


Wouldn't it be nice to be able to make a connection between the red link on
Wikipedia and the Wikidata item?

This facility already exists:


https://en.wikipedia.org/wiki/Template:Interlanguage_link#Link_to_Reasonator_and_Wikidata
You seem to have done some selective quoting and selective reading. I 
addressed this in my original email:


On 24-09-18 19:48, Maarten Dammers wrote:
Where to store this link? I'm not sure about that. On some Wikipedias 
people have tested with local templates around the red links. That's 
not structured data, clutters up the Wikitext, it doesn't scale and 
the local communities generally don't seem to like the approach. 
That's not the way to go. 

James also shared some links related to this.

Maarten




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Semantic annotation of red links on Wikipedia

2018-09-24 Thread Maarten Dammers

Hi James,


On 24-09-18 20:08, James Heald wrote:
The problem, if you don't put something on the wikipage itself, is how 
then do you determine which [[John A. Smith]] a redlink was intended 
to refer to, if there is more than one possibility.
That's a classic disambiguation problem. Most Wikipedias seem to be 
pretty good at dealing with these. At least for the Dutch Wikipedia I 
know people working on disambiguation are quite active and I encounter 
quite a few disambiguated red links. If this would really become an 
issue, a qualifier could be used to track based on what article (it's 
linked item) the link was made. So in the case of Friedrich Ris, that 
would be https://www.wikidata.org/wiki/Q1624113 
(https://nl.wikipedia.org/wiki/Aethriamanta_aethra).


Maarten



But Maarten is right, that at least on en-wiki, the suggestion of 
adding templates to link to Wikidata content has met with considerable 
hostility, expressed in two recent RfCs:
https://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style/Archive_202#RfC:_Linking_to_wikidata 



https://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style/Archive_204#New_RFC_on_linking_to_Wikidata 



  -- James.



On 24/09/2018 18:48, Maarten Dammers wrote:

Hi everyone,

According to https://www.youtube.com/watch?v=TLuM4E6IE5U : "Semantic 
annotation is the process of attaching additional information to 
various concepts (e.g. people, things, places, organizations etc) in 
a given text or any other content. Unlike classic text annotations 
for reader's reference, semantic annotations are used by machines to 
refer to."
(more at 
https://ontotext.com/knowledgehub/fundamentals/semantic-annotation/ )


On Wikipedia a red link is a link to an article that hasn't been 
created (yet) in that language. Often another language does have an 
article about the subject or at least we have a Wikidata item about 
the subject. Take for example 
https://nl.wikipedia.org/w/index.php?title=Friedrich_Ris . It has 
over 250 incoming links, but the person doesn't have an article in 
Dutch. We have a Wikidata item with links to 7 Wikipedia's at 
https://www.wikidata.org/wiki/Q116510 , but no way to relate 
https://nl.wikipedia.org/w/index.php?title=Friedrich_Ris with 
https://www.wikidata.org/wiki/Q116510 .


Wouldn't it be nice to be able to make a connection between the red 
link on Wikipedia and the Wikidata item?


Let's assume we have this list somewhere. We would be able to offer 
all sorts of nice features to our users like:

* Hover over the link to get a hovercard in your favorite backup language
* Generate an article placeholder for the user with basic information 
in the local language
* Pre-populate the translate extension so you can translate the 
article from another language

(probably plenty of other good uses)

Where to store this link? I'm not sure about that. On some 
Wikipedias people have tested with local templates around the red 
links. That's not structured data, clutters up the Wikitext, it 
doesn't scale and the local communities generally don't seem to like 
the approach. That's not the way to go. Maybe a better option would 
be to create a new property on Wikidata to store the name of the 
future article. Something like Q116510: Pxxx -> (nl)"Friedrich Ris". 
Would be easiest because the infrastructure is there and you can just 
build tools on top of it, but I'm afraid this will cause a lot of 
noise on items. A couple of suggestions wouldn't be a problem, but 
what is keeping people from adding the suggestion in 100 languages? 
Or maybe restrict usage so that a Wikipedia must have at least 1 (or 
n) incoming links before people are allowed to add it?
We could create a new project on the Wikimedia Cloud to store the 
links, but that would be quite the extra time investment setting up 
everything.


What do you think?

Maarten




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata





___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Semantic annotation of red links on Wikipedia

2018-09-24 Thread Maarten Dammers

Hi everyone,

According to https://www.youtube.com/watch?v=TLuM4E6IE5U : "Semantic 
annotation is the process of attaching additional information to various 
concepts (e.g. people, things, places, organizations etc) in a given 
text or any other content. Unlike classic text annotations for reader's 
reference, semantic annotations are used by machines to refer to."
(more at 
https://ontotext.com/knowledgehub/fundamentals/semantic-annotation/ )


On Wikipedia a red link is a link to an article that hasn't been created 
(yet) in that language. Often another language does have an article 
about the subject or at least we have a Wikidata item about the subject. 
Take for example 
https://nl.wikipedia.org/w/index.php?title=Friedrich_Ris . It has over 
250 incoming links, but the person doesn't have an article in Dutch. We 
have a Wikidata item with links to 7 Wikipedia's at 
https://www.wikidata.org/wiki/Q116510 , but no way to relate 
https://nl.wikipedia.org/w/index.php?title=Friedrich_Ris with 
https://www.wikidata.org/wiki/Q116510 .


Wouldn't it be nice to be able to make a connection between the red link 
on Wikipedia and the Wikidata item?


Let's assume we have this list somewhere. We would be able to offer all 
sorts of nice features to our users like:

* Hover over the link to get a hovercard in your favorite backup language
* Generate an article placeholder for the user with basic information in 
the local language
* Pre-populate the translate extension so you can translate the article 
from another language

(probably plenty of other good uses)

Where to store this link? I'm not sure about that. On some Wikipedias 
people have tested with local templates around the red links. That's not 
structured data, clutters up the Wikitext, it doesn't scale and the 
local communities generally don't seem to like the approach. That's not 
the way to go. Maybe a better option would be to create a new property 
on Wikidata to store the name of the future article. Something like 
Q116510: Pxxx -> (nl)"Friedrich Ris". Would be easiest because the 
infrastructure is there and you can just build tools on top of it, but 
I'm afraid this will cause a lot of noise on items. A couple of 
suggestions wouldn't be a problem, but what is keeping people from 
adding the suggestion in 100 languages? Or maybe restrict usage so that 
a Wikipedia must have at least 1 (or n) incoming links before people are 
allowed to add it?
We could create a new project on the Wikimedia Cloud to store the 
links, but that would be quite the extra time investment setting up 
everything.
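
For illustration, adding such a suggestion with Pywikibot could look 
roughly like the sketch below. The property ID is a made-up placeholder, 
since no such property exists:

import pywikibot

# Sketch: store a suggested article title as a monolingual-text statement.
# 'P9999999' is a made-up placeholder; it would have to be replaced by a
# real property ID if such a property ever gets created.
SUGGESTED_TITLE_PROPERTY = 'P9999999'

repo = pywikibot.Site('wikidata', 'wikidata').data_repository()
item = pywikibot.ItemPage(repo, 'Q116510')  # Friedrich Ris

claim = pywikibot.Claim(repo, SUGGESTED_TITLE_PROPERTY)
claim.setTarget(pywikibot.WbMonolingualText(text='Friedrich Ris', language='nl'))
item.addClaim(claim, summary='Suggested article title for nlwiki (example)')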


What do you think?

Maarten




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Mapping Wikidata to other ontologies

2018-09-22 Thread Maarten Dammers

Hi everyone,

Last week I presented Wikidata at the Semantics conference in Vienna ( 
https://2018.semantics.cc/ ). One question I asked people was: What is 
keeping you from using Wikidata? One of the common responses is that 
it's quite hard to combine Wikidata with the rest of the semantic web. 
We have our own private ontology that's a bit on an island. Most of our 
triples are in our own private format and not available in a more 
generic, more widely use ontology.


Let's pick an example: Claude Lussan. No clue who he is, but my bot 
seems to have added some links and the item isn't too big. Our URI is 
http://www.wikidata.org/entity/Q2977729 and this is equivalent of 
http://viaf.org/viaf/29578396 and 
http://data.bibliotheken.nl/id/thes/p173983111 . If you look at 
http://www.wikidata.org/entity/Q2977729.rdf this equivalence is 
represented as:

<wdtn:P214 rdf:resource="http://viaf.org/viaf/29578396"/>
<wdtn:P1006 rdf:resource="http://data.bibliotheken.nl/id/thes/p173983111"/>

Also outputting it in a more generic way would probably make using it 
easier than it is right now. Last discussion about this was at 
https://www.wikidata.org/wiki/Property_talk:P1921 , but no response 
since June.


That's one way of linking up, but another way is using equivalent 
property ( https://www.wikidata.org/wiki/Property:P1628 ) and equivalent 
class ( https://www.wikidata.org/wiki/Property:P1709 ). See for example 
sex or gender ( https://www.wikidata.org/wiki/Property:P21) how it's 
mapped to other ontologies. This won't produce easier RDF, but some 
smart downstream users have figured out some SPARQL queries. So linking 
up our properties and classes to other ontologies will make using our 
data easier. This is a first step. Maybe it will be used in the future 
to generate more RDF, maybe not and we'll just document the SPARQL 
approach properly.


The equivalent property and equivalent class are used, but not that 
much. Did anyone already try a structured approach with reporting? I'm 
considering parsing popular ontology descriptions and producing reports 
of what is linked to what so it's easy to add the missing links, but I 
don't want to do double work here.
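
As a possible starting point for such a report, the existing mappings for 
one ontology can already be pulled out of the query service; a rough 
sketch, with FOAF taken as the example namespace:

import requests

# Sketch: list which Wikidata properties already declare a FOAF equivalent
# via "equivalent property" (P1628). Comparing this list against the terms
# defined in the FOAF spec would show what is still unmapped.
QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?property ?foafTerm WHERE {
  ?property wdt:P1628 ?foafTerm .
  FILTER(STRSTARTS(STR(?foafTerm), "http://xmlns.com/foaf/0.1/"))
}
ORDER BY ?foafTerm
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "ontology-mapping-report-sketch/0.1 (example)"},
)
for row in response.json()["results"]["bindings"]:
    print(row["property"]["value"], "->", row["foafTerm"]["value"])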


Which ontologies are important because they are used a lot? Some of the 
ones I came across:

* https://www.w3.org/2009/08/skos-reference/skos.html
* http://xmlns.com/foaf/spec/
* http://schema.org/
* https://creativecommons.org/ns
* http://dbpedia.org/ontology/
* http://vocab.org/open/
Any suggestions?

Maarten


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Indexing everything (was Re: Indexing all item properties in ElasticSearch)

2018-08-04 Thread Maarten Dammers

Hi Stas and Hay,


On 28-07-18 02:12, Stas Malyshev wrote:

Hi!


I could definitely see a usecase for 1) and maybe for 2). For example,
let's say i remember that one movie that Rutger Hauer played in, just
searching for 'movie rutger hauer' gives back nothing:

https://www.wikidata.org/w/index.php?search=movie+rutger+hauer

While Wikipedia gives back quite a nice list of options:

https://en.wikipedia.org/w/index.php?search=movie+rutger+hauer

Well, this is not going to change with the work we're discussing. The
reason you don't get anything from Wikidata is because "movie" and
"rutger hauer" are labels from different documents and ElasticSearch
does not do joins. We only index each document in itself, and possibly
some additional data, but indexing labels from other documents is now
beyond what we're doing. We could certainly discuss it but that would be
separate (and much bigger) discussion.
Changing the topic because I would like to start this separate and 
bigger discussion. Query and search are quite similar, but also very 
different (if you search you'll run into nice articles like 
https://everypageispageone.com/2011/07/13/search-vs-query/ ). Currently 
our query service is a very strong and complete service, but Wikidata 
search is very poor. Let's take Blade Runner.

* https://www.wikidata.org/wiki/Q184843 is what a human sees
* http://www.wikidata.org/entity/Q184843.json our internal JSON structure
* http://www.wikidata.org/entity/Q184843.rdf source for the query engine
* https://www.wikidata.org/w/index.php?title=Q184843&action=cirrusdump 
what's indexed in the search engine


In my ideal world, everything I see as a human gets indexed into the 
search engine, preferably in a per-language index. For example, for Dutch 
something like a text_nl field with the label, description, aliases, 
statements and references in there. So index *everything* and never see 
a Qnumber or Pnumber in there (extra incentive for people to add labels 
in their language). Probably also everything duplicated in the text 
field to fall back to. In this index you would have the "movie Rutger 
Hauer", you would have the cast members ("rolverdeling: Harrison Ford" 
etc.). Yes, this will give a significant increase of index size, but 
will make it much easier to actually find things.
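
A rough sketch of what such a per-language document could look like for 
this item; the field names (text_nl, text_en) and the exact strings are 
made up for illustration and are not the actual CirrusSearch schema:

# Sketch of a hypothetical per-language search document for Q184843
# (Blade Runner). Field names and wording are illustrative only; the real
# CirrusSearch document (see the cirrusdump link above) looks different.
blade_runner_doc = {
    "title": "Q184843",
    "text_nl": [
        "Blade Runner",                              # label
        "Amerikaanse sciencefictionfilm uit 1982",   # description (example wording)
        "regisseur: Ridley Scott",                   # statements with labels resolved
        "rolverdeling: Harrison Ford",
        "rolverdeling: Rutger Hauer",
    ],
    "text_en": [
        "Blade Runner",
        "1982 science fiction film by Ridley Scott",
        "director: Ridley Scott",
        "cast member: Harrison Ford",
        "cast member: Rutger Hauer",
    ],
    # fallback field duplicating everything, as suggested above
    "text": [],
}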


As for implementation: We already have the logic to serialize our json 
to the RDF format. Maybe also add a serialization format for this that 
is easy to ingest by search engines? I noticed Google having a hard time 
indexing some of our items, see for example 
https://www.google.com/search?q=The+Feast+of+the+Seagods+site%3Awikidata.org&ie=utf-8&oe=utf-8 
. Duck Duck Go seems to be doing a better job 
https://duckduckgo.com/?q=The+Feast+of+the+Seagods+site%3Awikidata.org&t=h_&ia=web 
. Making it easier to index not only for our own search would be a nice 
added benefit.


How feasible is this? Do we already have one or multiple tasks for this 
on Phabricator? Phabricator has gotten a bit unclear when it comes to 
Wikidata search, I think because of misunderstanding between people about what 
the goal of the task is. Might be worthwhile spending some time on 
structuring that.


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikimedia-l] Solve legal uncertainty of Wikidata

2018-07-04 Thread Maarten Dammers

Hi Mathieu,

On 04-07-18 11:07, mathieu stumpf guntz wrote:

Hi,

On 19/05/2018 at 03:35, Denny Vrandečić wrote:


Regarding attribution, commonly it is assumed that you have to 
respect it transitively. That is one of the reasons a license that 
requires BY sucks so hard for data: unlike with text, the attribution 
requirements grow very quickly. It is the same as with modified 
images and collages: it is not sufficient to attribute the last 
author, but all contributors have to be attributed.
If we want our data to be trustworthy, then we need traceability. That 
means reporting this chain of sources as extensively as possible, 
whether or not the license requires it as attribution. CC0 allows 
breaking this traceability, which makes it an awful license for whoever 
is concerned with obtaining reliable data.

A license is not the way to achieve this. We have references for that.


This is why I think that whoever wants to be part of a large 
federation of data on the web, should publish under CC0.
As long as one aims at making a federation of untrustworthy data banks, 
that's perfect. ;)
So I see you started forum shopping (trying to get the Wikimedia-l 
people in) and making contentious, trying-to-be-funny remarks. That's 
usually a good indication a thread is going nowhere.


No, Wikidata is not going to change the CC0. You seem to be the only 
person wanting that, and trying to discredit Wikidata will not help you 
in your crusade. I suggest the people who are still interested in this 
to go to https://phabricator.wikimedia.org/T193728 and make useful 
comments over there.


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata in the LOD Cloud

2018-06-30 Thread Maarten Dammers
The domain mcc.ae is down for email (see 
https://mxtoolbox.com/domain/mcc.ae/ ) and http://www.mcc.ae/ shows a 
for sale sign. Any idea how to reach the maintainer?


Maarten


On 29-06-18 17:27, David Abián wrote:

I guess Wikidata disappeared from the files yesterday, a few minutes
before 14:00 GMT, when a new version of the cloud was generated. It's
probably a mistake/bug in that generation process.


On 29/06/18 at 15:13, Maarten Dammers wrote:

Looks like after the last update Wikidata dropped out again?
https://lod-cloud.net/versions/2018-30-05/lod-data.json contains
Wikidata, but in https://lod-cloud.net/lod-data.json it seems to be
currently missing, it does list Wikidata as a target in some other sets.

CCed the maintainer.

Maarten





___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata in the LOD Cloud

2018-06-29 Thread Maarten Dammers
Looks like after the last update Wikidata dropped out again? 
https://lod-cloud.net/versions/2018-30-05/lod-data.json contains 
Wikidata, but in https://lod-cloud.net/lod-data.json it seems to be 
currently missing, it does list Wikidata as a target in some other sets.


CCed the maintainer.

Maarten


On 27-06-18 22:26, Maarten Dammers wrote:


Hi Léa and Lucas,

Excellent news! https://lod-cloud.net/dataset/wikidata seems to 
contain the info in a more human-readable (and machine-readable) way. 
If we add some URI link, does it automagically appear or does Lucas 
have to do some manual work? I assume Lucas has to do some manual work. 
I would suggest you document this somewhere more central so we don't 
have to bother Lucas all the time for updates. Do you already have a 
Phabricator task for that?


Maarten


On 11-06-18 17:17, Léa Lacroix wrote:

Hello all,

Thanks to Lucas who filled the necessary requirements, Wikidata now 
appears in the LOD cloud graph: http://lod-cloud.net


Currently, the graph doesn't display all the actual connections of 
Wikidata. The only connections that show up are the properties that 
link to other projects or databases, and having a specific statement 
on them to link to an RDF endpoint.


If you see something missing, you can contribute by adding the 
statement “formatter URI for RDF resource” on properties where the 
resource supports RDF (example 
<https://www.wikidata.org/wiki/Property:P214#P1921>).


You can learn more about the procedure to update the graph and a list 
of the existing and missing datasets here 
<https://www.wikidata.org/wiki/User:Lucas_Werkmeister_%28WMDE%29/LOD_Cloud>, 



Thanks to Lucas and John for making this happen!

--
Léa Lacroix
Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de <http://www.wikimedia.de>

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg 
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das 
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata in the LOD Cloud

2018-06-27 Thread Maarten Dammers

Hi Léa and Lucas,

Excellent news! https://lod-cloud.net/dataset/wikidata seems to contain 
the info in a more human-readable (and machine-readable) way. If we add 
some URI link, does it automagically appear or does Lucas have to do some 
manual work? I assume Lucas has to do some manual work. I would suggest 
you document this somewhere more central so we don't have to bother 
Lucas all the time for updates. Do you already have a Phabricator task 
for that?


Maarten


On 11-06-18 17:17, Léa Lacroix wrote:

Hello all,

Thanks to Lucas who filled the necessary requirements, Wikidata now 
appears in the LOD cloud graph: http://lod-cloud.net


Currently, the graph doesn't display all the actual connections of 
Wikidata. The only connections that show up are the properties that 
link to other projects or databases, and having a specific statement 
on them to link to an RDF endpoint.


If you see something missing, you can contribute by adding the 
statement “formatter URI for RDF resource” on properties where the 
resource supports RDF (example 
<https://www.wikidata.org/wiki/Property:P214#P1921>).


You can learn more about the procedure to update the graph and a list 
of the existing and missing datasets here 
<https://www.wikidata.org/wiki/User:Lucas_Werkmeister_%28WMDE%29/LOD_Cloud>, 



Thanks to Lucas and John for making this happen!

--
Léa Lacroix
Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de 

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg 
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das 
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikiata and the LOD cloud

2018-05-04 Thread Maarten Dammers
It almost feels like someone doesn’t want Wikidata in there? Maybe that website 
is maintained by DBpedia fans? Just thinking out loud here because DBpedia is 
very popular in the academic world and Wikidata is a huge threat to that 
popularity.

Maarten

> On 4 May 2018 at 17:20, Denny Vrandečić wrote:
> 
> I'm pretty sure that Wikidata is doing better than 90% of the current bubbles 
> in the diagram.
> 
> If they wanted to have Wikidata in the diagram it would have been there 
> before it was too small to read it. :)
> 
>> On Tue, May 1, 2018 at 7:47 AM Peter F. Patel-Schneider wrote:
>> Thanks for the corrections.
>> 
>> So https://www.wikidata.org/entity/Q42 is *the* Wikidata IRI for Douglas
>> Adams.  Retrieving from this IRI results in a 303 See Other to
>> https://www.wikidata.org/wiki/Special:EntityData/Q42, which (I guess) is the
>> main IRI for representations of Douglas Adams and other pages with
>> information about him.
>> 
>> From https://www.wikidata.org/wiki/Special:EntityData/Q42 content
>> negotiation can be used to get the JSON representation (the default), other
>> representations including Turtle, and human-readable information.  (Well
>> actually I'm not sure that this is really correct.  It appears that instead
>> of directly using content negotiation, another 303 See Other is used to
>> provide an IRI for a document in the requested format.)
>> 
>> https://www.wikidata.org/wiki/Special:EntityData/Q42.json and
>> https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl are the useful
>> machine-readable documents containing the Wikidata information about Douglas
>> Adams.  Content negotiation is not possible on these pages.
>> 
>> https://www.wikidata.org/wiki/Q42 is the IRI that produces a human-readable
>> version of the information about Douglas Adams.  Content negotiation is not
>> possible on this page, but it does have link rel="alternate" to the
>> machine-readable pages.
>> 
>> Strangely this page has a link rel="canonical" to itself.  Shouldn't that
>> link be to https://www.wikidata.org/entity/Q42?  There is a human-visible
>> link to this IRI, but there doesn't appear to be any machine-readable link.
>> 
>> RDF links to other IRIs for Douglas Adams are given in RDF pages by
>> properties in the wdtn namespace.  Many, but not all, identifiers are
>> handled this way.  (Strangely ISNI (P213) isn't even though it is linked on
>> the human-readable page.)
>> 
>> So it looks as if Wikidata can be considered as Linked Open Data but maybe
>> some improvements can be made.
>> 
>> 
>> peter
>> 
>> 
>> 
>> On 05/01/2018 01:03 AM, Antoine Zimmermann wrote:
>> > On 01/05/2018 03:25, Peter F. Patel-Schneider wrote:
>> >> As far as I can tell real IRIs for Wikidata are https URIs.  The http IRIs
>> >> redirect to https IRIs.
>> >
>> > That's right.
>> >
>> >>   As far as I can tell no content negotiation is
>> >> done.
>> >
>> > No, you're mistaken. You tried the URL of a wikipage in your curl command.
>> > Those are for human consumption, thus not available in turtle.
>> >
>> > The "real IRIs" of Wikidata entities are like this:
>> > https://www.wikidata.org/entity/Q{NUMBER}
>> >
>> > However, they 303 redirect to
>> > https://www.wikidata.org/wiki/Special:EntityData/Q{NUMBER}
>> >
>> > which is the identifier of a schema:Dataset. Then, if you HTTP GET these
>> > URIs, you can content negotiate them to JSON
>> > (https://www.wikidata.org/wiki/Special:EntityData/Q{NUMBER}.json) or to
>> > turtle (https://www.wikidata.org/wiki/Special:EntityData/Q{NUMBER}.ttl).
>> >
>> >
>> > Surprisingly, there is no connection between the entity IRIs and the 
>> > wikipage
>> > URLs. If one was given the IRI of an entity from Wikidata, and had no
>> > further information about how Wikidata works, they would not be able to
>> > retrieve HTML content about the entity.
>> >
>> >
>> > BTW, I'm not sure the implementation of content negotiation in Wikidata is
>> > correct because the server does not tell me the format of the resource to
>> > which it redirects (as opposed to what DBpedia does, for instance).
>> >
>> >
>> > --AZ
>> 
>> 
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata + Wikipedia outreach

2018-01-06 Thread Maarten Dammers

On 05-01-18 22:55, Jane Darnell wrote:
I object to your use of the catalog property to link to something that 
is not a catalog. I don't see why my objection leads you to expect me 
to offer an alternative way to track your project. I am not 
responsible for your project and don't understand what it is. If you 
can't understand that, then you probably should not be editing 
Wikidata.

To add to that, I see three things:
1. Using the wrong property ( catalog (P972) ). Solution -> move to 
another property; this depends on point 3
2. Notability of the people on the BLT lists. Solution -> add more 
information and links to establish notability (or, worst case, delete)
3. Using Wikidata as a shopping list for a Wikiproject. Have a 
discussion about whether we, the Wikidata community, want that (point 1 
might not be needed if the end result is that we don't want it)


For people like Jane and me, you're basically squatting on the current 
catalog (P972) property. So we care most about point 1. Points 2 and 3 
are for the BLT community to solve.


Point 3 is probably the hardest one. On 
https://en.wikipedia.org/wiki/Wikipedia:Meetup/Black_Lunch_Table/Lists_of_Articles 
I found the shopping lists for the BLT project. People seem to be in the 
hand-curated list and in the Listeria list. Clicking around I found 
https://www.wikidata.org/wiki/Q20011585 which seems to indicate that you 
had a Black Lunch Table meetup on 9 December 2017 at "The 8th Floor" 
and judging from 
https://en.wikipedia.org/wiki/Wikipedia:Meetup/Black_Lunch_Table/Triangle_Jan_2018 
that seems correct. At the bottom of this page is another Listeria 
shopping list based on this. I'm not sure we should store this kind of 
data on Wikidata.


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] claim change ignored

2017-10-22 Thread Maarten Dammers

Hi Marco,

On 21-10-2017 at 14:48, Marco Neumann wrote:

in any event it's a false claim in this example and I will remove the
claim now. 2-2=0 ;)
I undid your edit. You seem to be mixing up father ( 
https://www.wikidata.org/wiki/Q2650401 ) and child ( 
https://www.wikidata.org/wiki/Q15434505). Description also updated to 
make the difference clearer.


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Maarten Dammers

Hi Marco,

On 07-09-17 20:51, Marco Fossati wrote:

Hi everyone,

As a data quality addict, I've been investigating the coverage of 
external identifiers linked to Wikidata items about people.


Given the numbers on SQID [1] and some SPARQL queries [2, 3], it seems 
that even the second most used ID (VIAF) only covers circa *25%* of 
people items.

Then, there is a long tail of IDs that are barely used at all.

So here is my question:
*which external identifiers deserve an effort to achieve exhaustive 
coverage?*
I've been doing this for painters. See 
https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings/Creator_no_authority_control 
and 
https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings/Creator_missing_collection_authority_control 
.


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Script and API module for constraint checks

2017-04-27 Thread Maarten Dammers

Hi Léa,


On 27-04-17 14:47, Léa Lacroix wrote:

Hello all,

In the past few months, the development team has mentored a student, 
Olga, to help us develop a user script 
that displays the constraints on the item pages.


To use the script, add the following line to your user/common.js:

mw.loader.load( '//www.wikidata.org/w/index.php?title=User:Jonas_Kress_(WMDE)/check_constraints.js&action=raw&ctype=text/javascript' );
Is it a conscious choice not to make a gadget of this, or did you just 
not think of it? No messing with JavaScript makes it much easier for 
users to try it.


Maarten
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Sitelink removal in Wikidata

2017-04-26 Thread Maarten Dammers

Hi Amir,


On 26-04-17 05:30, Amir Ladsgroup wrote:

Hey,
One common form of vandalism in Wikidata is removing sitelinks (we 
already have an abuse filter flagging them).

Yes, that seems to happen quite a lot.
One of my friends in Persian Wikipedia (who is not a wikidata editor 
and only cares about Persian Wikipedia) asked me to write a tool that 
lists all Persian Wikipedia sitelink removals. So I wrote something 
small and fast but it's usable for any wiki. For example English 
Wikipedia: 
http://tools.wmflabs.org/dexbot/tools/deleted_sitelinks.php?wiki=enwiki


It's slow due to the nature of the database query but once it responds, 
you might find good things to revert.


Since this is the most useful for Wikipedia editors who don't want to 
patrol Wikidata (in that case, this query is 
the most useful) I'm reaching out to wider audiences. Sorry for spamming.
Looks the same as 
https://www.wikidata.org/wiki/Wikidata:Database_reports/removed_sitelinks/nlwiki 
to me. I updated 
https://www.wikidata.org/wiki/Wikidata:Database_reports/removed_sitelinks/Configuration 
to also create 
https://www.wikidata.org/wiki/Wikidata:Database_reports/removed_sitelinks/fawiki 
.


Maarten
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Joining May 29, National Plan Open Science meeting in Delft, The Netherlands

2017-04-23 Thread Maarten Dammers

Hi Egon,

Yaroslav is one of our (very active) users/admins/bureaucrats and a 
professor at TU Delft.  Maybe he can join you?


Maarten


On 23-04-17 12:34, Egon Willighagen wrote:

Dear Amit,

you make me painfully aware of something I should have mentioned (my
apologies for having forgotten that; still tired from the science march 
yesterday): it is not primarily an international Open Science meeting... 
the context is really about how to implement this Dutch Plan Open
Science... so, I was planning to target researchers using Wikidata in
the local Dutch region... other sessions will be about Open Science in
the Dutch funding environment, etc.

That said: there is more information which will get more informative
over the next weeks here:

"Open Science: the National Plan and you" ->
https://www.openscience.nl/nationaal-plan

Your reply also makes me wonder if there are Mozilla Open Science 
projects running in NL?

Egon


On Sun, Apr 23, 2017 at 12:04 PM, AMIT KUMAR JAISWAL
 wrote:

Hey Egon,

Thanks for letting us know about this Open Science meeting.

I'm interested in forming a team and currently I'm working with a couple 
of Open Source projects ranging from Machine Learning/AI to Natural 
Language Processing, and I recently started with Deep Learning.
Apart from this I'm also doing few competitions on Kaggle :
https://www.kaggle.com/amitkumarjaiswal.

Please let me know how can I join/participate in this meeting.

Regards

Thank you
Amit Kumar Jaiswal

On 4/23/17, Egon Willighagen  wrote:

Hi Wikidata community,

on May 29 in Delft, The Netherlands, the first national meeting is
planned for researchers (at various stages of their career) about the
Dutch National Plan Open Science. There will be a large session where
organisations and individuals can present their experiences with Open
Science...

I will join the meeting and want to see Wikidata there, and plan to
host a table about Wikidata in research... ever since the joint 
H2020 funding application (which we didn't get), I have been using 
Wikidata for our interoperability work in various research projects...

However, the more the merrier, and I'm hoping to co-host a Wikidata
table at this meeting... who else is interested in teaming up and
showing the Dutch research community how Wikidata can help them with
their Open Science? My own work is in the area of the life sciences,
but I know many others are using Wikidata for other research fields,
and the meeting is for all research, not just the natural sciences...

Looking forward to hearing from you,

greetings Egon

--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/u/egonwillighagen

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata



--
Amit Kumar Jaiswal
Mozilla Representative  | LinkedIn
 | Portfolio

Kanpur, India
Mo. : +91-8081187743 | T : @AMIT_GKP | PGP : EBE7 39F0 0427 4A2C

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata






___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Federation in the Wikidata Query Service

2017-04-01 Thread Maarten Dammers

Hi Gerard,

On 01-04-17 16:15, Gerard Meijssen wrote:

Hoi,
What I fail to understand in this discussion about licenses is what it 
is we achieve by being restrictive.
Same reason why Wikimedia Commons only allows free licenses: Re-users 
can freely use our content without worrying that it contains non-free 
content. It's part of our core values listed at 
https://wikimediafoundation.org/wiki/Terms_of_Use :

You are free to:
* Read and Print our articles and other media free of charge.
* Share and Reuse our articles and other media under free and open licenses.
* Contribute To and Edit our various sites or Projects.

This excludes us from including non-free content in the output of the 
query engine.


Maarten


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Federation in the Wikidata Query Service

2017-04-01 Thread Maarten Dammers

Hi Stas,


On 01-04-17 00:54, Stas Malyshev wrote:

Hi!


How about adding an ODbL-licensed service? Would it be possible? I am
thinking about SPOI and their SPARQL endpoint.

ODBL seems to be in the same vein as CC-BY-SA, so if CC-BY is ok, that
should be OK too. Please add the descriptions to
https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input

Great new feature you have here! I would only add endpoints that use 
free licenses that are compatible with our terms of use ( 
https://wikimediafoundation.org/wiki/Terms_of_Use#7._Licensing_of_Content 
). See http://freedomdefined.org/Definition for a more general 
explanation. This would include ODbL ( 
https://opendatacommons.org/licenses/odbl/summary/ ), but would exclude 
any ND (NoDerivatives) and any NC (NonCommercial) licenses.
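
For anyone who wants to experiment with the new feature, here is a minimal 
sketch of a federated query run against WDQS from Python with SPARQLWrapper. 
The remote endpoint URL and the predicates inside the SERVICE block are 
placeholders (not the real SPOI schema), and only endpoints the operators 
actually enable will work:

from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder remote endpoint and vocabulary; substitute a real, enabled endpoint.
QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ex: <http://example.org/ontology#>
SELECT ?item ?remoteValue WHERE {
  ?item wdt:P31 wd:Q570116 .              # tourist attractions on Wikidata
  SERVICE <https://example.org/sparql> {  # remote endpoint (placeholder)
    ?poi owl:sameAs ?item ;               # assumes the remote data links back to Wikidata
         ex:someProperty ?remoteValue .
  }
}
LIMIT 10
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="FederationSketch/0.1 (example@example.org)")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["item"]["value"], row["remoteValue"]["value"])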


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] New Status Indicator Icon about Relative Page Completeness

2016-11-16 Thread Maarten Dammers

Hi Simon,


On 15-11-16 09:51, Simon Razniewski wrote:

It can be enabled by adding the following line to your /common.js/:
  /importScript( 'User:Ls1g/recoin-core.js' );/
Why don't you turn it into a gadget before promoting it? That would 
lower the barrier a lot for people just wanting to try out your new cool 
tool. Maybe creating a gadget seems to be too complicated or cumbersome? 
An overview of the current gadgets is at 
https://www.wikidata.org/wiki/Special:Gadgets


I was about to create a gadget out of it when I noticed the line 
https://tadaqua.inf.unibz.it/api/getmissingattributes.php in 
https://www.wikidata.org/wiki/User:Ls1g/recoin-core.js . I'm pretty sure 
grabbing data from a third-party domain is a violation of the WMF 
privacy policy, because the owner of the domain tadaqua.inf.unibz.it is 
able to track users who enable this script. I'm not entirely sure, though; 
someone from WMF Legal could probably confirm this. It would probably be best to move it to 
http://tools.wmflabs.org/ .


Maarten



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Maarten Dammers

Hi Denny,

On 30-09-16 20:47, Denny Vrandečić wrote:
Markus, do you have access to the corresponding HTTP request logs? The 
fields there might be helpful (although I might be overtly optimistic 
about it)
I was about to say the same. I use pywikibot quite a lot and it sends 
some nice headers like described at 
https://www.mediawiki.org/wiki/API:Main_page#Identifying_your_client .
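
For anyone writing their own client, a minimal sketch of the same kind of 
identifying header when calling the API directly with requests (the tool name 
and contact address below are made up, not a real tool):

import requests

# A descriptive User-Agent makes the requests easy to spot in the server logs.
HEADERS = {
    "User-Agent": "ExampleWikidataTool/0.1 (https://example.org/tool; tool-maintainer@example.org)"
}

response = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={"action": "wbgetentities", "ids": "Q42", "format": "json"},
    headers=HEADERS,
    timeout=30,
)
response.raise_for_status()
print(response.json()["entities"]["Q42"]["labels"]["en"]["value"])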


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Status and ETA External ID conversion

2016-03-05 Thread Maarten Dammers

Hi Luca,

Op 5-3-2016 om 16:45 schreef Luca Martinelli:


Point taken, I apologise for using too dramatic tones.
Looks like more people are eager to get this over with and can't wait to 
get everything converted.

Nonetheless, I stick to the point that probably a ">99% unique
identifier" threshold is too high. Just to make another example
(disclaimer: I asked for this property since it is yet another
catalogue that my institution runs), P1949 has not been converted to
identifier because it has "only 98.82% unique out of 507 uses", that
translates into only *six* cases out of 505 items which have two P1949
identifiers.
That's correct. As I said in my previous email: We're first doing the 
easy properties. You can see the easy properties at 
https://www.wikidata.org/wiki/User:ArthurPSmith/Identifiers/1 . The easy 
ones are the ones that have 99%+ single value and 99%+ unique. Compare 
that with https://www.wikidata.org/wiki/User:Addshore/Identifiers/1 and 
you'll notice we still have loads of easy ones we have to process (the 
unchecked list is still quite long).
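
For anyone who wants to check a property against those two criteria 
themselves, here is a rough sketch using the query service (wdt: only sees 
truthy statements, so the percentages are an approximation of what the 
conversion checks report):

from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://query.wikidata.org/sparql"
PROP = "P1949"  # the property discussed above, as an example

TOTALS = """
SELECT (COUNT(?value) AS ?uses) (COUNT(DISTINCT ?value) AS ?distinctValues)
       (COUNT(DISTINCT ?item) AS ?items)
WHERE { ?item wdt:%s ?value . }
""" % PROP

MULTI = """
SELECT (COUNT(?item) AS ?multiItems) WHERE {
  { SELECT ?item (COUNT(?value) AS ?n) WHERE { ?item wdt:%s ?value } GROUP BY ?item }
  FILTER(?n > 1)
}
""" % PROP

def first_row(query):
    sparql = SPARQLWrapper(ENDPOINT, agent="IdentifierStatsSketch/0.1 (example@example.org)")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"][0]

totals = first_row(TOTALS)
uses = int(totals["uses"]["value"])
distinct_values = int(totals["distinctValues"]["value"])
items = int(totals["items"]["value"])
multi = int(first_row(MULTI)["multiItems"]["value"])

print("unique values: %.2f%%" % (100.0 * distinct_values / uses))
print("single value:  %.2f%%" % (100.0 * (items - multi) / items))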


Once we get those out of the way, we'll get to the more difficult ones. 
I prefer quality over speed here. I don't expect any problems with 
converting P1949.


Maarten


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Status and ETA External ID conversion

2016-03-05 Thread Maarten Dammers

Hi Luca,

Op 5-3-2016 om 14:30 schreef Luca Martinelli:

Probably the threshold we set up for the conversion is too high, and
this might be one of the causes why the whole process has slowed down
to a dying pace.
You call 
https://www.wikidata.org/wiki/Special:Contributions/Maintenance_script a 
dying pace?


Instead of complaining here, people should participate in 
https://www.wikidata.org/wiki/User:Addshore/Identifiers/0 . There are still 
plenty of easy properties that are clearly distinct, unique and have an 
external URL.
It doesn't make sense to discuss the more complicated cases if we haven't 
gotten the easy cases out of the way yet.


Maarten


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Place name ambiguation

2015-12-29 Thread Maarten Dammers

Hi Tom,

Op 29-12-2015 om 19:33 schreef Tom Morris:
Thanks Stas & Thomas.  That's unambiguous. :-)  (And thanks to 
Jdforrester who went through and fixed all my examples)
Please keep the long label around as an alias. This really helps when 
you enter data. I wonder if someone ever ran a bot to clean up these 
disambiguations. The US alone must account for thousands of items.
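
If someone does run such a cleanup, a minimal pywikibot sketch of the "keep 
the long form as an alias" part could look like this (the item id and the 
short label are placeholders):

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

item = pywikibot.ItemPage(repo, "Q1234567")  # placeholder item id
item.get()

long_label = item.labels.get("en")
short_label = "Springfield"                  # assumed disambiguated form

if long_label and long_label != short_label:
    aliases = item.aliases.get("en", [])
    if long_label not in aliases:
        aliases.append(long_label)
        item.editAliases({"en": aliases}, summary="Keep long place name as alias")
    item.editLabels({"en": short_label}, summary="Simplify place name label")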


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mix'n'match: how to preserve manually audited items for posterity?

2015-11-21 Thread Maarten Dammers

Hi Dario,

Op 21-11-2015 om 18:34 schreef Dario Taraborelli:
- shouldn’t a manually unmatched item be created directly on Wikidata 
(after all DBI is all about notable individuals who would easily pass 
Wikidata’s notability threshold for biographies)

If the person in question is notable, you should create an item.
- shouldn’t the relation between /Giulio (Cesare) Baldigara /(Q1010811 
) and the newly created item 
for /Giulio Baldigara/ be explicitly represented via a /not the same 
as/ property, to prevent future humans or machines from accidentally 
remerging the two items based on some kind of heuristics
You can use P1889: "different from" 
(https://www.wikidata.org/wiki/Property:P1889)
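
A minimal pywikibot sketch of adding such a statement (Q1010811 is the item 
mentioned above; the id of the newly created item is a placeholder):

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

existing = pywikibot.ItemPage(repo, "Q1010811")  # Giulio (Cesare) Baldigara
new_item = pywikibot.ItemPage(repo, "Q1234567")  # placeholder for the new item

claim = pywikibot.Claim(repo, "P1889")           # "different from"
claim.setTarget(new_item)
existing.addClaim(claim, summary="Mark as different from similarly named person")

Adding the reciprocal statement on the new item works the same way.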


Maarten
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Welcome Meta, MediaWiki and Wikispecies

2015-10-23 Thread Maarten Dammers

Op 21-10-2015 om 12:32 schreef Magnus Manske:

Anyone running a bot to integrate Wikispecies pages?
This moved to 
https://www.wikidata.org/wiki/Wikidata:Bot_requests#Wikispecies_sitelinks :-)


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] We plan to drop the wb_entity_per_page table

2015-08-09 Thread Maarten Dammers

Hi Marius,

hoo schreef op 7-8-2015 om 19:55:

Hey folks,

we plan to drop the wb_entity_per_page table sometime soon[0], because
it is just not required (as we will likely always have a programmatic
mapping from entity id to page title) and it does not support non
-numeric entity ids as it is now.
In the past I was always told to use the wb_entity_per_page table instead 
of doing page_title=CONCAT('Q', ). The Wikibase code used to contain 
warnings not to make this assumption. I don't know, they might still be 
there.

Due to this removing it is a blocker
for the commons metadata.

That's unfortunate.

Is anybody using that for their tools (on tool labs)? If so, please
tell us so that we can give you instructions and a longer grace period
to update your scripts.
Of the 117 Wikidata-related SQL queries that seem to be in my homedir, 
48 of them use this table. Basically any Wikidata-related tool that uses 
the SQL database will break. What do you propose? That we start messing 
around with CONCAT()s in our SQL queries? Besides the hours of wasted 
volunteer time, that's probably a lot slower.
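
For what it's worth, the "programmatic mapping" mentioned above boils down to 
something like the sketch below. The namespace numbers are the usual Wikidata 
ones (items in the main namespace, properties in namespace 120); double-check 
them for your wiki before relying on this:

# Assumed namespace layout for wikidatawiki; adjust if your target wiki differs.
ENTITY_NAMESPACES = {"Q": 0, "P": 120}

def entity_id_to_page(entity_id):
    """Map an entity id such as 'Q42' or 'P31' to (page_namespace, page_title)."""
    entity_id = entity_id.upper()
    return ENTITY_NAMESPACES[entity_id[0]], entity_id

# In SQL the same idea becomes string building instead of a join, e.g.
# (illustrative only):
#   SELECT page_id FROM page
#   WHERE page_namespace = 0 AND page_title = CONCAT('Q', <numeric item id>);

Whether the CONCAT() variant performs acceptably on the replicas is, as noted 
above, an open question.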


Maarten


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mexico / "Building up Wikidata, country by country"

2015-06-15 Thread Maarten Dammers

Andrew Gray schreef op 15-6-2015 om 14:00:

The map can also be used to highlight other country-specific differences,
such as the unusually large amount of orphan items in The Netherlands and
UK.

WLM-related historic site imports, I think...
That's probably the 60,000 Rijksmonumenten (historic sites) and that bot 
run where someone created an item for *every* street in the Netherlands.


Maarten

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata