Re: [Wikidata] University automatically being College alias ?

2015-06-15 Thread Andy Mabbett
On 15 June 2015 at 02:54, Thad Guidry  wrote:

> It seems advantageous to somehow tell Wikidata Search that when someone
> types Harvard College to interchange and also look for Harvard University,
> and vice versa.

This is what the "alias" parameter is for.

-- 
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Spam] Re: Mexico / "Building up Wikidata, country by country"

2015-06-15 Thread Andrew Gray
On 10 June 2015 at 16:46, Markus Krötzsch  wrote:

> Another country-based observation is that Italian locations are so much more
> popular than those in almost any other country. Here is a map showing only
> those items with at least 33 (!) sitelinks:
>
> http://wwwpub.zih.tu-dresden.de/~s5219191/vizidata/#d=0&m=items&l=en&f=1&e=33_336&c=48526&g=0.8&h=1.2&o=1&p=3&x=19.599609375&y=47.32393057095941&z=5
>
> Most of them are communes. Even tiny ones like
> https://www.wikidata.org/wiki/Q42251 have articles in 33 different
> Wikipedias. It's a striking difference to basically all other countries.
> Overall, about one third of all items with articles on that many projects
> seem to be located in Italy.

I love the stories that get told through maps like this. Someone has
been hard at work on the regions of Soria (NE Spain), plus Bas-Rhin
and Ain in France. Really sharp borders!

> The map can also be used to highlight other country-specific differences,
> such as the unusually large amount of orphan items in The Netherlands and
> UK.

WLM-related historic site imports, I think...

-- 
- Andrew Gray
  andrew.g...@dunelm.org.uk



Re: [Wikidata] University automatically being College alias ?

2015-06-15 Thread Thad Guidry
Andy,

I know we have an alias parameter...but...

Do you want to set that alias on 24,000 Universities ?  I don't.
But perhaps a simple backend script could do it...sure.

My point to all is that a wiser Wikidata Search seems like a logical
approach, and it doesn't change or skew the intent or meaning of the data,
which is the concern the rest of you have raised.  It's just a smarter
Search that is more helpful to folks finding entities and properties.
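
To make the idea concrete, here is a minimal sketch of the kind of term
swapping I mean, whether a backend script uses it to write aliases or the
search applies it at query time. The function name and the swap table are
made up for illustration; this is not how Wikidata Search works today.

```python
import re

# Hypothetical swap table; real data would need per-language entries.
SWAPS = {"University": "College", "College": "University"}

def expand_query(query):
    """Return the query plus variants with University/College swapped."""
    variants = [query]
    for old, new in SWAPS.items():
        # Whole-word match only, so "Collegeville" is left alone.
        swapped = re.sub(r"\b%s\b" % re.escape(old), new, query)
        if swapped != query:
            variants.append(swapped)
    return variants
```

A blind swap like this will misfire on names such as "University College
London", so a real version would probably need a review step or a
whitelist.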


Thad
+ThadGuidry 



Re: [Wikidata] University automatically being College alias ?

2015-06-15 Thread John Erling Blad
The search is a kind of stupid dialogue system, and it only has a user
model that is sensitive to language. A better dialogue system with an
individual user model could use location as a hint for context. There
are several heavy books about that topic!

If I search for "Oslo" and live in Norway it is highly likely that I
want the article about the city in Norway. If I live in Marshall
County, Minnesota, it is not so obvious that I want the city in Norway
to be ranked first. But what if I live in Norway and have just
searched for Marshall County? It is not easy to get these things
right, and it is a lot more difficult than just adding some aliases.
Aliases can solve the alternate label problem, but they cannot
solve the user context problem.
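
A toy sketch of what "location as a hint" could look like, assuming each
candidate carries a country code and some popularity score. The field
names, the boost value and the numbers in the example are all invented for
illustration; nothing here reflects the actual search backend.

```python
def rank_candidates(candidates, user_country=None, boost=1000):
    """Sort by popularity, boosting candidates in the user's country."""
    def score(candidate):
        # The boost is an arbitrary illustrative constant.
        local = user_country is not None and candidate["country"] == user_country
        return candidate["popularity"] + (boost if local else 0)
    return sorted(candidates, key=score, reverse=True)

# Placeholder data: the popularity values are made up.
oslo = [
    {"label": "Oslo (Norway)", "country": "NO", "popularity": 200},
    {"label": "Oslo (Minnesota)", "country": "US", "popularity": 10},
]
```

Note that this only handles where the user is, not what they searched for
a minute ago, which is the harder part of the problem.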



Re: [Wikidata] University automatically being College alias ?

2015-06-15 Thread John Erling Blad
In case someone does not know: there is a city called Oslo in Marshall
County, Minnesota. ;)



[Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Thad Guidry
In Freebase, we had bot scripts that went through and removed "Lists of
Things" topic entities since they are lists of entities and not useful
clumped together and normalized in a graph database.

Does Wikidata have something similar or a user review process for deletion
of these ?

Ex. List of tallest buildings in Wuhan -
https://www.wikidata.org/wiki/Q6642364

Thad
+ThadGuidry 


Re: [Wikidata] University automatically being College alias ?

2015-06-15 Thread Thad Guidry
I don't want to solve the problems that you describe, John.

I just want the Wikidata Search to be a bit smarter with regard to common
labels that are sometimes interchanged in various languages.  It's something
we solved in Freebase and can be done.  (We generated special Lucene
indexes, back in the day).


Thad
+ThadGuidry 



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Sjoerd de Bruin
Why should they be deleted? Have you looked at our notability policy?

Greetings,

Sjoerd de Bruin
sjoerddebr...@me.com



Re: [Wikidata] University automatically being College alias ?

2015-06-15 Thread Federico Leva (Nemo)

John Erling Blad, 15/06/2015 17:00:

> If I search for "Oslo" and live in Norway it is highly likely that I
> want the article about the city in Norway. If I live in Marshall
> County, Minnesota, it is not so obvious that I want the city in Norway
> to be ranked first.


If the Chinese really create a city called "Parma" to sell more prosciutto,
I want Chinese users to always be given the real Parma first. :)


Nemo



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Federico Leva (Nemo)
By this reasoning we should also delete items about categories or 
disambiguation pages.


Thad Guidry, 15/06/2015 17:21:

> Ex. List of tallest buildings in Wuhan -
> https://www.wikidata.org/wiki/Q6642364


What's the issue here? The item doesn't actually contain any list, there 
is no duplication or information "clumped together".


Nemo



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Andrew Gray
Hi Thad,

These are in scope for Wikidata and so should be retained - as there
are Wikipedia articles on those topics, we need to use the entries in
Wikidata in order to provide cross-language functionality for those
articles.

However, if you have concerns about them getting mixed in with 'real'
entities, filtering out any entry with P31:Q13406463 should omit most
of them from your results.
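
As a rough sketch of that filter on the client side, assuming each result
has already been reduced to a plain dict with its P31 values as a list of
item IDs (that shape is my simplification, not the API's JSON format, and
the IDs in the sample data other than Q13406463 are only placeholders):

```python
LIST_CLASS = "Q13406463"  # the class named above for list pages

def is_list_item(item):
    """True if any of the item's P31 values is the list class."""
    return LIST_CLASS in item.get("P31", [])

def drop_list_items(items):
    """Keep only the items that are not marked as list pages."""
    return [item for item in items if not is_list_item(item)]

# Sample data: the P31 values here are assumed for illustration.
results = [
    {"id": "Q6642364", "P31": ["Q13406463"]},
    {"id": "Q_OTHER", "P31": ["Q_SOME_CLASS"]},
    {"id": "Q_NO_CLAIMS"},
]
```

Items with no P31 statement at all pass through this filter, which is one
reason it only omits "most" of them.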

(There is a related grey area in that many "Lists of X", say for
office-holders or prize-winners, often map directly to another
language's entry on the office or prize; what is the Wikidata item
really "about"? But that's something that hopefully will resolve
itself over time.)

Andrew.

-- 
- Andrew Gray
  andrew.g...@dunelm.org.uk



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Benjamin Good
This is an important question.  There are apparently 196,839 known list
items based on a query for instanceOf Wikipedia list item
(CLAIM[31:13406463])
http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A13406463%5D

I tend to agree with Thad that these kinds of items aren't really what we
want filling up Wikidata.  In fact, replacing them with the ability to
generate them automatically based on queries is a primary use case for
Wikidata.  But just deleting them doesn't entirely make sense either,
because they are key signposts to things that ought to be brought into
Wikidata properly.  The items in these lists clearly matter.

Ideally we could generate a bot that would examine each of these lists and
identify the unifying properties that should be added to the items within
the list that would enable the list to be reproduced by a query.
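
A very small sketch of the core step such a bot might take, assuming the
claims of each list member are already available as a set of
(property, value) pairs. Fetching the members and writing anything back are
left out, and the values in the example are placeholders rather than real
item IDs.

```python
def unifying_claims(members):
    """Return the (property, value) pairs shared by every member."""
    if not members:
        return set()
    shared = set(members[0])
    for claims in members[1:]:
        shared &= set(claims)  # keep only claims present on all members
    return shared

# Placeholder claims for three members of a hypothetical list.
buildings = [
    {("P31", "skyscraper"), ("P131", "Wuhan"), ("height", "438")},
    {("P31", "skyscraper"), ("P131", "Wuhan"), ("height", "331")},
    {("P31", "skyscraper"), ("P131", "Wuhan")},
]
```

The shared claims are candidates for the query that would reproduce the
list; members that lack them are the ones the bot (or a human) would need
to fix first.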

I disagree that this reasoning suggests deleting items about categories and
disambiguation pages; both of these clearly have functions in Wikidata.
I'm not sure what the function of a list entity is.




Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Thad Guidry
Benjamin has the right idea... and we handled it in a similar way in
Freebase... sometimes it was a manual labor of love... most of the
time, we just deleted them and hoped that Wikipedia would make them real
topic entities later on for us to properly absorb.

How Wikidata decides to handle them, I don't care...if you keep them around,
then just give users a way to filter them out in your APIs; that is all
that I ask. :)


Thad
+ThadGuidry 



Re: [Wikidata] University automatically being College alias ?

2015-06-15 Thread John Erling Blad
There are a lot of places called Parma, and it is not obvious which
one should be listed first. Perhaps Parma in Tibet?

This is actually a problem that can't be easily solved. "Parma" is
interpreted in a cultural context, and the Italian city is just one of
several places with that name. It might be obvious to an Italian
that "Parma" is an Italian city, but is it equally obvious to someone
from Tibet? What about Parma, Ohio, where more than 80,000 people
live?

The solution is to use a set of standardized user models; typically
they will follow the language regions.



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Thad Guidry
In General,

I think Wikidata needs to decide going forward if it will be a strict
Entity Graph...or if it will be a Big Graph of all things Wikipedia.
It's an important question...if it decides on the latter...then just give a
way to filter out non-entities for the API and Search users.


Thad
+ThadGuidry 



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Romaine Wiki
The list entity has a function too. The function of *instance of* is to
identify what a page is about. A database is built on consistency, and the
list entity provides that for lists. A list is a very special type of
subject in comparison to other articles. It isn't linked through topic-type
properties. By using a list entity, these kinds of items are identified as
such. Likewise for disambiguation pages, categories, templates, etc.

Romaine



Re: [Wikidata] Mexico / "Building up Wikidata, country by country"

2015-06-15 Thread Maarten Dammers

Andrew Gray wrote on 15-6-2015 at 14:00:

>> The map can also be used to highlight other country-specific differences,
>> such as the unusually large amount of orphan items in The Netherlands and
>> UK.
>
> WLM-related historic site imports, I think...

That's probably the 60,000 Rijksmonumenten (historic sites) and that bot
run where someone created an item for *every* street in the Netherlands.


Maarten



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Benjamin Good
I think this is clearly an evolutionary process.  In the short term,
Wikidata needs to support Wikipedia use cases as Andrew mentioned above
(thank you for the clarification).  In the long term, this function and all
other functions will (in my opinion) best be served by a transition into
more and more of an entity graph where claims are made about things in the
world rather than about constructs in a database.  Perhaps some form of
the Wikidata game could be created to support this process for lists.

The intervening period is going to be a challenge in terms of modeling and
in application-level hiding of weird ontological situations where objects
are being described like (item1: instanceOf WikipediaList AND item1:
subclassOf moons of Jupiter), but there is no way around it.  And it's 100%
worthwhile to do whatever it takes to keep things integrated with Wikipedia
and to further establish Wikidata as indispensable there.

-Ben








Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Daniel Kinzler
Am 15.06.2015 um 18:11 schrieb Thad Guidry:
> In General,
> 
> I think Wikidata needs to decide going forward if it will be a strict Entity
> Graph...or if it will be a Big Graph of all things Wikipedia.
> Its an important question...if it decides on the latter...then just give a way
> to filter out non-entities for the API and Search users.

I think there is a misunderstanding here. For practical reasons, Wikidata allows
items about Wikipedia *pages*. Items that refer to Wikipedia list pages, or
categories, or disambiguation pages, or policy pages, etc, are useful for
managing these pages. They are conceptually different from items about "real"
things.

I agree that Wikidata should not have items that *model* lists. But it can have
items about list *pages* on Wikipedia.

That being said, I would love to be able to have a clear distinction between
items about pages, and "real" items. To an extent, this is done via instanceof
statements, e.g. instanceof -> Wikimedia Disambiguation Page. But it would be
nice to have an easier way to filter those out in contexts where they are not
relevant.


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Federico Leva (Nemo)

Thad Guidry, 15/06/2015 18:11:

> I think Wikidata needs to decide going forward if it will be a strict
> Entity Graph...or if it will be a Big Graph of all things Wikipedia.


I understand the question, but why are the two things in contradiction?

Nemo



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Thad Guidry
Federico,

As a Data Architect, I only care about individual Entities.  I do not care
what Wikidata needs internally for coordination with Wikipedia, etc...

There is no contradiction...as long as Wikidata provides a good mechanism
for me to filter out non-entities.  Ideally an API or Search parameter that
says "give me only 'real' items / entities".


Thad
+ThadGuidry 

On Mon, Jun 15, 2015 at 12:31 PM, Federico Leva (Nemo) 
wrote:

> Thad Guidry, 15/06/2015 18:11:
>
>> I think Wikidata needs to decide going forward if it will be a strict
>> Entity Graph...or if it will be a Big Graph of all things Wikipedia.
>>
>
> I understand the question, but why are the two things in contradiction?
>
>
> Nemo
>


Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Daniel Kinzler
Am 15.06.2015 um 20:09 schrieb Thad Guidry:
> Federico,
> 
> As a Data Architect, I only care about individual Entities.  I do not care what
> Wikidata needs internally for coordination with Wikipedia, etc...
>
> There is no contradiction...as long as Wikidata provides a good mechanism for me
> to filter out non-entities.  Ideally an API or Search parameter that says "give
> me only 'real' items / entities".

I would also like that, for convenience.

But conceptually, Wikipedia pages are things in the world, and are "real" in
that sense. So if we don't want to introduce a nasty hack into the data model,
you'd have to do this by saying "exclude everything that is an instance of
MediaWiki page (Q15474042)". That's a rather expensive operation...

Can you think of a way to do this nicely, that doesn't need special case hacks
in the software?
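
To make the cost concrete, here is a minimal Python sketch of what "exclude
everything that is an instance of MediaWiki page" involves once subclasses
are taken into account. It is an illustration only, not anything that exists
in the software. Only Q15474042 comes from the message above; the
SUBCLASS_OF and INSTANCE_OF tables and every other id in them are made-up
stand-ins for P279 and P31 data.

```python
from collections import deque

# Hypothetical toy data: child class -> parent classes (stand-in for P279).
SUBCLASS_OF = {
    "Q_LIST_PAGE": {"Q15474042"},
    "Q_DISAMBIG_PAGE": {"Q15474042"},
    "Q_CITY": {"Q_PLACE"},
}

# Hypothetical toy data: item -> classes it is an instance of (stand-in for P31).
INSTANCE_OF = {
    "item_list": {"Q_LIST_PAGE"},
    "item_disambig": {"Q_DISAMBIG_PAGE"},
    "item_city": {"Q_CITY"},
}


def descendants(root, subclass_of):
    """Return root plus every class that is transitively a subclass of it."""
    # Invert the child -> parents table so we can walk downwards from root.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def real_items(instance_of, subclass_of, root="Q15474042"):
    """Return the items that are not an instance of root or of any subclass."""
    excluded = descendants(root, subclass_of)
    return sorted(
        item for item, classes in instance_of.items() if not (classes & excluded)
    )


print(real_items(INSTANCE_OF, SUBCLASS_OF))
```

The traversal has to visit the whole subclass tree before a single item can
be filtered, which is where the expense comes from on a graph the size of
Wikidata.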


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Gerard Meijssen
Hoi,
I have been REALLY active in adding statements with "is a list of". They do
have a function: they show the content of a list in Reasonator.
I do appreciate it when they are retained. They are both lists and
categories.
Thanks,
 GerardM

On 15 June 2015 at 17:53, Benjamin Good  wrote:

> This is an important question.  There are apparently 196,839 known list
> items based on a query for instanceOf Wikipedia list item
> (CLAIM[31:13406463])
> http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A13406463%5D
>
> I tend to agree with Thad that these kinds of items aren't really what we
> want filling in WikiData.  In fact replacing them with the ability to
> generate them automatically based on queries is a primary use case for
> wikidata.  But just deleting them doesn't entirely make sense either
> because they are key signposts into things that ought to be brought into
> wikidata properly.  The items in these lists clearly matter..
>
> Ideally we could generate a bot that would examine each of these lists and
> identify the unifying properties that should be added to the items within
> the list that would enable the list to be reproduced by a query.
>
> I disagree that this reasoning suggests deleting items about categories
> and disambiguation pages. - both of these clearly have functions in
> wikidata.  I'm not sure what the function of a list entity is.
>
>
> On Mon, Jun 15, 2015 at 8:47 AM, Federico Leva (Nemo) 
> wrote:
>
>> By this reasoning we should also delete items about categories or
>> disambiguation pages.
>>
>> Thad Guidry, 15/06/2015 17:21:
>>
>>> Ex. List of tallest buildings in Wuhan -
>>> https://www.wikidata.org/wiki/Q6642364
>>>
>>
>> What's the issue here? The item doesn't actually contain any list, there
>> is no duplication or information "clumped together".
>>
>> Nemo
>>
>>


Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Gerard Meijssen
Hoi,
Wikidata is not Freebase and no thanks.
Thanks,
  GerardM

On 15 June 2015 at 18:07, Thad Guidry  wrote:

> Benjamin has the right idea... and we did similar in Freebase in handling
> that same way... sometimes it was a manual labor of love... most of the
> time, we just deleted them and hoped that Wikipedia would make them real
> topic entities later on for us to properly absorb.
>
> How Wikidata decided to handle, I don't care...if you keep them around,
> then just give users a way to filter them out in your API's is all that I
> ask. :)
>
>
> Thad
> +ThadGuidry 
>
> On Mon, Jun 15, 2015 at 10:53 AM, Benjamin Good 
> wrote:
>
>> This is an important question.  There are apparently 196,839 known list
>> items based on a query for instanceOf Wikipedia list item
>> (CLAIM[31:13406463])
>>
>> http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A13406463%5D
>>
>> I tend to agree with Thad that these kinds of items aren't really what we
>> want filling in WikiData.  In fact replacing them with the ability to
>> generate them automatically based on queries is a primary use case for
>> wikidata.  But just deleting them doesn't entirely make sense either
>> because they are key signposts into things that ought to be brought into
>> wikidata properly.  The items in these lists clearly matter..
>>
>> Ideally we could generate a bot that would examine each of these lists
>> and identify the unifying properties that should be added to the items
>> within the list that would enable the list to be reproduced by a query.
>>
>> I disagree that this reasoning suggests deleting items about categories
>> and disambiguation pages. - both of these clearly have functions in
>> wikidata.  I'm not sure what the function of a list entity is.
>>
>>
>> On Mon, Jun 15, 2015 at 8:47 AM, Federico Leva (Nemo) wrote:
>>
>>> By this reasoning we should also delete items about categories or
>>> disambiguation pages.
>>>
>>> Thad Guidry, 15/06/2015 17:21:
>>>
>>>> Ex. List of tallest buildings in Wuhan -
>>>> https://www.wikidata.org/wiki/Q6642364
>>>>
>>>
>>> What's the issue here? The item doesn't actually contain any list, there
>>> is no duplication or information "clumped together".
>>>
>>> Nemo
>>>
>>>


Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Stas Malyshev
Hi!

> In Freebase, we had bot scripts that went through and removed "Lists of
> Things" topic entities since they are lists of entities and not useful
> clumped together and normalized in a graph database.

Why delete them? Wikidata has a number of things which are not your
standard "entity" - lists, sources, news, quotes, service entries,
narrative articles (e.g.
https://en.wikipedia.org/wiki/Control_of_fire_by_early_humans - it's not
exactly an "entity" like "human" or "fire"), etc. So I don't think an
approach that singles out and excludes lists would help much: an
application that needs "individual entities" like "Douglas Adams" or
"London" and excludes other types will have to exclude much more than
just lists. I think the approach of asking for exactly what you need and
ignoring the rest may prove more efficient. I'm not sure there are
really well-defined criteria for what an "individual entity" actually
is - I'm sure you have a definition that matches your application, but
some other application may have a completely different one.
Generally, this can be solved by better classification, I think, but so
far I'm not sure what to base this classification on.
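
A minimal Python sketch of the "ask for exactly what you need" approach:
rather than listing what to exclude, the application names the classes it
wants and ignores everything else. WANTED_CLASSES is an assumed allowlist
for one hypothetical application, and the P31 value shown for Q6642364 is
assumed for illustration rather than checked against the live item.

```python
# Assumed allowlist for one hypothetical application; another would differ.
WANTED_CLASSES = {"Q5", "Q515"}  # human, city


def select_wanted(items, wanted=WANTED_CLASSES):
    """Keep only the items that have at least one P31 value in `wanted`."""
    return [item_id for item_id, classes in items if classes & wanted]


# (item id, set of P31 values) pairs, written out by hand for the demo.
ITEMS = [
    ("Q42", {"Q5"}),              # Douglas Adams, instance of human
    ("Q6642364", {"Q13406463"}),  # list item from this thread (P31 assumed)
]

print(select_wanted(ITEMS))
```

With an allowlist, lists, sources, quotes and unclassified items all drop
out without having to be enumerated one by one.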
-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Thomas Douillard
We can create as many specialized classes as we want. It is not
inevitable that lists are more specific than classes.

I think having a list of instances of a concept proves the concept is
useful, so the class is something that could exist. Moreover, if we
manually mark an item as an instance of such a class, a single statement
adds a lot of information, and maybe a few properties could be
automatically added by a bot.

2015-06-15 18:09 GMT+02:00 Romaine Wiki :

> Also the list entity has a function. The function of *instance of* is to
> identify what a page is about. A database is built on consistency, the list
> entity does do that for lists. A list is a very special type of a subject
> in comparison to other articles. It isn't linked through topic type
> properties. By using a list entity this kind of items are identified as
> such. Likewise for dps, categories, templates, etc.
>
> Romaine
>
> 2015-06-15 17:53 GMT+02:00 Benjamin Good :
>
>> This is an important question.  There are apparently 196,839 known list
>> items based on a query for instanceOf Wikipedia list item
>> (CLAIM[31:13406463])
>>
>> http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A13406463%5D
>>
>> I tend to agree with Thad that these kinds of items aren't really what we
>> want filling in WikiData.  In fact replacing them with the ability to
>> generate them automatically based on queries is a primary use case for
>> wikidata.  But just deleting them doesn't entirely make sense either
>> because they are key signposts into things that ought to be brought into
>> wikidata properly.  The items in these lists clearly matter..
>>
>> Ideally we could generate a bot that would examine each of these lists
>> and identify the unifying properties that should be added to the items
>> within the list that would enable the list to be reproduced by a query.
>>
>> I disagree that this reasoning suggests deleting items about categories
>> and disambiguation pages. - both of these clearly have functions in
>> wikidata.  I'm not sure what the function of a list entity is.
>>
>>
>> On Mon, Jun 15, 2015 at 8:47 AM, Federico Leva (Nemo) wrote:
>>
>>> By this reasoning we should also delete items about categories or
>>> disambiguation pages.
>>>
>>> Thad Guidry, 15/06/2015 17:21:
>>>
>>>> Ex. List of tallest buildings in Wuhan -
>>>> https://www.wikidata.org/wiki/Q6642364
>>>>
>>>
>>> What's the issue here? The item doesn't actually contain any list, there
>>> is no duplication or information "clumped together".
>>>
>>> Nemo
>>>
>>>


Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Thomas Douillard
We already do: all Wikimedia entities are marked as instances of
''Wikidata entity'' (or Wikidata page) or of one of their subclasses.

Plus, the usual properties generally do not apply to them, only
dedicated properties, so it's unlikely that we find one of them by mistake

(except for list items, which are a mess because they essentially are
classes and are sometimes merged with actual entities).


2015-06-15 18:11 GMT+02:00 Thad Guidry :

> In General,
>
> I think Wikidata needs to decide going forward if it will be a strict
> Entity Graph...or if it will be a Big Graph of all things Wikipedia.
> Its an important question...if it decides on the latter...then just give a
> way to filter out non-entities for the API and Search users.
>
>
> Thad
> +ThadGuidry 
>
> On Mon, Jun 15, 2015 at 11:07 AM, Thad Guidry 
> wrote:
>
>> Benjamin has the right idea... and we did similar in Freebase in handling
>> that same way... sometimes it was a manual labor of love... most of the
>> time, we just deleted them and hoped that Wikipedia would make them real
>> topic entities later on for us to properly absorb.
>>
>> How Wikidata decided to handle, I don't care...if you keep them around,
>> then just give users a way to filter them out in your API's is all that I
>> ask. :)
>>
>>
>> Thad
>> +ThadGuidry 
>>
>> On Mon, Jun 15, 2015 at 10:53 AM, Benjamin Good wrote:
>>
>>> This is an important question.  There are apparently 196,839 known list
>>> items based on a query for instanceOf Wikipedia list item
>>> (CLAIM[31:13406463])
>>>
>>> http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A13406463%5D
>>>
>>> I tend to agree with Thad that these kinds of items aren't really what
>>> we want filling in WikiData.  In fact replacing them with the ability to
>>> generate them automatically based on queries is a primary use case for
>>> wikidata.  But just deleting them doesn't entirely make sense either
>>> because they are key signposts into things that ought to be brought into
>>> wikidata properly.  The items in these lists clearly matter..
>>>
>>> Ideally we could generate a bot that would examine each of these lists
>>> and identify the unifying properties that should be added to the items
>>> within the list that would enable the list to be reproduced by a query.
>>>
>>> I disagree that this reasoning suggests deleting items about categories
>>> and disambiguation pages. - both of these clearly have functions in
>>> wikidata.  I'm not sure what the function of a list entity is.
>>>
>>>
>>> On Mon, Jun 15, 2015 at 8:47 AM, Federico Leva (Nemo) <
>>> nemow...@gmail.com> wrote:
>>>
>>>> By this reasoning we should also delete items about categories or
>>>> disambiguation pages.
>>>>
>>>> Thad Guidry, 15/06/2015 17:21:
>>>>
>>>>> Ex. List of tallest buildings in Wuhan -
>>>>> https://www.wikidata.org/wiki/Q6642364
>>>>>
>>>>
>>>> What's the issue here? The item doesn't actually contain any list,
>>>> there is no duplication or information "clumped together".
>>>>
>>>> Nemo
>>>>



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Thad Guidry
Stas,

Always agreed, it's a classification problem.

So what claims/statements do I rule out ?  Or what should I only rule in
(claims/statements) when wanting to return only "real" entities ?  Can
someone help with those negative claims/statements that I am looking for ?
So far, I have only got:

1. Filtering out any entry with P31:Q13406463 should omit most of them
   from your results.
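
A minimal sketch of rule 1 applied on the client side, in Python. It
assumes the entity JSON layout used in the Wikidata dumps, where each P31
statement carries a "mainsnak" whose "datavalue" holds a "numeric-id".
make_entity is a hypothetical helper that only builds demo data, and the
P31 values given to the two sample items are assumed for illustration.

```python
# Numeric id of Q13406463, the class named in rule 1 above.
EXCLUDED_NUMERIC_IDS = {13406463}


def p31_numeric_ids(entity):
    """Return the numeric ids of all P31 (instance of) values of an entity."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":  # skip somevalue / novalue snaks
            ids.add(snak["datavalue"]["value"]["numeric-id"])
    return ids


def without_excluded(entities, excluded=EXCLUDED_NUMERIC_IDS):
    """Drop every entity that has at least one P31 value in `excluded`."""
    return [e for e in entities if not (p31_numeric_ids(e) & excluded)]


def make_entity(qid, class_numeric_ids):
    """Hypothetical helper: build a minimal entity dict for the demo."""
    statements = [
        {
            "mainsnak": {
                "snaktype": "value",
                "property": "P31",
                "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "numeric-id": n},
                },
            }
        }
        for n in class_numeric_ids
    ]
    return {"id": qid, "claims": {"P31": statements} if statements else {}}


# Sample data; the P31 values are assumptions, not checked against live items.
entities = [make_entity("Q6642364", [13406463]), make_entity("Q42", [5])]
print([e["id"] for e in without_excluded(entities)])
```

Further negative rules would simply add more numeric ids to
EXCLUDED_NUMERIC_IDS.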

All,

Freebase simply decided to not keep Wikipedia topic pages that simply held
"lists of entities", but instead Freebase liked to easily generate those
same "lists of entities" by using queries.  There was no need to have hand
coded lists in Freebase.  It was a graph database and could generate all
kinds of lists programmatically for a user, and keep those lists as views
against our user profile for easy tweaking or re-use when we wanted to.
(stored user queries)

Thad
+ThadGuidry 

On Mon, Jun 15, 2015 at 2:56 PM, Stas Malyshev 
wrote:

> Hi!
>
> > In Freebase, we had bot scripts that went through and removed "Lists of
> > Things" topic entities since they are lists of entities and not useful
> > clumped together and normalized in a graph database.
>
> Why delete them? Wikidata has a number of things which are not your
> standard "entity" - lists, sources, news, quotes, service entries,
> narrative articles (e.g.
> https://en.wikipedia.org/wiki/Control_of_fire_by_early_humans - it's not
> exactly "entity" like "human" or "fire"), etc. So I don't think the
> approach that singles out and excludes lists would help much - if you
> have an application that needs "individual entities" like "Douglas
> Adams" or "London" and exclude other types will have to exclude much
> more than just lists - but I think the approach of asking for exactly
> what you need and ignoring the rest may prove more efficient. I'm not
> sure there's really well-defined criteria to specify what "individual
> entity" actually is - I'm sure you have one that matches your
> application, but some other application may have completely different one.
> Generally, this can be solved by better classification I think, but so
> far I'm not sure what to base this classification on.
> --
> Stas Malyshev
> smalys...@wikimedia.org
>


[Wikidata] Help needed for Freebase to Wikidata migration

2015-06-15 Thread Thomas Pellissier-Tanon
Hey everyone,

As you may already know, I am currently working on the importation of
Freebase content into Wikidata [1] using the primary source tool [2].

One of the big challenges of the migration is to build a good mapping of
the Freebase properties to Wikidata ones. There are a few thousand
properties, so it is too big a task to be done alone. Your help is more
than welcome on this page:
https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping

Cheers,

Thomas

[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
[2] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool


Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Stas Malyshev
Hi!

> So what claims/statements do I rule out ?  Or what should I only rule in
> (claims/statements) when wanting to return only "real" entities ?  Can
> someone help with those negative claims/statements that I am looking for ?
> So far, I only have got
>
> 1. filtering out any entry with P31:Q13406463 should omit most
> of them from your results.

I guess it somewhat depends on what you call "real". Unfortunately,
not all items are even classified - e.g. a random example:
https://www.wikidata.org/wiki/Q16515271
This is a Wikiquote-only page, but it doesn't have any markers to say so.
So with this one, I see no easy way to exclude it.
OTOH, there are things like https://www.wikidata.org/wiki/Q17442446 or
https://www.wikidata.org/wiki/Q17379835 - items in their hierarchy are
probably candidates for exclusion.
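
A small Python sketch of why unclassified items are the hard case: a
class-based filter can only sort items into "excluded" or "kept" when they
carry a P31 value at all. The two excluded roots are the ids from the
message above; the sample items are made up, and a real filter would also
have to follow the subclass hierarchy below those roots.

```python
# Roots taken from the message above; everything else is made-up demo data.
EXCLUDED_CLASSES = {"Q17442446", "Q17379835"}


def partition(items, excluded=EXCLUDED_CLASSES):
    """Split (item_id, p31_classes) pairs into three buckets."""
    buckets = {"excluded": [], "unclassified": [], "kept": []}
    for item_id, classes in items:
        if not classes:
            # No P31 at all: the filter has nothing to decide on.
            buckets["unclassified"].append(item_id)
        elif classes & excluded:
            buckets["excluded"].append(item_id)
        else:
            buckets["kept"].append(item_id)
    return buckets


SAMPLE = [
    ("item_a", {"Q17379835"}),  # hypothetical item in an excluded class
    ("item_b", set()),          # hypothetical item with no P31 statement
    ("item_c", {"Q5"}),         # hypothetical item that is a human
]

print(partition(SAMPLE))
```

Whether the "unclassified" bucket is shown or hidden is a choice each
application has to make for itself.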

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Thad Guidry
Thanks Stas,

Those are useful.


Thad
+ThadGuidry 

On Mon, Jun 15, 2015 at 7:46 PM, Stas Malyshev 
wrote:

> Hi!
>
> > So what claims/statements do I rule out ?  Or what should I only rule in
> > (claims/statements) when wanting to return only "real" entities ?  Can
> > someone help with those negative claims/statements that I am looking for ?
> > So far, I only have got
> >
> > 1. filtering out any entry with P31:Q13406463 should omit most
> > of them from your results.
>
> I guess it's somewhat depends on what you call "real". Unfortunately,
> not all items are even classified - e.g. random example:
> https://www.wikidata.org/wiki/Q16515271
> this is wikiquote-only page, but it doesn't have any markers to say so.
> So with this one, I see no easy way to exclude it.
> OTOH, there are things like https://www.wikidata.org/wiki/Q17442446 or
> https://www.wikidata.org/wiki/Q17379835 - probably items in their
> hierarchy may be candidates for exclusion.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>


Re: [Wikidata] Help needed for Freebase to Wikidata migration

2015-06-15 Thread Stas Malyshev
Hi!

> As you may already know, I am currently working on the importation of
> Freebase content into Wikidata [1] using the primary source tool [2].
> 
> One of the big challenges of the migration is to build a good mapping of
> the properties of Freebase to Wikidata ones.There are a few thousand of
> properties so it is a task too big to be done alone. Your help is far
> more than welcome for this task on this
> page: https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping

Some of these look a bit challenging due to different semantics. E.g.
https://www.freebase.com/award/ranking/year probably matches point in
time (P585), but the former is for specific awards, while the latter is a
generic property that would be applied to the award claim. So it's not a
1-1 transition. Would it still be useful to match P585 to
https://www.freebase.com/award/ranking/year and do the same in similar
cases? I.e. pretty much all /year ones would be P585, but in different
contexts.

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Help needed for Freebase to Wikidata migration

2015-06-15 Thread Thomas Pellissier-Tanon
Hi!

Yes, such matches are useful. The goal here is to move data from Freebase
to Wikidata, so we don't need a 1-1 relation: the direction from Wikidata
to Freebase is not important. What we want is that, if we replace the
Freebase property by the Wikidata one, the relation remains valid.

So, P585 is a good match for
https://www.freebase.com/award/ranking/year because
if we have "X award/ranking/year 1966" in Freebase then "X P585 1966" is
valid too.
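
As a rough Python illustration of a one-directional, many-to-one mapping:
several Freebase properties may point at the same Wikidata property, and a
triple is rewritten only when its property has an entry. Only the
/award/ranking/year to P585 pair comes from this thread; the second
Freebase property and the sample triples are made up, and this is not the
code of the primary sources tool.

```python
# Hypothetical mapping table; only the first pair comes from the thread.
FREEBASE_TO_WIKIDATA = {
    "/award/ranking/year": "P585",
    "/example/other/year": "P585",  # made-up property, shows many-to-one
}


def translate(triples, mapping=FREEBASE_TO_WIKIDATA):
    """Rewrite mapped triples; return (translated, unmapped)."""
    translated, unmapped = [], []
    for subject, prop, value in triples:
        if prop in mapping:
            translated.append((subject, mapping[prop], value))
        else:
            unmapped.append((subject, prop, value))
    return translated, unmapped


# Made-up sample triples in (subject, property, value) form.
SAMPLE = [
    ("X", "/award/ranking/year", "1966"),
    ("Y", "/example/other/year", "1970"),
    ("Z", "/example/unmapped", "foo"),
]

done, todo = translate(SAMPLE)
print(done)
print(todo)
```

Because the table is only read from Freebase to Wikidata, two keys sharing
one value causes no ambiguity; unmapped triples are kept aside rather than
dropped.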

Cheers,

Thomas

On Mon, Jun 15, 2015 at 9:26 PM, Stas Malyshev 
wrote:

> Hi!
>
> > As you may already know, I am currently working on the importation of
> > Freebase content into Wikidata [1] using the primary source tool [2].
> >
> > One of the big challenges of the migration is to build a good mapping of
> > the properties of Freebase to Wikidata ones.There are a few thousand of
> > properties so it is a task too big to be done alone. Your help is far
> > more than welcome for this task on this
> > page: https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping
>
> Some of these look a bit challenging due to different semantics. I.e.
> https://www.freebase.com/award/ranking/year probably matches point in
> time (P585) but the former is for specific awards, while the latter is a
> generic property that would be applied to the award claim. So it's not
> 1-1 transition. Would it still be useful to match P585 to
> https://www.freebase.com/award/ranking/year and do the same in similar
> cases? I.e. pretty much all /year ones would be P585, but in different
> context.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>