Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-09-06 Thread P. Blissenbach
Hi

(1)
If we want to include media files not on commons, then we shall have to 
include data from foreign sources such as flickr or other types of 
repositories. We must do so without stealing or damaging the authority of these 
others. If we connect items to media linking them, or if we assign tags, 
labels, attributes, etc. to foreign media, or make statements involving them, 
we can do so of course collaboratively, but we cannot assume other communities 
to cooperate. Often they will, occasionally they will not, and the latter 
should not be a hindrance.

(2)
Assuming we are incorporating labels, tags and statements (claims) made in 
other repositories in addition to simple and obvious technical information, we 
shall first have to decide about incorporating the thesauri, tagging systems, 
ontologies, or whatever they use.

(3)
Much less complicated imho is the initial step to make files on commons and on 
other WMF wikis available for searches via WikiData. The goal has to be, imho, 
that everything we know already about them is to be converted into statements 
and made available to search queries. Since that involves reading descriptions 
and turning them into statements about media, we get a finer grained 
categorizing or tagging system than we have today. It will automatically become 
more multilingual as data grows. I currently believe that conversion from 
existing data has at least partially to be done semiautomatically, likely with 
suggestor bots that ask questions such as "Is this cat: o Black, o Brown, o 
White, o Tigered, ... o Not a cat at all" or "In this sample, you hear the 
voice of a: o Female, o Male, o Child, o Cannot tell, o Several voices, o No 
voice at all". That would allow adding considerable volumes of missing data 
in little time, starting from categories existing in the wikis.

(4)
Searching should most of the time be a matter of making statements about what 
you want to find. Basic logical operations need to be available so as to limit 
unwieldy result sets, plus additional stepwise refinements. Semantic Mediawiki 
or Wolfram Alpha or Library Catalog Search Engines already have many of those 
;-)

Purodha
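
The suggestor-bot idea in point (3) can be sketched roughly as follows. This is only an illustration under stated assumptions: the category names, the property names ("colour", "voice") and the statement format are invented here and are not part of any existing Wikidata or Commons tool. Answers such as "Not a cat at all" are treated as producing no statement, which is one possible design choice among several.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Question:
    prompt: str                   # text shown to the volunteer
    prop: str                     # hypothetical property the answer fills in
    choices: tuple[str, ...]      # answers that become a statement
    rejections: tuple[str, ...]   # answers that yield no statement


# Hypothetical mapping from an existing wiki category to a question.
QUESTIONS_BY_CATEGORY = {
    "Cats": Question(
        "Is this cat:", "colour",
        ("Black", "Brown", "White", "Tigered"),
        ("Not a cat at all",),
    ),
    "Voice recordings": Question(
        "In this sample, you hear the voice of a:", "voice",
        ("Female", "Male", "Child"),
        ("Cannot tell", "Several voices", "No voice at all"),
    ),
}


def answer_to_statement(
    file_name: str, category: str, answer: str
) -> Optional[tuple[str, str, str]]:
    """Turn one multiple-choice answer into a (file, property, value) statement."""
    question = QUESTIONS_BY_CATEGORY[category]
    if answer in question.choices:
        return (file_name, question.prop, answer.lower())
    if answer in question.rejections:
        return None  # nothing to record for this property
    raise ValueError(f"unknown answer: {answer!r}")


statement = answer_to_statement("File:Example.jpg", "Cats", "Tigered")
no_statement = answer_to_statement("File:Example.jpg", "Cats", "Not a cat at all")
```

A real bot would also have to record who answered and reconcile conflicting answers; that is left out here.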

Gerard Meijssen gerard.meijs...@gmail.com writes:

Hoi,
I am really interested how you envision searching when all those topics are 
isolated and attached to each file.. 
 
I also am really interested to know when you have all those files isolated on 
Commons, how you will include media files that are NOT on Commons.. This is a 
normal use case.
Thanks,
       GerardM
 
On 3 September 2014 15:33, James Heald j.he...@ucl.ac.uk wrote:

Not really relevant.

The way that this will be achieved will be a topics list attached to each 
file, each topic being a pointer to a Wikidata item.

Sure, Wikidata may be used as one of the sources to help build the topics list; 
but the topics list will not be on Wikidata, but attached to each file, 
probably on the CommonsData wikibase.

  -- James.


On 03/09/2014 14:28, P. Blissenbach wrote:

I strongly support this view: Wikidata should support and ease finding 
Commons images. This is not only about proper categorising and tagging in a 
truly multilingual way, but also about determining and assigning various 
properties, both automatically and manually.

Think, for example, of an art director creating an image flyer (be it about 
Wikimania, a national open source movement, or a company) looking for 
photographs that are predominantly blue, depicting 8 humans or more of various 
ages in a neutral or indeterminate environment, and so on, so as to get the 
hang of it.

Purodha


Gerard Meijssen gerard.meijs...@gmail.com writes:

Hoi,
I am firmly opposed to the idea that the Wikidatification of Commons is about 
Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

The categories of Commons are in and of themselves useful to a very limited 
extent. Associating the images they refer to with existing items in Wikidata is 
one way in which they may be useful. As it is, because of naming conventions 
and the use of English only, the categories are pretty lame. They do not help 
me when I am looking for an image in Commons at all.

Really, my point is: forget about Commons notability and start thinking in 
terms of what it takes to help people find images. Yes, those people will be 8 
years old and they may speak Mandarin or Japanese.
Thanks,
       GerardM



On 3 September 2014 12:05, James Heald j.he...@ucl.ac.uk wrote:

Gerard,

I agree with you that I would like the kind of tools currently available with 
WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and 
straightforward.

What I don't understand is your objection to placing items that really only 
have a Commons notability, not a world notability, into a specific namespace, 
or (notionally) the separate database CommonsData, so that it is possible to 

Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-09-06 Thread Gerard Meijssen
Hoi,
The use case I was thinking of was to include the images that exist for
instance on English Wikipedia. Flickr and other repositories outside the
WMF are very much out of scope as far as I am concerned.

In my opinion it is silly to associate information about media files with
the media file itself. The objective is to search for an image of a horse
and every image of a horse should be included NEVER MIND where the file
is located. When the result is to be restricted to freely licensed
images, all images should be included NEVER MIND where the file is
located.

NB I would love to understand why I am wrong in this.

Thanks,
  GerardM


Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-09-06 Thread P. Blissenbach
I have no idea how we can find media without having statements on them of the 
kind "depicts a (some-item)" or "is an instance of (photograph)", "taken at 
[Date]", etc., where (items) are represented by Q-something and [values] as 
usual.

Of course, from "depicts Q112015" (= town musicians of Bremen), we might infer 
each of "depicts a (donkey)" and "depicts a (dog)" and "depicts a (cat)" and 
"depicts a (cock)" and likely much more.
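
A minimal sketch of that kind of inference, assuming a "has part" style lookup table exists. Only Q112015 is taken from the text above; the table itself and the plain-string part labels (used instead of real Q-ids) are invented for illustration.

```python
# Hypothetical lookup: a composite item and the things it consists of.
HAS_PARTS = {
    "Q112015": ["donkey", "dog", "cat", "cock"],
}


def expand_depicts(depicted: set[str]) -> set[str]:
    """Return the given 'depicts' values plus everything inferable from them."""
    result = set(depicted)
    pending = list(depicted)
    while pending:
        item = pending.pop()
        for part in HAS_PARTS.get(item, []):
            if part not in result:
                result.add(part)
                pending.append(part)  # parts may have parts of their own
    return result


expanded = expand_depicts({"Q112015"})
```

With such an expansion, a search for "depicts a (cat)" would also find media that only carry the statement "depicts Q112015".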

Having the bulk of statements on the items depicted, recorded, etc. is imho 
okay. Yet some details apply only to specific media, such as a _male_ voice 
recording of (some-literary-work). In the long run, I believe, we should have 
these, too, so as to allow precise queries.
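
To illustrate what such a precise query could look like, here is a small sketch in which searching means stating what you want and combining the statements with basic logical operations. File names, property names and values are all made up for the example; nothing here reflects an existing query service.

```python
from typing import Callable

Statements = set[tuple[str, str]]
Query = Callable[[Statements], bool]

# Hypothetical media files, each with its own (property, value) statements.
MEDIA: dict[str, Statements] = {
    "File:Reading_A.ogg": {("records", "some-literary-work"), ("voice", "male")},
    "File:Reading_B.ogg": {("records", "some-literary-work"), ("voice", "female")},
    "File:Bremen.jpg": {("depicts", "Q112015"), ("type", "photograph")},
}


def has(prop: str, value: str) -> Query:
    """Match media carrying the statement (prop, value)."""
    return lambda statements: (prop, value) in statements


def all_of(*queries: Query) -> Query:
    """Logical AND over sub-queries."""
    return lambda statements: all(q(statements) for q in queries)


def not_(query: Query) -> Query:
    """Logical NOT of a sub-query."""
    return lambda statements: not query(statements)


def search(query: Query) -> list[str]:
    """Return the names of all media matching the query, sorted."""
    return sorted(name for name, s in MEDIA.items() if query(s))


recordings = search(has("records", "some-literary-work"))
male_only = search(all_of(has("records", "some-literary-work"), has("voice", "male")))
not_male = search(all_of(has("records", "some-literary-work"), not_(has("voice", "male"))))
```

The first query is the broad one; the second and third show the stepwise refinement that only works if the voice statement is attached to the individual recording.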

Btw., I agree that the actual location of media files should be of little 
concern. It is represented by a URL, and that is it.

Purodha


Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-09-03 Thread James Heald

@Joe Filceolaire

Fair enough.  I had misread the rules.  I thought it was the Commons Cat 
that needed to have a sitelink to some other page on any Wikimedia 
Project, rather than the requirement just being that a Wikidata item 
needed to have a sitelink to eg a Commons Cat.


So per the current rules, these Commons Cats could all have Wikidata 
items (though I still think that would be a mistake).



In fact I believe nearly every Commons Category has a corresponding
wikidata category item.


That is not correct.

There are currently 3,338,000 categories on Commons (excluding redirects)

About 250,000 category-like items on Wikidata have links to Commons (the 
number is similar either counting sitelinks, or property P373.)


About 688,000 article-like items on Wikidata have links to Commons 
categories using property P373.


So between 2,400,000 and 2,650,000 categories on Commons are currently 
pointed to by neither a category-like item, nor an article-like item.
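
For anyone wanting to check the range, the arithmetic can be reproduced as below. The sketch assumes each linking item points to a distinct Commons category; the two bounds then correspond to no overlap and to complete overlap between the categories reached from category-like items and those reached from article-like items.

```python
commons_categories = 3_338_000   # categories on Commons, excluding redirects
category_like_items = 250_000    # category-like Wikidata items linking to Commons
article_like_items = 688_000     # article-like Wikidata items linking via P373

# No overlap: the two groups point at entirely different categories.
lower_bound = commons_categories - (category_like_items + article_like_items)

# Complete overlap: every category reached from a category-like item
# is also reached from an article-like item.
upper_bound = commons_categories - max(category_like_items, article_like_items)

print(f"{lower_bound:,} to {upper_bound:,}")
```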



In my view that should continue to be the case.

We're setting up a separate database or namespace for Commons files 
anyway; so doesn't it make more sense for entities like Commons 
categories that really only relate to Commons to have items held in that 
database or namespace, rather than in main Wikidata?


What are the advantages of adding two and a half million items of 
wiki-junk to Wikidata?



Yes, like other items on CommonsData, the properties of such C-items 
would normally point to Q-items on main Wikidata.


Looking at the modelling of the two categories in more detail:

First, Category:Images released by British Library Images Online

* It's not clear that BL Images Online would actually have its own 
Q-item.  The British Library certainly does.  Images Online is one of 
many parts of the BL.


But even if we create Images Online as a useful thing to link to, that's 
not really the point.  This category (despite its title) is really for a 
specific release of images from BL Images Online.  If there were another 
release, that would have a new different (sub-)category.


Yes, we could perhaps capture the set with a query specifying the source 
and the date.  But as a distinctive set, it's useful to have a (C-)item 
that can represent it, (i) acting as a container for the query, and any 
other information about the set that might be relevant; and (ii) acting 
as a target for searches, so the set can be retrieved directly with a 
simple search, rather than requiring a complex search combining multiple 
properties.



Secondly, Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd

Again, the important thing is that (despite its title) what this 
category really represents is a particular set of *scans*.


There are already titles where we have multiple sets of scans for a 
single book, from different sources, often with different image 
characteristics.


In the jargon, these scan-sets are called manifestations of the work. 
 On main Wikidata, current guidance is to have Q-items for works, and 
Q-items for editions, but not Q-items for manifestations of editions. 
So on current sourcing guidance, again, this category should not have a 
Q-item.


But it does make sense for it to have an item for operational reasons on 
Commons, so (IMO) it makes sense for it to have a C-item on CommonsData.


The C-item would reference the Q-item on WikiData about the edition; but 
would also contain information specific to the C-item -- for example, 
that the source for these scans was a particular copy of the book 
scanned and released as part of the Mechanical Curator collection.


Scans of other copies of the same edition of the same book might have 
separately been released as part of the Mechanical Curator collection, 
part of the Wellcome collection, part of a release by the NYPL, or part 
of the Internet Archive Book Images collection (which in itself can 
contain multiple releases of the same book, from different libraries).


This source information can be quite detailed, along with credit-line 
information, and specific link-back information.  So (IMO) it makes 
sense to be able to hold it as a single item for the set, rather than 
only be able to extract it as a query from the individual images.


Furthermore, this is information that one wants to be able to display on 
the Commons category page.  It doesn't make sense to have to run a query 
over the images (which images? all of them?) in the category, just to be 
able to display header information on the category page.
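
A rough sketch of what such a C-item might hold, purely as an illustration: the field names, the identifier "C1", the placeholder edition id and the credit line are assumptions made for this example, not a proposed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CItem:
    c_id: str               # hypothetical CommonsData identifier
    label: str              # the Commons category this item represents
    edition_q_id: str       # pointer to the edition's Q-item on main Wikidata
    source_collection: str  # where this particular set of scans came from
    credit_line: str        # hypothetical credit text for the set


def category_header(item: CItem) -> list[str]:
    """Build header lines for the category page from the single C-item,
    without running a query over the individual images."""
    return [
        f"{item.label} ({item.c_id})",
        f"Edition: {item.edition_q_id}",
        f"Source: {item.source_collection}",
        f"Credit: {item.credit_line}",
    ]


scan_set = CItem(
    c_id="C1",                             # placeholder
    label="Metropolitan Improvements (1828) Thomas Hosmer Shepherd",
    edition_q_id="Q-EDITION-PLACEHOLDER",  # placeholder, not a real Q-id
    source_collection="Mechanical Curator collection",
    credit_line="British Library",         # assumed for the example
)
header = category_header(scan_set)
```

The point of the sketch is only that set-level information lives in one place and can be read directly when rendering the category page.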



  -- James.


On 01/09/2014 17:43, Joe Filceolaire wrote:

James
I think the problem is not as difficult as you have described.

If we look at http://www.wikidata.org/wiki/Wikidata:Notability then you
will see that each wikimedia commons page can have a corresponding item.
The comment that a sitelink to a category page in Wikimedia Commons is
*not* allowed on main article items means that Commons Category pages
should link to Category items and 

Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-09-03 Thread James Heald

Gerard,

I agree with you that I would like the kind of tools currently available 
with WikiData also to be available on CommonsData.


Queries that combine the two in an integrated way ought to be made 
simple and straightforward.


What I don't understand is your objection to placing items that really 
only have a Commons notability, not a world notability, into a specific 
namespace, or (notionally) the separate database CommonsData, so that it 
is possible to run those queries that only relate to Commons information 
solely on CommonsData, and those queries that only relate to world 
information solely on WikiData.


Does that not make more sense, than requiring the full bulk of the 
combined database to always be addressed in order to run any query?


  -- James.



On 01/09/2014 07:07, Gerard Meijssen wrote:

Hoi,
Wikidata is very much a working database. Its relevance is exactly
because of this. Without the connection to the interwiki links, it would
not be the same, it would not have the coverage and it would not have the
same sized community.

Considerations about secondary use are secondary. Yes, people may use it
for their own purposes and when it fits their needs, well and good. When it
does not, that is fine too. As it is, we do have all kinds of Wiki junk in
there. We have disambiguation pages, list articles, templates, categories.
The challenge is to find a use for them.

When I add statements based on categories, I document many categories
[1]. As a result over 900 items for categories will show the result of a
query in the Reasonator. The result is what I think a category could
contain given the subject of a category. For Wikipedians they are articles
not categorised, red links and blue links.

There are several reasons why this is not (yet) a perfect fit. The most
obvious one is including articles that are not part of the selection eg a
list in a category full of humans. Currently not everything can be
expressed in a way that allows Reasonator to pick things up in a query..
dates come to mind. Then there are the categories that have an arbitrary
set of entries.

I am not going to speculate on what kind of qualifiers Commons will come up
with. In essence when you can sort it / select it Wikidata will do a better
job for you. The only thing we have to do is identify the items that fit
the mold. This is something that you can often find the basis for in
existing categories.
Thanks,
  GerardM


[1]
http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wikidata.html

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%5D%20AND%20CLAIM%5B360%3A5%5D%20




___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-09-03 Thread Gerard Meijssen
Hoi,
I am firmly opposed to the idea that the Wikidatification of Commons is
about Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

The categories of Commons are in and of themselves useful to a very
limited extent. Associating the images they refer to with existing items in
Wikidata is one way in which they may be useful. As it is, because of
naming conventions and the use of English only, the categories are pretty
lame. They do not help me when I am looking for an image in Commons at all.

Really, my point is: forget about Commons notability and start thinking in
terms of what it takes to help people find images. Yes, those people will
be 8 years old and they may speak Mandarin or Japanese.
Thanks,
  GerardM




Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-09-03 Thread Derric Atzrott
 The categories of Commons are in and of themselves useful to a very limited
 extent. Associating the images they refer to with existing items in Wikidata
 is one way in which they may be useful. As it is, because of naming
 conventions and the use of English only, the categories are pretty lame. They
 do not help me when I am looking for an image in Commons at all. 

Couldn't this still be done from CommonsData?  I thought the items in that
database would be able to reference the ones in the Wikidata database and vice
versa.

 Really, my point is: forget about Commons notability and start thinking in
 terms of what it takes to help people find images. Yes, those people will
 be 8 years old and they may speak Mandarin or Japanese.

I'm confused, wouldn't having the data in CommonsData still help with this?

 Considerations about secondary use are secondary. Yes, people may use it
 for their own purposes and when it fits their needs, well and good. When it
 does not, that is fine too. As it is, we do have all kinds of Wiki junk in
 there. We have disambiguation pages, list articles, templates, categories.
 The challenge is to find a use for them.

I'd disagree that considerations about secondary use are secondary.  Wikidata
really has a huge potential for secondary use and we shouldn't forget that.



I'm somewhat confused about this thread.  Did I miss something?  My
understanding is that Commons will be getting its own Wikibase install in order
to keep track of image metadata.  We are currently having a debate over whether
the 3.3 million Commons categories should be kept in Wikidata or CommonsData.

The CommonsData argument is that it keeps stuff only really useful to Commons
out of the namespace that has thus far been mostly used for items relating to
the real world.

The Wikidata argument is that there is already a ton of wiki-junk in Wikidata
and we shouldn't worry about reuse of Wikidata because it is primarily a tool
for Wikimedia editors and that having the data on Wikidata itself would allow
editors to more easily find useful images.

Am I understanding that correctly?

Thank you,
Derric Atzrott




Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-09-01 Thread Gerard Meijssen
Hoi,
Wikidata is very much a working database. Its relevance is exactly
because of this. Without the connection to the interwiki links, it would
not be the same, it would not have the coverage and it would not have the
same sized community.

Considerations about secondary use are secondary. Yes, people may use it
for their own purposes and when it fits their needs, well and good. When it
does not, that is fine too. As it is, we do have all kinds of Wiki junk in
there. We have disambiguation pages, list articles, templates, categories.
The challenge is to find a use for them.

When I add statements based on categories, I document many categories
[1]. As a result over 900 items for categories will show the result of a
query in the Reasonator. The result is what I think a category could
contain given the subject of a category. For Wikipedians they are articles
not categorised, red links and blue links.

There are several reasons why this is not (yet) a perfect fit. The most
obvious one is including articles that are not part of the selection eg a
list in a category full of humans. Currently not everything can be
expressed in a way that allows Reasonator to pick things up in a query..
dates come to mind. Then there are the categories that have an arbitrary
set of entries.

I am not going to speculate on what kind of qualifiers Commons will come up
with. In essence when you can sort it / select it Wikidata will do a better
job for you. The only thing we have to do is identify the items that fit
the mold. This is something that you can often find the basis for in
existing categories.
Thanks,
 GerardM


[1]
http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wikidata.html

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%5D%20AND%20CLAIM%5B360%3A5%5D%20


On 1 September 2014 00:42, James Heald j.he...@ucl.ac.uk wrote:

 Hi everybody,

 Sorry to open up an old thread again after ten days, but there were some
 things in Lydia's reply below that I wanted to come back to.

 So, first, a couple of examples of the kind of Commons Categories I had in
 mind:

 https://commons.wikimedia.org/wiki/Category:Images_released_by_British_Library_Images_Online

 https://commons.wikimedia.org/wiki/Category:Metropolitan_Improvements_%281828%29_Thomas_Hosmer_Shepherd

 Despite their names, both these cats effectively identify images from
 particular photosets on Flickr.  The first category relates to a particular
 set of images released by a particular institution on a particular date.
 The second relates to a particular set of scans from a particular edition
 of a particular book.  Both (IMO) would (and, moreover *should*) currently
 fail Wikidata:Notability.

 The book, and even the edition, might be notable. But a particular set of
 scans surely would not. Similarly, the first category is really just a
 photoset from Flickr, again something that wouldn't currently get a
 Wikidata Q-number.

 Now in the email below, Lydia effectively said: no problem, just give each
 Commons Category a Wikidata Q-number anyway.  (Imho they should be on
 Wikidata. I fear if we introduce another layer it'll be considerably harder
 to use and maintain.)

 GerardM, in sessions at Wikimania, also argued strongly simply for putting
 everything in Wikidata.

 But I think this would be a mistake, because IMO Wikidata:Notability is a
 positive virtue, which should be defended.  It is *useful* to people that
 they can download a dump of Wikidata for their own purposes, and get
 real-world relevant items, rather than the dump being bloated with wiki
 junk.

 So in my opinion, Commons categories should generally *not* get Q-numbers
 on Wikidata (unless they pass WD:N), but should instead get items on the
 Commons Wikibase which is being created expressly for the purpose of
 holding structured data on things which really only have a commonswiki
 significance, and are not real-world notable.



 A second point relates to Magnus's issue about how much of this could be
 replaced by queries.

 Yes, if one were progressively building up a topic search on images from
 books in the 1-million image BL Mechanical Curator release, one might ask
 for books about London, then books published in a particular date range.
 But within that, the natural query to specify scans from this particular
 copy of 'Metropolitan Improvements' is the image's membership of this
 particular set -- membership of the set in itself is something that should
 be queryable, and such a query is the kind of query that, at the right
 stage, should be offerable to the user trying to refine their search.


 In fact, most current Commons categories will not be WD-notable.  But even
 for the most egregious of Commons intersection categories, IMO it will
 still be worth the Commons Wikibase tracking category membership for an
 image, not least for the ability that will give to easily present the
 category's files in different ways -- eg perhaps 

[Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

2014-08-31 Thread James Heald

Hi everybody,

Sorry to open up an old thread again after ten days, but there were some 
things in Lydia's reply below that I wanted to come back to.


So, first, a couple of examples of the kind of Commons Categories I had 
in mind:


https://commons.wikimedia.org/wiki/Category:Images_released_by_British_Library_Images_Online

https://commons.wikimedia.org/wiki/Category:Metropolitan_Improvements_%281828%29_Thomas_Hosmer_Shepherd

Despite their names, both these cats effectively identify images from 
particular photosets on Flickr.  The first category relates to a 
particular set of images released by a particular institution on a 
particular date.  The second relates to a particular set of scans from a 
particular edition of a particular book.  Both (IMO) would (and, 
moreover *should*) currently fail Wikidata:Notability.


The book, and even the edition, might be notable. But a particular set 
of scans surely would not. Similarly, the first category is really just 
a photoset from Flickr, again something that wouldn't currently get a 
Wikidata Q-number.


Now in the email below, Lydia effectively said: no problem, just give 
each Commons Category a Wikidata Q-number anyway.  (Imho they should be 
on Wikidata. I fear if we introduce another layer it'll be considerably 
harder to use and maintain.)


GerardM, in sessions at Wikimania, also argued strongly simply for 
putting everything in Wikidata.


But I think this would be a mistake, because IMO Wikidata:Notability is 
a positive virtue, which should be defended.  It is *useful* to people 
that they can download a dump of Wikidata for their own purposes, and 
get real-world relevant items, rather than the dump being bloated with 
wiki junk.


So in my opinion, Commons categories should generally *not* get 
Q-numbers on Wikidata (unless they pass WD:N), but should instead get 
items on the Commons Wikibase which is being created expressly for the 
purpose of holding structured data on things which really only have a 
commonswiki significance, and are not real-world notable.




A second point relates to Magnus's issue about how much of this could be 
replaced by queries.


Yes, if one were progressively building up a topic search on images from 
books in the 1-million image BL Mechanical Curator release, one might 
ask for books about London, then books published in a particular date 
range.  But within that, the natural query to specify scans from this 
particular copy of 'Metropolitan Improvements' is the image's membership 
of this particular set -- membership of the set in itself is something 
that should be queryable, and such a query is the kind of query that, at 
the right stage, should be offerable to the user trying to refine their 
search.



In fact, most current Commons categories will not be WD-notable.  But 
even for the most egregious of Commons intersection categories, IMO it 
will still be worth the Commons Wikibase tracking category membership 
for an image, not least for the ability that will give to easily present 
the category's files in different ways -- eg perhaps sorted by filename; 
or by original creation date; or by upload date; or by uploader; or by 
geographical proximity... etc.  Holding the category membership in the 
wikibase then allows people to write gadgets to sort or filter or 
re-present the category in multiple ways.  So it's useful to have the 
category as an entity that can be a target for a property.
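
To make the gadget idea concrete, here is a small sketch of re-presenting one category's members in several orders. It assumes the membership and a few per-file fields are already available as structured records. The `CategoryFile` type, its field names and the sample files are hypothetical, and nothing here reflects an actual Commons Wikibase API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CategoryFile:
    filename: str
    created: date      # original creation date of the work
    uploaded: date     # date of upload
    uploader: str


# Each presentation is just a different sort key over the same membership.
SORT_KEYS = {
    "filename": lambda f: f.filename,
    "created": lambda f: f.created,
    "uploaded": lambda f: f.uploaded,
    "uploader": lambda f: (f.uploader, f.filename),
}


def present(members, by="filename"):
    """Return the filenames of the category's files ordered by the chosen key."""
    return [f.filename for f in sorted(members, key=SORT_KEYS[by])]


# Invented sample data.
members = [
    CategoryFile("Plate_02.jpg", date(1828, 1, 1), date(2014, 3, 5), "UserB"),
    CategoryFile("Plate_01.jpg", date(1828, 1, 1), date(2014, 3, 7), "UserA"),
    CategoryFile("Frontispiece.jpg", date(1827, 1, 1), date(2014, 3, 6), "UserB"),
]

print(present(members, by="filename"))
print(present(members, by="uploaded"))
print(present(members, by="uploader"))
```

The point of the sketch is that the category membership is stored once, and every alternative view is a cheap function of it.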



But there are also reasons for a category to have an item in its own 
right -- because there is structured data that one may wish to associate 
with the category:  one example would be access stats to members of the 
category (eg which categories in the Mechanical Curator collection have 
had the most file views?) -- the kind of thing of great interest to GLAMs.


Many categories also contain information defining them -- for example, 
for the book scans category, one would want a property that this 
category contained scans of the particular book (pointed to by its 
Q-number), probably a particular edition (probably a qualifier).  One 
might also want to associate linked data -- pointers to entries for the 
book in (possibly multiple) catalogues of its original host institution.
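
As an illustration only, the defining information for the book scans category might be held roughly as follows. The property names ("contains scans of", "catalogue entry"), the identifiers and the catalogue URLs are all placeholders invented for this sketch, not existing properties or items.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str                                   # hypothetical property name
    value: str                                  # target item or URL
    qualifiers: dict = field(default_factory=dict)


@dataclass
class CategoryItem:
    item_id: str                                # placeholder identifier
    label: str
    statements: list = field(default_factory=list)

    def values(self, prop):
        """Return the values of all statements using the given property."""
        return [s.value for s in self.statements if s.prop == prop]


scans = CategoryItem(
    item_id="EXAMPLE-ID",
    label="Metropolitan Improvements (1828) Thomas Hosmer Shepherd",
    statements=[
        # the book is the value, the edition is a qualifier on the statement
        Statement("contains scans of", "Q-BOOK", {"edition": "Q-EDITION"}),
        # linked data: one statement per catalogue entry
        Statement("catalogue entry", "https://catalogue.example/record-1"),
        Statement("catalogue entry", "https://catalogue.example/record-2"),
    ],
)

print(scans.values("contains scans of"))
print(scans.values("catalogue entry"))
```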


So for all these reasons it may well be useful, as a matter of course, 
to have a container for structured information associated with each 
commonscat.


This is why I think each and every category on Commons should have its 
own Commons Wikibase item, with an associated C-number.



Queries are important, but I'd suggest they are best seen as an 
*addition* to the present category system, rather than a *replacement* 
for it.


A particular way forward, it seems to me, might be to allow categories 
to be *augmented* with specific queries -- i.e. to allow rules to be 
specified for particular categories, so that files whose structured-data 
topic information matched the rules would automatically be added to the 
categories, alongside