Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
Hi,

(1) If we want to include media files that are not on Commons, then we shall have to include data from foreign sources such as Flickr or other types of repositories. We must do so without stealing or damaging the authority of these others. If we connect items to media by linking them, or if we assign tags, labels, attributes, etc. to foreign media, or make statements involving them, we can of course do so collaboratively, but we cannot assume that other communities will cooperate. Often they will, occasionally they will not, and the latter should not be a hindrance.

(2) Assuming we are incorporating labels, tags and statements (claims) made in other repositories in addition to simple and obvious technical information, we shall first have to decide about incorporating the thesauri, tagging systems, ontologies, or whatever they use.

(3) Much less complicated, imho, is the initial step of making files on Commons and on other WMF wikis available for searches via Wikidata. The goal has to be, imho, that everything we already know about them is converted into statements and made available to search queries. Since that involves reading descriptions and turning them into statements about media, we get a finer-grained categorizing or tagging system than we have today. It will automatically become more multilingual as data grows. I currently believe that conversion from existing data has to be done at least partially semiautomatically, likely with suggestor bots that ask questions like

   Is this cat:
   o Black, o Brown, o White, o Tigered, ... o Not a cat at all

or

   In this sample, you hear the voice of a:
   o Female, o Male, o Child, o Cannot tell, o Several voices, o No voice at all

That would allow adding considerable volumes of missing data in little time, starting from categories existing in the wikis.

(4) Searching should most of the time be a matter of making statements about what you want to find. Basic logical operations need to be available so as to limit unwieldy result sets, plus additional stepwise refinements. Semantic MediaWiki, Wolfram Alpha and library catalog search engines already have many of those ;-)

Purodha

Gerard Meijssen gerard.meijs...@gmail.com writes:

> Hoi,
> I am really interested how you envision searching when all those topics are isolated and attached to each file. I am also really interested to know, when you have all those files isolated on Commons, how you will include media files that are NOT on Commons. This is a normal use case.
> Thanks,
> GerardM
>
> On 3 September 2014 15:33, James Heald j.he...@ucl.ac.uk wrote:
>> Not really relevant. The way that this will be achieved will be a topics list attached to each file, each topic being a pointer to a Wikidata item. Sure, Wikidata may be used as one of the sources to help build the topics list; but the topics list will not be on Wikidata, but attached to each file, probably on the CommonsData wikibase.
>> -- James.
>>
>> On 03/09/2014 14:28, P. Blissenbach wrote:
>>> I strongly support this view: Wikidata should support and ease finding Commons images. This is not only about proper categorising and tagging in a truly multilingual way, but also about determining and assigning various properties, both automatically and manually. Think, for example, of an art director creating an image flyer (be it about Wikimania, a national open source movement, or a company) looking for photographs that are predominantly blue, depicting 8 humans or more of various ages in a neutral or indeterminate environment, and so on, so as to get the hang of it.
>>> Purodha
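Point (3) above proposes suggestor bots that ask multiple-choice questions and turn the answers into statements. The following is only a minimal sketch of that idea. The `Question` class, the property name `colour` and the file name are hypothetical illustrations, not part of any existing Wikidata or Commons API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """A multiple-choice question a suggestor bot could ask about one file."""
    prompt: str
    prop: str            # hypothetical property the answer would fill in
    options: tuple       # answers that become statement values
    escapes: tuple = ()  # answers that produce no statement at all


def answer_to_statement(file_name, question, answer):
    """Turn one answer into a (file, property, value) statement, or None."""
    if answer in question.escapes:
        return None
    if answer not in question.options:
        raise ValueError(f"unknown answer: {answer!r}")
    return (file_name, question.prop, answer)


# The cat question from the message, written as data.
CAT_COLOUR = Question(
    prompt="Is this cat:",
    prop="colour",
    options=("Black", "Brown", "White", "Tigered"),
    escapes=("Not a cat at all",),
)

statement = answer_to_statement("Example_cat.jpg", CAT_COLOUR, "Black")
skipped = answer_to_statement("Example_cat.jpg", CAT_COLOUR, "Not a cat at all")
print(statement)
print(skipped)
```

Keeping the "escape" answers separate from the real options means a reply such as "Not a cat at all" adds nothing instead of recording a wrong value.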
Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
Hoi,

The use case I was thinking of was to include the images that exist, for instance, on English Wikipedia. Flickr and other repositories outside the WMF are very much out of scope as far as I am concerned.

In my opinion it is silly to associate information about media files with the media file itself. The objective is to search for an image of a horse, and every image of a horse should be included, NEVER MIND where the file is located. When the result is to be restricted to freely licensed images, all such images should be included, NEVER MIND where the file is located.

NB I would love to understand why I am wrong in this.

Thanks,
GerardM

On 6 September 2014 10:48, P. Blissenbach pu...@web.de wrote:

> Hi
> (1) If we want to include media files not on Commons, then we shall have to include data from foreign sources such as Flickr or other types of repositories. We must do so without stealing or damaging the authority of these others. If we connect items to media by linking them, or if we assign tags, labels, attributes, etc. to foreign media, or make statements involving them, we can of course do so collaboratively, but we cannot assume that other communities will cooperate. Often they will, occasionally they will not, and the latter should not be a hindrance.
> [...]
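Gerard's requirement above (find every image of a horse no matter where the file is hosted, optionally restricted to freely licensed files) can be sketched as follows. This is an illustration under stated assumptions: `MediaRecord`, its fields and the example.org URLs are hypothetical, and subjects are plain labels rather than real item identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaRecord:
    url: str             # where the file lives; never used for filtering
    depicts: frozenset   # subjects shown in the file
    free_license: bool   # True if the file is freely licensed


def search(records, subject, free_only=False):
    """Return every record depicting `subject`, wherever it is hosted."""
    return [
        r for r in records
        if subject in r.depicts and (r.free_license or not free_only)
    ]


# Hypothetical records hosted on two different wikis.
records = [
    MediaRecord("https://commons.example.org/Horse1.jpg", frozenset({"horse"}), True),
    MediaRecord("https://enwiki.example.org/Horse2.jpg", frozenset({"horse", "human"}), False),
    MediaRecord("https://commons.example.org/Cat1.jpg", frozenset({"cat"}), True),
]

all_horses = search(records, "horse")
free_horses = search(records, "horse", free_only=True)
print([r.url for r in all_horses])
print([r.url for r in free_horses])
```

The hosting location appears only as data in `url`. The filter never looks at it, which is the point being argued.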
Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
I have no idea how we can find media without having statements on them of the kind "depicts a (some-item)", "is an instance of (photograph)", "taken at [Date]", etc., where (items) are represented by Q-something and [values] as usual.

Of course, from "depicts Q112015" (= Town Musicians of Bremen), we might infer each of "depicts a (donkey)", "depicts a (dog)", "depicts a (cat)" and "depicts a (cock)", and likely much more.

Having the bulk of statements on the items depicted, recorded, etc. is imho okay. Yet there may be details that apply only to specific media, such as a _male_ voice recording of (some-literary-work). In the long run, I believe, we should have these too, so as to allow precise queries.

Btw., I agree that the actual location of media files should be of little concern. It is represented by a URL, and that is it.

Purodha

Gerard Meijssen gerard.meijs...@gmail.com wrote:

> Hoi,
> [...] In my opinion it is silly to associate information about media files with the media file itself. The objective is to search for an image of a horse and every image of a horse should be included NEVER MIND where the file is located.
> [...]
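The inference described in this message, from "depicts Q112015" to "depicts a (donkey)" and so on, can be sketched as a walk over a "has part" table. Only Q112015 comes from the message. The `HAS_PART` table, the plain-text labels standing in for item identifiers, and the function name are assumptions made for illustration.

```python
# Hypothetical "has part" data; only Q112015 is taken from the message above.
# Plain labels stand in for the Q-identifiers of the animals.
HAS_PART = {
    "Q112015": ["donkey", "dog", "cat", "cock"],  # Town Musicians of Bremen
}


def expand_depicts(item, has_part=HAS_PART):
    """Return the item followed by everything it is (transitively) made of."""
    seen = []
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the part-of data
        seen.append(current)
        # Reverse so that parts come out in their listed order.
        stack.extend(reversed(has_part.get(current, [])))
    return seen


inferred = expand_depicts("Q112015")
print(inferred)
```

With such an expansion the statement only has to be made once, on the item depicted, while a search for "cat" can still find the file.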
Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
@Joe Filceolaire

Fair enough. I had misread the rules. I thought it was the Commons Cat that needed to have a sitelink to some other page on any Wikimedia project, rather than the requirement just being that a Wikidata item needed to have a sitelink to e.g. a Commons Cat. So per the current rules, these Commons Cats could all have Wikidata items (though I still think that would be a mistake).

> In fact I believe nearly every Commons Category has a corresponding wikidata category item.

That is not correct.

* There are currently 3,338,000 categories on Commons (excluding redirects).
* About 250,000 category-like items on Wikidata have links to Commons (the number is similar whether counting sitelinks or property P373).
* About 688,000 article-like items on Wikidata have links to Commons categories using property P373.

So between 2,400,000 and 2,650,000 categories on Commons are currently pointed to by neither a category-like item nor an article-like item.

In my view that should continue to be the case. We're setting up a separate database or namespace for Commons files anyway; so doesn't it make more sense for entities like Commons categories that really only relate to Commons to have items held in that database or namespace, rather than in main Wikidata? What are the advantages of adding two and a half million items of wiki-junk to Wikidata?

Yes, like other items on CommonsData, the properties of such C-items would normally point to Q-items on main Wikidata.

Looking at the modelling of the two categories in more detail:

First, Category:Images released by British Library Images Online

* It's not clear that BL Images Online would actually have its own Q-item. The British Library certainly does. Images Online is one of many parts of the BL. But even if we create Images Online as a useful thing to link to, that's not really the point. This category (despite its title) is really for a specific release of images from BL Images Online. If there were another release, that would have a new, different (sub-)category. Yes, we could perhaps capture the set with a query specifying the source and the date. But as a distinctive set, it's useful to have a (C-)item that can represent it, (i) acting as a container for the query, and any other information about the set that might be relevant; and (ii) acting as a target for searches, so the set can be retrieved directly with a simple search, rather than requiring a complex search combining multiple properties.

Secondly, Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd

Again, the important thing is that (despite its title) what this category really represents is a particular set of *scans*. There are already titles where we have multiple sets of scans for a single book, from different sources, often with different image characteristics. In the jargon, these scan-sets are called manifestations of the work.

On main Wikidata, current guidance is to have Q-items for works, and Q-items for editions, but not Q-items for manifestations of editions. So on current sourcing guidance, again, this category should not have a Q-item.

But it does make sense for it to have an item for operational reasons on Commons, so (IMO) it makes sense for it to have a C-item on CommonsData. The C-item would reference the Q-item on Wikidata about the edition; but would also contain information specific to the C-item -- for example, that the source for these scans was a particular copy of the book scanned and released as part of the Mechanical Curator collection.

Scans of other copies of the same edition of the same book might have separately been released as part of the Mechanical Curator collection, part of the Wellcome collection, part of a release by the NYPL, or part of the Internet Archive Book Images collection (which in itself can contain multiple releases of the same book, from different libraries). This source information can be quite detailed, along with credit-line information, and specific link-back information. So (IMO) it makes sense to be able to hold it as a single item for the set, rather than only be able to extract it as a query from the individual images.

Furthermore, this is information that one wants to be able to display on the Commons category page. It doesn't make sense to have to run a query over the images (which images? all of them?) in the category, just to be able to display header information on the category page.

-- James.

On 01/09/2014 17:43, Joe Filceolaire wrote:

> James
> I think the problem is not as difficult as you have described. If we look at http://www.wikidata.org/wiki/Wikidata:Notability then you will see that each wikimedia commons page can have a corresponding item. The comment that a sitelink to a category page in Wikimedia Commons is *not* allowed on main article items means that Commons Category pages should link to Category items and
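The range of 2,400,000 to 2,650,000 uncovered categories in James's reply follows from his three figures. The sketch below reproduces the arithmetic. It assumes that each linking item points to a distinct Commons category within its own group, and the variable names are illustrative only.

```python
total_categories = 3_338_000    # Commons categories, excluding redirects
category_like_links = 250_000   # category-like Wikidata items linking to Commons
article_like_links = 688_000    # article-like items linking via property P373

# Fewest uncovered categories: the two groups point at entirely
# different Commons categories, so their coverage adds up.
lower_bound = total_categories - (category_like_links + article_like_links)

# Most uncovered categories: every category-like link points at a category
# that an article-like item already covers, so only the larger group counts.
upper_bound = total_categories - max(category_like_links, article_like_links)

print(lower_bound, upper_bound)
```

Under those assumptions the two bounds come out as 2,400,000 and 2,650,000, matching the figures in the message.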
Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
Gerard,

I agree with you that I would like the kind of tools currently available with Wikidata also to be available on CommonsData. Queries that combine the two in an integrated way ought to be made simple and straightforward.

What I don't understand is your objection to placing items that really only have a Commons notability, not a world notability, into a specific namespace, or (notionally) the separate database CommonsData, so that it is possible to run those queries that only relate to Commons information solely on CommonsData, and those queries that only relate to world information solely on Wikidata.

Does that not make more sense than requiring the full bulk of the combined database to always be addressed in order to run any query?

-- James.

On 01/09/2014 07:07, Gerard Meijssen wrote:

> Hoi,
> Wikidata is very much a working database. Its relevance is exactly because of this.
> [...]

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
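James's question above, whether a query should only address the database it actually needs, can be illustrated with a small routing sketch. It assumes that Commons-only items carry a C prefix and Wikidata items a Q prefix, following the C-item/Q-item shorthand James uses elsewhere in this thread. The identifiers and the function name are hypothetical.

```python
def stores_needed(item_ids):
    """Decide which database(s) a query touching these item IDs must address."""
    stores = set()
    for item_id in item_ids:
        if item_id.startswith("C"):
            stores.add("CommonsData")   # Commons-only items (assumed C prefix)
        elif item_id.startswith("Q"):
            stores.add("Wikidata")      # real-world items (Q prefix)
        else:
            raise ValueError(f"unrecognised item ID: {item_id!r}")
    return stores


commons_only = stores_needed(["C101", "C102"])   # hypothetical C-items
world_only = stores_needed(["Q5"])
combined = stores_needed(["C101", "Q5"])
print(commons_only, world_only, combined)
```

Only the combined query has to touch both stores. The other two stay within a single database, which is the saving James is arguing for.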
Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
Hoi,

I am firmly opposed to the idea that the Wikidatification of Commons is about Commons. That is imho a disaster. It is about media files, and they exist in many wikis.

The categories of Commons are in and of themselves useful to a very limited extent. Associating the images they refer to with existing items in Wikidata is one way in which they may be useful. As it is, because of naming conventions and the use of English only, the categories are pretty lame. They do not help me at all when I am looking for an image in Commons.

Really my point is: forget about Commons notability, and start thinking in terms of what it takes to help people find images. Yes, those people will be 8 years old and they may speak Mandarin or Japanese.

Thanks,
GerardM

On 3 September 2014 12:05, James Heald j.he...@ucl.ac.uk wrote:

> Gerard,
> [...] What I don't understand is your objection to placing items that really only have a Commons notability, not a world notability, into a specific namespace, or (notionally) the separate database CommonsData [...]
Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
> The categories of Commons are in and of themselves useful to a very limited extent. Associating the images they refer to with existing items in Wikidata is one way in which they may be useful. As it is, because of naming conventions and the use of English only, the categories are pretty lame. They do not help me when I am looking for an image in Commons at all.

Couldn't this still be done from CommonsData? I thought the items in that database would be able to reference the ones in the Wikidata database and vice versa.

> Really my point is forget about Commons notability start thinking in terms of what does it take to help people find images. Yes, those people will be 8 years old and they may speak Mandarin or Japanese.

I'm confused: wouldn't having the data in CommonsData still help with this?

> Considerations about secondary use are secondary. Yes, people may use it for their own purposes and when it fits their needs, well and good. When it does not, that is fine too. As it is, we do have all kinds of wiki junk in there. We have disambiguation pages, list articles, templates, categories. The challenge is to find a use for them.

I'd disagree that considerations about secondary use are secondary. Wikidata really has a huge potential for secondary use and we shouldn't forget that.

I'm somewhat confused about this thread. Did I miss something?

My understanding is that Commons will be getting its own Wikibase install in order to keep track of image metadata. We are currently having a debate over whether the 3.3 million Commons categories should be kept in Wikidata or CommonsData.

The CommonsData argument is that it keeps stuff that is only really useful to Commons out of the namespace that has thus far been mostly used for items relating to the real world.

The Wikidata argument is that there is already a ton of wiki-junk in Wikidata, that we shouldn't worry about reuse of Wikidata because it is primarily a tool for Wikimedia editors, and that having the data on Wikidata itself would allow editors to find useful images more easily.

Am I understanding that correctly?

Thank you,
Derric Atzrott
Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
Hoi,

Wikidata is very much a working database. Its relevance is exactly because of this. Without the connection to the interwiki links, it would not be the same, it would not have the coverage and it would not have the same sized community.

Considerations about secondary use are secondary. Yes, people may use it for their own purposes and when it fits their needs, well and good. When it does not, that is fine too.

As it is, we do have all kinds of wiki junk in there. We have disambiguation pages, list articles, templates, categories. The challenge is to find a use for them. When I add statements based on categories, I document many categories [1]. As a result, over 900 items for categories will show the result of a query in the Reasonator. The result is what I think a category could contain given the subject of the category. For Wikipedians these are articles not categorised, red links and blue links.

There are several reasons why this is not (yet) a perfect fit. The most obvious one is the inclusion of articles that are not part of the selection, e.g. a list in a category full of humans. Currently not everything can be expressed in a way that allows Reasonator to pick things up in a query; dates come to mind. Then there are the categories that have an arbitrary set of entries.

I am not going to speculate on what kind of qualifiers Commons will come up with. In essence, when you can sort it or select it, Wikidata will do a better job for you. The only thing we have to do is identify the items that fit the mold. This is something that you can often find the basis for in existing categories.

Thanks,
GerardM

[1] http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wikidata.html
http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%5D%20AND%20CLAIM%5B360%3A5%5D%20

On 1 September 2014 00:42, James Heald j.he...@ucl.ac.uk wrote:

> Hi everybody,
>
> Sorry to open up an old thread again after ten days, but there were some things in Lydia's reply below that I wanted to come back to.
>
> So, first, a couple of examples of the kind of Commons Categories I had in mind:
>
> https://commons.wikimedia.org/wiki/Category:Images_released_by_British_Library_Images_Online
> https://commons.wikimedia.org/wiki/Category:Metropolitan_Improvements_%281828%29_Thomas_Hosmer_Shepherd
>
> Despite their names, both these cats effectively identify images from particular photosets on Flickr. The first category relates to a particular set of images released by a particular institution on a particular date. The second relates to a particular set of scans from a particular edition of a particular book.
>
> Both (IMO) would (and, moreover *should*) currently fail Wikidata:Notability. The book, and even the edition, might be notable. But a particular set of scans surely would not. Similarly, the first category is really just a photoset from Flickr, again something that wouldn't currently get a Wikidata Q-number.
>
> Now in the email below, Lydia effectively said: no problem, just give each Commons Category a Wikidata Q-number anyway. (Imho they should be on Wikidata. I fear if we introduce another layer it'll be considerably harder to use and maintain.) GerardM, in sessions at Wikimania, also argued strongly simply for putting everything in Wikidata.
>
> But I think this would be a mistake, because IMO Wikidata:Notability is a positive virtue, which should be defended. It is *useful* to people that they can download a dump of Wikidata for their own purposes, and get real-world relevant items, rather than the dump being bloated with wiki junk.
>
> So in my opinion, Commons categories should generally *not* get Q-numbers on Wikidata (unless they pass WD:N), but should instead get items on the Commons Wikibase which is being created expressly for the purpose of holding structured data on things which really only have a commonswiki significance, and are not real-world notable.
>
> A second point relates to Magnus's issue about how much of this could be replaced by queries. Yes, if one were progressively building up a topic search on images from books in the 1-million image BL Mechanical Curator release, one might ask for books about London, then books published in a particular date range. But within that, the natural query to specify scans from this particular copy of 'Metropolitan Improvements' is the image's membership of this particular set -- membership of the set in itself is something that should be queryable, and such a query is the kind of query that, at the right stage, should be offerable to the user trying to refine their search.
>
> In fact, most current Commons categories will not be WD-notable. But even for the most egregious of Commons intersection categories, IMO it will still be worth the Commons Wikibase tracking category membership for an image, not least for the ability that will give to easily present the category's files in different ways -- eg perhaps
[Wikidata-l] Commons Categories again (was Re: Commons Wikibase)
Hi everybody, Sorry to open up an old thread again after ten days, but there were some things in Lydia's reply below that I wanted to come back to. So, first, a couple of examples of the kind of Commons Categories I had in mind: https://commons.wikimedia.org/wiki/Category:Images_released_by_British_Library_Images_Online https://commons.wikimedia.org/wiki/Category:Metropolitan_Improvements_%281828%29_Thomas_Hosmer_Shepherd Despite their names, both these cats effectively identify images from particular photosets on Flickr. The first category relates to a particular set of images released by a particular institution on a particular date. The second relates to a particular set of scans from a particular edition of a particular book. Both (IMO) would (and, moreover *should*) currently fail Wikidata:Notability. The book, and even the edition, might be notable. But a particular set of scans surely would not. Similarly, the first category is really just a photoset from Flickr, again something that wouldn't currently get a Wikidata Q-number. Now in the email below, Lydia effectively said: no problem, just give each Commons Category a Wikidata Q-number anyway. (Imho they should be on Wikidata. I fear if we introduce another layer it'll be considerably harder to use and maintain.) GerardM, in sessions at Wikimania, also argued strongly simply for putting everything in Wikidata. But I think this would be a mistake, because IMO Wikidata:Notability is a positive virtue, which should be defended. It is *useful* to people that they can download a dump of Wikidata for their own purposes, and get real-world relevant items, rather than the dump being bloated with wiki junk. 
So in my opinion, Commons categories should generally *not* get Q-numbers on Wikidata (unless they pass WD:N), but should instead get items on the Commons Wikibase which is being created expressly for the purpose of holding structured data on things which really only have a commonswiki significance, and are not real-world notable. A second point relates to Magnus's issue about how much of this could be replaced by queries. Yes, if one were progressively building up a topic search on images from books in the 1-million image BL Mechanical Curator release, one might ask for books about London, then books published in a particular date range. But within that, the natural query to specify scans from this particular copy of 'Metropolitan Improvements' is the image's membership of this particular set -- membership of the set in itself is something that should be queryable, and such a query is the kind of query that, at the right stage, should be offerable to the user trying to refine their search. In fact, most current Commons categories will not be WD-notable. But even for the most egregious of Commons intersection categories, IMO it will still be worth the Commons Wikibase tracking category membership for an image, not least for the ability that will give to easily present the category's files in different ways -- eg perhaps sorted by filename; or by original creation date; or by upload date; or by uploader; or by geographical proximity... etc. Holding the category membership in the wikibase then allows people to write gadgets to sort or filter or re-present the category in multiple ways. So it's useful to have the category as an entity that can be a target for a property. 
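To make the "gadget" idea above a little more concrete, here is a minimal sketch of how one stored category membership could be re-presented in several orders. Everything in it is an assumption for illustration: the `MediaFile` record, its field names, the sample files and the `present` helper are hypothetical and do not correspond to any actual Commons Wikibase data model or API.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class MediaFile:
    """Hypothetical per-file record; a real wikibase would hold these as statements."""
    filename: str
    created: date   # original creation date of the work
    uploaded: date  # date the file was uploaded
    uploader: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


# One sort key per presentation mentioned in the text.
SORT_KEYS = {
    "filename": lambda f: f.filename.lower(),
    "created": lambda f: f.created,
    "uploaded": lambda f: f.uploaded,
    "uploader": lambda f: f.uploader.lower(),
}


def present(members, order, near=None):
    """Return the category's members in the requested order.

    `near` is a (lat, lon) reference point, needed only for order="proximity".
    """
    if order == "proximity":
        if near is None:
            raise ValueError("order='proximity' needs a (lat, lon) reference point")
        key = lambda f: haversine_km(f.lat, f.lon, near[0], near[1])
    else:
        key = SORT_KEYS[order]
    return sorted(members, key=key)


# Made-up members of one category, purely for illustration.
members = [
    MediaFile("Regent_Street.jpg", date(1828, 1, 1), date(2014, 3, 2), "Alice", 51.51, -0.14),
    MediaFile("Arc_de_Triomphe.jpg", date(1836, 7, 29), date(2013, 12, 25), "Bob", 48.87, 2.30),
    MediaFile("Brooklyn_Bridge.jpg", date(1883, 5, 24), date(2014, 1, 10), "Carol", 40.71, -74.00),
]

for f in present(members, "proximity", near=(51.5, -0.1)):
    print(f.filename)
```

The point of the sketch is only that, once membership is held as data, each alternative presentation is a different sort key over the same set, so the category itself never has to be duplicated or re-curated.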
But there are also reasons for a category to have an item in its own right -- because there is structured data that one may wish to associate with the category: one example would be access stats to members of the category (eg which categories in the Mechanical Curator collection have had the most file views?) -- the kind of thing of great interest to GLAMs. Many categories also contain information defining them -- for example, for the book scans category, one would want a property that this category contained scans of the particular book (pointed to by its Q-number), probably a particular edition (probably a qualifier). One might also want to associate linked data -- pointers to entries for the book in (possibly multiple) catalogues of its original host institution. So for all these reasons it may well be useful, as a matter of course, to have a container for structured information associated with each commonscat. This is why I think each and every category on Commons should have its own Commons Wikibase item, with an associated C-number. Queries are important, but I'd suggest they are best seen as an *addition* to the present category system, rather than a *replacement* for it. A particular way forward, it seems to me, might be to allow categories to be *augmented* with specific queries -- i.e. to allow rules to be specified for particular categories, so that files whose structured-data topic information matched the rules would automatically be added to the categories, alongside