[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data
Elitre added a comment. @SandraF_WMF is this still an epic or is any of this already happening perhaps? Thanks. TASK DETAIL https://phabricator.wikimedia.org/T180113 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Elitre Cc: Elitre, Musebrarian, Perhelion, PDrouin-WMF, zhuyifei1999, FDMS, Jmmuguerza, Steinsplitter, Multichill, Jheald, Magnus, Aklapper, SandraF_WMF, darthmon_wmde, DannyS712, Nandana, JKSTNK, Lahi, Gq86, E1presidente, Ramsey-WMF, Cparle, Anooprao, GoranSMilovanovic, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, _jensen, rosalieper, Morgankevinj, Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data
Elitre added a comment. I am removing our team from this for now; please file the usual request for support in the future if necessary. :)
[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data
Perhelion added a comment. Categories on Commons are a really big deal (many gadgets rely on them). Such a major change would be very critical; I'm very skeptical (but open to new technologies).
[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data
Steinsplitter added a comment.

During the IRC meeting we talked about tags, which will be in addition to categories. I don't think the goal should be to remove categories, because there is a wide consensus, and even a policy, on how they should be used:
- https://commons.wikimedia.org/wiki/Commons:Categories (and relevant sub-guidelines/policies)
- https://commons.wikimedia.org/wiki/Commons:Categories_for_discussion/Archive

So the goal should likely be to make categories structured (so we can reuse them) and allow their titles to be translated, etc. :-)
[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data
Jheald added a comment.

Hi Magnus,

I am intrigued by the idea of the categorisation information being directly accessible in the file's wikibase page; and I presume the template hack to add a category statement on the File page would also keep the SQL tables up to date, which so many tools, as well as the category presentation infrastructure, depend on. Code would need to be written to intercept new categories being added/changed/rewritten on the page, either by humans or tools, to make sure that this was routed appropriately to the wikibase.

However, I don't buy the idea of 'draining' the categories as information becomes accessible by SPARQL. I think this would go down very badly with Commons. At the minimum, I think there's going to have to be a long period of parallel running between the category system and SPARQL-driven searches, during which the category system will need to be kept intact. Indeed, I suspect categories will continue to have some important roles even when SPARQL is fully implemented and well populated.

So, rather than removing category statements on the file items, I think a better approach would be a qualifier to indicate that the categorisation entry could be accounted for by statements on the file item. It would be good if, to some extent, this could be updated by bot as categorisations were added/revised.

As you have noted above, the translation of the meaning(s) of a category into statements can be very varied. I don't know whether you would agree, but I believe it would be *extremely* helpful to be able to store the main "machine meanings" of categories in some accessible place, where they could be easily edited by all-comers (humans and machines) and accessed by all-comers. (The "category combines" statement on Wikidata is a good example of how this information might be modelled.)
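To make the qualifier idea more concrete, here is a minimal sketch, assuming a category's "machine meaning" can be stored as a set of (property, value) pairs in the spirit of the "category combines" statement. The classes `CategoryMeaning` and `FileItem`, the functions `accounted_for` and `qualifier_updates`, and the plain-text property labels are all hypothetical; no real Wikibase API or property ID is implied. The category itself is never removed; the sketch only computes whether it could be flagged as accounted for.

```python
from dataclasses import dataclass, field

# Hypothetical statement shape: (property label, value). Real Wikidata or
# Commons property IDs are deliberately not assumed here.
Statement = tuple[str, str]


@dataclass
class CategoryMeaning:
    """Stored machine meaning of one Commons category ('category combines' style)."""
    name: str
    combines: set[Statement] = field(default_factory=set)


@dataclass
class FileItem:
    """Hypothetical structured-data view of a file: its categories and statements."""
    title: str
    categories: set[str] = field(default_factory=set)
    statements: set[Statement] = field(default_factory=set)


def accounted_for(item: FileItem, meaning: CategoryMeaning) -> bool:
    """True if every statement implied by the category is already on the file.

    A category with no recorded meaning is never treated as accounted for.
    """
    return bool(meaning.combines) and meaning.combines <= item.statements


def qualifier_updates(item: FileItem, meanings: dict[str, CategoryMeaning]) -> dict[str, bool]:
    """Per-category value a bot could write to a hypothetical 'accounted for' qualifier."""
    return {
        cat: cat in meanings and accounted_for(item, meanings[cat])
        for cat in sorted(item.categories)
    }
```

A bot rerunning `qualifier_updates` whenever categories or statements change would keep the flags current, while the category system stays intact for parallel running.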
I've suggested to Sandra that by far the best way to do this would be to have a wikibase entry for each category: it would be easily accessible, easily writable, and easily inspectable with tools we substantially already have. I think it would also be a very good platform for live-testing some of the Structured Data technology at scale (e.g. multi-content revisions, federation, etc.) in a known environment, not subject to the progress of the more involved designs for the file pages. I'd be very interested to have your opinion on that. I know via Sandra that the project is very wary of adding anything to the roadmap, but it seems to me it might well pay for itself as a useful test platform down the line, and I'd be curious whether you think it would add much of an additional requirement, given that all the enabling technology now appears either to be already in place or to be on the main line of the project's development.
[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data
Magnus added a comment.

Trying to decompile all these into statements and/or checks:
- The year 1191 AD: depicts, or creation date, or creation date of depicted object (?)
- Ethanol: depicts
- George Washington: could be depicts, creator, owner
- Maps of the United Kingdom: instance of: map; depicts(?): UK
- Bibliothèque Nationale MS Fr. 2646: depicts: [new item]; instance of: manuscript
- Pol Fruit: could be [new item "Pol Fruit"]: depicts, creator, owner
- The New Orleans Bee May 1874: publication (or something): [new item "The New Orleans Bee"]; publication date: May 1874
- Composers from Denmark: too abstract to add direct statements. Check that all files have creators, and that all those creators are from Denmark
- Addax nasomaculatus in Jerusalem Biblical Zoo: depicts: Addax nasomaculatus; location: Jerusalem Biblical Zoo
- Disease incidence maps of the United Kingdom: instance of: disease incidence map; depicts(?): United Kingdom
- Files from Internet Archive Book Images Flickr stream: imported from: [new item "Internet Archive Book Images Flickr stream"]
- Photographs taken on 2016-08-01, Uploaded with Mobile/Web: creation date: 2016-08-01; upload path(?): Mobile/Web
- CC-BY-SA-2.0: licence (of file): CC-BY-SA-2.0
- Media with locations: obsolete; convert locations from template and/or EXIF to statements for all files. Maybe useful as a one-off check: once all location statements are created, all these files should have one; highlight them if not
- Mérimée with PA parameter: obsolete; should become a statement using a Mérimée property (or some generic "external ID" property)

I would humbly suggest the following approach to resolve these:
- Create a "category" property (plain string) on Commons SD (and maybe a "sort order" qualifier as well)
- Add a (partially) filled (or even blank) template to each file on Commons that renders each category statement value as a [[Category:]] link
- For each file, add all categories (that are not in templates) as statements, then remove those categories from the wikitext
- The file description page should now render exactly as before, and all categories should work as before, but categories and other statements (on Commons SD and Wikidata alike) can now be queried together via SPARQL
- A tool/JavaScript can perform a single action (e.g. "remove all category statements with this name", "add creator:Michelangelo") on the results of a SPARQL query (maybe via QuickStatements) [on second thought, one can already do that in PetScan, but maybe an "official" way would be nicer...]
- That way, information contained in the category name can be added as new statements
- Once the information is accessible via SPARQL query, the category statements are no longer necessary and can be removed (or at least deprecated)

My 2 eurocent
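The round trip in the suggested approach (move explicit category links into "category" statements, then let a template render them back) can be sketched as below. This is a minimal illustration, not an actual Commons template or bot: `CategoryStatement`, `extract_categories` and `render_categories` are hypothetical names, the sort key standing in for the "sort order" qualifier is an assumption, and the regular expression only handles plain `[[Category:Name]]` and `[[Category:Name|sort key]]` links (no comments, `<nowiki>`, or localised namespace aliases). Categories added by templates never appear literally in the wikitext, so they are left untouched, matching the "not in templates" caveat.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches explicit links such as [[Category:Ethanol]] or [[Category:Ethanol|Sort key]],
# plus one trailing newline, so removing a link does not leave a blank line behind.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|([^\]]*))?\]\]\n?",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class CategoryStatement:
    """Hypothetical plain-string 'category' statement with an optional sort-order qualifier."""
    name: str
    sort_key: Optional[str] = None


def extract_categories(wikitext: str) -> tuple[list[CategoryStatement], str]:
    """Turn explicit [[Category:]] links into statements and strip them from the wikitext."""
    statements = [
        CategoryStatement(m.group(1), m.group(2) or None)
        for m in CATEGORY_LINK.finditer(wikitext)
    ]
    return statements, CATEGORY_LINK.sub("", wikitext)


def render_categories(statements: list[CategoryStatement]) -> str:
    """Emit what the proposed template would render: one [[Category:]] link per statement."""
    links = []
    for s in statements:
        suffix = f"|{s.sort_key}" if s.sort_key else ""
        links.append(f"[[Category:{s.name}{suffix}]]")
    return "\n".join(links)
```

If the template output is appended where the links used to be, the page renders the same category links as before, which is the property the approach relies on.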
[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data
SandraF_WMF added a comment. Low priority; this is definitely on my radar, but not something I will spend many hours on in Q2 of 2017-18 (Oct-Dec 2017).
[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data
SandraF_WMF added a comment.

A first rough breakdown of some types of categories we are dealing with. Corrections and additions welcome.

Without fully structured data, categories on Wikimedia Commons have been the best vehicle to 'tag' media files on Commons and to organize them. The Commons category system is multihierarchical (i.e. it is a tree-like structure in which each 'node/branch' can have multiple parents and children). A lot of (often detailed) information is stored in Commons categories. We want to lose as little of this informational value as possible and want to work towards the best transition of this information to structured data.

Wikimedia Commons contains roughly 6,066,000 categories. (source 1) (source 2) (checked on November 9, 2017)

1. Categories with purely informational value

1.1. Simple categories about single (or combined) topics, with already (some) connection to structured data

Approximately 1,867,000 Wikidata items (source) are linked to a Commons category via the P373 property ('Commons Category'). (Wikidata query which produces a sample of 500 of such categories)

Examples:
- (Year, single topic) The year 1191 AD
- (Subject, single topic) Ethanol
- (Person, single topic) George Washington
- (Combined topic, corresponding with categories in other Wikimedia projects) Maps of the United Kingdom

1.2. Very specific categories, not connected to structured data

1.2.1. Probably notable enough to deserve a Wikidata item
- Bibliothèque Nationale MS Fr. 2646 (notable manuscript still without Wikipedia articles or Wikidata item)
- Pol Fruit (a lesser-known manuscript illuminator from Flanders)

1.2.2. Probably not notable enough to deserve a Wikidata item
- The New Orleans Bee May 1874 (scans of a month's newspaper issues)

1.3. Intersection categories (combining various topics)
- Used for structural purposes: Composers from Denmark
- Combining several notable topics: Addax nasomaculatus in Jerusalem Biblical Zoo
- Another example: Disease incidence maps of the United Kingdom

2. Categories with (or including) administrative and maintenance functions
- With information about the source of the file: Files from Internet Archive Book Images Flickr stream
- With EXIF-like information: Photographs taken on 2016-08-01, Uploaded with Mobile/Web
- With copyright-related information: CC-BY-SA-2.0
- Maintenance categories: Media with locations, Mérimée with PA parameter
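Many of the administrative categories in section 2 follow regular naming patterns, so a first pass could map them to candidate statements mechanically, as the decompilation suggested elsewhere in this thread does by hand. The sketch below is illustrative only: the patterns cover just the example category names listed above, and the property labels ("creation date", "upload path", "licence", "imported from") are hypothetical placeholders rather than real Commons or Wikidata property IDs. Topic and intersection categories from section 1 deliberately return `None`, since they need a stored machine meaning or human review rather than a name pattern.

```python
import re
from typing import Optional

# Hypothetical statement shape: (property label, value).
Statement = tuple[str, str]

# Illustrative naming patterns taken from the example categories above.
# Each captures the value part of the name; the label is a placeholder property.
PATTERNS = [
    (re.compile(r"^Photographs taken on (\d{4}-\d{2}-\d{2})$"), "creation date"),
    (re.compile(r"^Uploaded with (.+)$"), "upload path"),
    (re.compile(r"^(CC-BY(?:-SA)?-\d\.\d)$"), "licence"),
    (re.compile(r"^Files from (.+)$"), "imported from"),
]


def statement_for(category: str) -> Optional[Statement]:
    """Return a candidate statement for a regularly named category, or None.

    Candidates are suggestions for review, not edits to apply blindly.
    """
    for pattern, prop in PATTERNS:
        match = pattern.match(category)
        if match:
            return (prop, match.group(1))
    return None
```

A pattern table like this is easy to extend category family by category family, but broad prefixes such as "Files from" would need checking against the real category tree before being trusted.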