[Wikidata-bugs] [Maniphest] [Commented On] T180113: Support the creation and use of volunteer tools that help to convert information in Commons categories to structured data

2019-08-29 Thread Elitre
Elitre added a comment.


  @SandraF_WMF is this still an epic or is any of this already happening 
perhaps? Thanks.

TASK DETAIL
  https://phabricator.wikimedia.org/T180113

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Elitre
Cc: Elitre, Musebrarian, Perhelion, PDrouin-WMF, zhuyifei1999, FDMS, 
Jmmuguerza, Steinsplitter, Multichill, Jheald, Magnus, Aklapper, SandraF_WMF, 
darthmon_wmde, DannyS712, Nandana, JKSTNK, Lahi, Gq86, E1presidente, 
Ramsey-WMF, Cparle, Anooprao, GoranSMilovanovic, QZanden, Tramullas, Acer, 
LawExplorer, Salgo60, Silverfish, _jensen, rosalieper, Morgankevinj, 
Susannaanas, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Ricordisamoa, 
Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs



2018-07-05 Thread Elitre
Elitre added a comment.
I am removing our team from this for now; please file the usual request for support in the future if necessary. :)



2017-12-13 Thread Perhelion
Perhelion added a comment.
Categories on Commons are a really big deal (many gadgets rely on them). Such a major change would be very critical; I'm very skeptical (but open to new technologies).



2017-11-27 Thread Steinsplitter
Steinsplitter added a comment.
During the IRC meeting we talked about tags, which will be in addition to categories.

I don't think the goal should be to remove categories, because there is wide consensus, and even a policy, on how they should be used:


https://commons.wikimedia.org/wiki/Commons:Categories (and relevant sub-guidelines/policies)
https://commons.wikimedia.org/wiki/Commons:Categories_for_discussion/Archive


So the goal should probably be to make categories structured (so we can re-use them) and to allow translating their titles, etc.
:-)



2017-11-21 Thread Jheald
Jheald added a comment.
Hi Magnus,

I am intrigued by the idea of the categorisation information being directly accessible on the file's Wikibase page; and I presume the template hack to add a category statement on the File page would also keep the SQL tables up to date, which so many tools, as well as the category presentation infrastructure, depend on. Code would need to be written to intercept new categories being added, changed or rewritten on the page, either by humans or by tools, to make sure that they were routed appropriately to the Wikibase.

However, I don't buy the idea of 'draining' the categories as information becomes accessible by SPARQL.  I think this would go down very badly with Commons.  At the minimum I think there's going to have to be a long period of parallel running between the category system and SPARQL-driven searches, during which the category system will need to be kept intact.  Indeed, I suspect they will still continue to have some important roles even when SPARQL is fully implemented and well populated.

So, rather than removing category statements from the file items, I think a better approach would be a qualifier indicating that the categorisation entry can be accounted for by statements on the file item. It would be good if, to some extent, this could be updated by bot as categorisations are added or revised.
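A minimal Python sketch of the qualifier idea, for illustration only. The Statement class, the plain-string property labels and the qualifier name are all hypothetical stand-ins, not the Commons Structured Data model or any real property; the point is just that category statements get flagged, never deleted, once their meaning is already expressed elsewhere on the same file item.

```python
from dataclasses import dataclass, field

# Hypothetical qualifier label, invented for this sketch.
ACCOUNTED_FOR = "accounted for by other statements"


@dataclass
class Statement:
    prop: str   # illustrative property label, not a real property ID
    value: str
    qualifiers: dict = field(default_factory=dict)


def mark_accounted_categories(statements, category_meanings):
    """Qualify, rather than delete, each "category" statement whose meaning
    (a list of (property, value) pairs) is already present on the file item."""
    present = {(s.prop, s.value) for s in statements if s.prop != "category"}
    for s in statements:
        needed = category_meanings.get(s.value) if s.prop == "category" else None
        if needed and all(pair in present for pair in needed):
            s.qualifiers[ACCOUNTED_FOR] = True
    return statements


file_item = [
    Statement("category", "Addax nasomaculatus in Jerusalem Biblical Zoo"),
    Statement("category", "Addax nasomaculatus"),  # no meaning recorded yet
    Statement("depicts", "Addax nasomaculatus"),
    Statement("location", "Jerusalem Biblical Zoo"),
]
meanings = {
    "Addax nasomaculatus in Jerusalem Biblical Zoo": [
        ("depicts", "Addax nasomaculatus"),
        ("location", "Jerusalem Biblical Zoo"),
    ],
}
mark_accounted_categories(file_item, meanings)
```

A bot re-running this as categorisations change would keep the flags current without touching the category system itself.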

As you have noted above, the translation of the meaning(s) of a category into statements can be very varied.  I don't know whether you would agree, but I believe it would be *extremely* helpful to be able to store the main "machine meanings" of categories in some accessible place, where it could be easily edited by all-comers (humans and machines) and accessed by all-comers.  (The "category combines" statement on Wikidata is a good example of how this information might be modelled).
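A sketch of such a shared store, assuming a simple mapping from category name to the topics it combines (loosely modelled on the Wikidata "category combines" statement mentioned above). The class and method names are invented for illustration; in practice the data would live on a wiki, editable by humans and bots, not in a Python object.

```python
import json


class CategoryMeaningStore:
    """Hypothetical store of the "machine meanings" of Commons categories,
    i.e. the topics each category combines."""

    def __init__(self):
        self._meanings = {}  # category name -> set of topic labels

    def set_combines(self, category, *topics):
        self._meanings[category] = set(topics)

    def combines(self, category):
        return self._meanings.get(category, set())

    def categories_combining(self, topic):
        """Reverse lookup: every category whose meaning includes `topic`."""
        return sorted(c for c, topics in self._meanings.items() if topic in topics)

    def to_json(self):
        # Sorted output keeps exports stable, so edits by humans and bots diff cleanly.
        return json.dumps({c: sorted(t) for c, t in sorted(self._meanings.items())})


store = CategoryMeaningStore()
store.set_combines("Maps of the United Kingdom", "map", "United Kingdom")
store.set_combines("Disease incidence maps of the United Kingdom",
                   "disease incidence map", "United Kingdom")
```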

I've suggested to Sandra that by far the best way to do this would be to have a Wikibase entry for each category: it would be easily accessible, easily writable, and easily inspectable with tools we substantially already have. I think it would also be a very good platform for live-testing some of the Structured Data technology at scale (e.g. multi-content revisions, federation, etc.) in a known environment, independent of progress on the more involved designs for the file pages. I'd be very interested to have your opinion on that. I know via Sandra that the project is very wary of adding anything to the roadmap, but it seems to me it might well pay for itself as a useful test platform down the line, and I'd be curious whether you think it would add much of an additional requirement, given that all the enabling technology now appears either to be already in place or to be on the project's main development line.



2017-11-10 Thread Magnus
Magnus added a comment.
Trying to decompile all these into statements and/or checks:

The year 1191 AD


depicts


or


creation date


or


creation date of depicted object (?)


Ethanol


depicts


George Washington

Could be


depicts
creator
owner


Maps of the United Kingdom


instance of: map
depicts(?): UK


Bibliothèque Nationale MS Fr. 2646


depicts: [new item]
instance of: manuscript


Pol Fruit

Could be [new item "Pol Fruit"]:


depicts
creator
owner


The New Orleans Bee May 1874


publication (or something): [new item "The New Orleans Bee"]
publication date: May 1874


Composers from Denmark


Too abstract to add direct statements
Check that all files have creators, and that all those creators are from Denmark


Addax nasomaculatus in Jerusalem Biblical Zoo


depicts: Addax nasomaculatus
location: Jerusalem Biblical Zoo


Disease incidence maps of the United Kingdom


instance of: Disease incidence map
depicts(?): United Kingdom


Files from Internet Archive Book Images Flickr stream


imported from: [new item "Internet Archive Book Images Flickr stream"]


Photographs taken on 2016-08-01, Uploaded with Mobile/Web


creation date: 2016-08-01
upload path(?): Mobile/Web


CC-BY-SA-2.0


licence (of file): CC-BY-SA-2.0


Media with locations


obsolete, convert locations from template and/or EXIF to statements for all files
maybe useful as one-off check: Once all location statements are created, all these files should have one, highlight if not


Mérimée with PA parameter


obsolete, should become a statement using a Mérimée property (or some generic "external ID" property)
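The mappings above can be read as rules over category names. The Python sketch below assumes regular-expression rules and plain-string property labels (both hypothetical, not actual Commons or Wikidata properties); each rule also says whether its output needs human review, mirroring the "(?)" and "Could be" cases in the list.

```python
import re

# Hypothetical rules: (pattern over a category name, builder of candidate
# statements, needs_review flag). Property labels are illustrative strings.
RULES = [
    (re.compile(r"Photographs taken on (\d{4}-\d{2}-\d{2})"),
     lambda m: [("creation date", m.group(1))], False),
    (re.compile(r"(CC-BY(?:-SA)?-\d\.\d)"),
     lambda m: [("licence", m.group(1))], False),
    # Ambiguous: alternatives, only one of which applies to a given file.
    (re.compile(r"The year (\d{1,4}) AD"),
     lambda m: [("depicts", m.group(1)), ("creation date", m.group(1))], True),
    (re.compile(r"Maps of (?:the )?(.+)"),
     lambda m: [("instance of", "map"), ("depicts", m.group(1))], True),
    # Very generic "X in Y" pattern; always send its output for review.
    (re.compile(r"(.+?) in (.+)"),
     lambda m: [("depicts", m.group(1)), ("location", m.group(2))], True),
]


def decompile(category):
    """Return (candidate statements, needs_review) for one category name."""
    for pattern, build, needs_review in RULES:
        match = pattern.fullmatch(category)
        if match:
            return build(match), needs_review
    # No rule matched (e.g. "Composers from Denmark"): too abstract for direct
    # statements, so leave it to humans or to a consistency check instead.
    return [], True
```

Rule order matters: specific patterns come first so that the catch-all "X in Y" rule only sees what nothing else claimed.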


I would humbly suggest the following approach to resolve these:


Create a "category" property (plain string) on Commons SD (and maybe a "sort order" qualifier statement as well)
Add a (partially) filled (or even blank) template to each file on Commons, that renders each category statement value as a [[Category:]] link
For each file, add all categories (that are not in templates) as a statement, then remove the category from the wikitext
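Steps 2 and 3 could look roughly like this in Python, assuming statements are plain dictionaries with hypothetical "category" and "sort order" keys. Only categories written directly as [[Category:...]] links are handled, since the proposal leaves template-added categories alone; the rendering function stands in for the (equally hypothetical) template.

```python
import re

# [[Category:Name]] or [[Category:Name|sort key]] written directly in the
# wikitext. "[[:Category:Name]]" (a plain link to a category) is not matched,
# and categories added by templates are not visible here at all.
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)(?:\|([^\]]*))?\]\]\n?")


def move_categories_to_statements(wikitext):
    """Step 3: return (wikitext without category links, category statements)."""
    statements = [
        {"category": name.strip(), "sort order": key or None}
        for name, key in CATEGORY_LINK.findall(wikitext)
    ]
    return CATEGORY_LINK.sub("", wikitext), statements


def render_category_links(statements):
    """Step 2: what a hypothetical rendering template would output."""
    links = []
    for s in statements:
        suffix = f"|{s['sort order']}" if s["sort order"] else ""
        links.append(f"[[Category:{s['category']}{suffix}]]")
    return "\n".join(links)


page = ("{{Information|description=Ethanol}}\n"
        "[[Category:Ethanol]]\n"
        "[[Category:Photographs taken on 2016-08-01|Ethanol]]\n")
body, stmts = move_categories_to_statements(page)
```

Concatenating the stripped wikitext with the rendered links reproduces the original page, which is the "renders exactly as before" property the proposal relies on.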


The file description page should now render exactly as before, and all categories should work as before, but


categories and other statements (on Commons SD and Wikidata alike) can now be queried together via SPARQL
a tool/JavaScript can perform a single action (e.g. "remove all category statements with this name", "add creator:Michelangelo") on the results of a SPARQL query (maybe via QuickStatements)
[on second thought, one can already do that in PetScan, but maybe an "official" way would be nicer...]
that way, information contained in the category name can be added as new statements
once the information is accessible via SPARQL query, the category statements are no longer necessary and can be removed (or at least deprecated)
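The "checks" from the list above (e.g. Composers from Denmark, Media with locations) become simple once categories and statements sit in one queryable store. A Python sketch, with an in-memory dictionary standing in for SPARQL results; the file names, composer labels and their countries are made up for illustration.

```python
def check_category(files, category, prop, predicate=lambda value: True):
    """Report files in `category` that lack `prop`, or whose `prop` value
    fails `predicate`. `files` maps a file name to {property: [values]}."""
    problems = []
    for name, statements in files.items():
        if category not in statements.get("category", []):
            continue
        values = statements.get(prop, [])
        if not values:
            problems.append((name, f"missing {prop}"))
        for value in values:
            if not predicate(value):
                problems.append((name, f"{prop} {value!r} fails check"))
    return problems


# Invented example data standing in for query results.
COUNTRY = {"Composer A": "Denmark", "Composer B": "Finland"}
FILES = {
    "File:Example 1.jpg": {"category": ["Composers from Denmark"], "creator": ["Composer A"]},
    "File:Example 2.jpg": {"category": ["Composers from Denmark"]},
    "File:Example 3.jpg": {"category": ["Composers from Denmark"], "creator": ["Composer B"]},
    "File:Example 4.jpg": {"category": ["Media with locations"]},
}

# Composers from Denmark: every file needs a creator, and every creator is Danish.
danish = check_category(FILES, "Composers from Denmark", "creator",
                        lambda c: COUNTRY.get(c) == "Denmark")
# Media with locations: once location statements exist, every file should have one.
unlocated = check_category(FILES, "Media with locations", "location")
```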


my 2 eurocent



2017-11-09 Thread SandraF_WMF
SandraF_WMF added a comment.
Low priority; this is definitely on my radar, but not something I will spend many hours on in Q2 of 2017-18 (Oct-Dec 2017).



2017-11-09 Thread SandraF_WMF
SandraF_WMF added a comment.
A first rough breakdown of some types of categories we are dealing with. Corrections and additions welcome.

Without fully structured data, categories have been the best vehicle on Wikimedia Commons to 'tag' and organize media files. The Commons category system is multihierarchical: each 'node/branch' can have multiple parents and children, so it is a tree-like structure (strictly speaking, a directed graph) rather than a strict tree.
A lot of (often detailed) information is stored in Commons categories. We want to lose as little of this informational value as possible and want to work towards the best transition of this information to structured data.
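Because a category can have several parents, finding everything a category sits under is a graph traversal rather than a walk up a single branch. A minimal Python sketch, with an invented parent mapping (not the actual Commons hierarchy); the visited set also guards against cycles, which a structure allowing multiple parents does not rule out.

```python
from collections import deque

# Invented parent links for illustration; not the real Commons hierarchy.
PARENTS = {
    "Disease incidence maps of the United Kingdom": [
        "Maps of the United Kingdom", "Disease incidence maps"],
    "Maps of the United Kingdom": ["Maps by country", "United Kingdom"],
    "Disease incidence maps": ["Maps"],
    "Maps by country": ["Maps"],
}


def ancestors(parents, category):
    """Every category reachable by following parent links (breadth-first)."""
    seen, queue = set(), deque([category])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:  # reached by several paths: count once, no loops
                seen.add(parent)
                queue.append(parent)
    return seen
```

Here "Maps" is reached along two separate paths but reported once.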

Wikimedia Commons contains roughly 6,066,000 categories. (source 1) (source 2) (checked on November 9, 2017)

1. Categories with purely informational value

1.1. Simple categories about single (or combined) topics, with already (some) connection to structured data


Approximately 1,867,000 Wikidata items (source) are linked to a Commons category via the P373 property ('Commons Category'). (Wikidata query which produces a sample of 500 of such categories)
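A sample like this can be reproduced against the Wikidata Query Service. The Python sketch below only builds the request URL; the SPARQL is an assumed equivalent of the linked query (which is not preserved here), the fetch helper is not executed, and its User-Agent string is a placeholder.

```python
import json
import urllib.request
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items with a "Commons category" (P373) value; LIMIT matches the 500-row sample.
QUERY = """SELECT ?item ?commonsCategory WHERE {
  ?item wdt:P373 ?commonsCategory .
}
LIMIT 500"""


def build_query_url(query):
    return WDQS_ENDPOINT + "?" + urlencode({"query": query})


def fetch(url):
    """Network call (not run here). Asks for SPARQL JSON results; Wikimedia
    expects a descriptive User-Agent, so replace the placeholder below."""
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "category-sample-sketch/0.1 (placeholder contact)",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


url = build_query_url(QUERY)
```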


Examples:


(Year, single topic) The year 1191 AD
(Subject, single topic) Ethanol
(Person, single topic) George Washington
(Combined topic, corresponding with categories in other Wikimedia projects) Maps of the United Kingdom


1.2. Very specific categories, not connected to structured data

1.2.1 Probably notable enough to deserve a Wikidata item


Bibliothèque Nationale MS Fr. 2646 (Notable manuscript still without a Wikipedia article or Wikidata item)
Pol Fruit (A lesser-known manuscript illuminator from Flanders)


1.2.2 Probably not notable enough to deserve a Wikidata item


The New Orleans Bee May 1874 (scans of a month's newspaper issues)


1.3 Intersection categories (combining various topics)


Used for structural purposes: Composers from Denmark
Combining several notable topics: Addax nasomaculatus in Jerusalem Biblical Zoo
Another example: Disease incidence maps of the United Kingdom


2. Categories with (or including) administrative and maintenance functions


With information about the source of the file: Files from Internet Archive Book Images Flickr stream
With EXIF-like information: Photographs taken on 2016-08-01, Uploaded with Mobile/Web
With copyright-related information: CC-BY-SA-2.0
Maintenance categories: Media with locations, Mérimée with PA parameter