Jheald added a comment.

Interesting slide-show. But the fundamental problem -- as some of the attached tickets start to appreciate -- is that the information determining whether an image fits the specification being looked for will not be stored just in statements on the CommonsData item for the image in question, nor just on the Wikidata item for the thing it depicts. It will depend on statements distributed throughout the database.

Take for instance the example raised above of Category:Grade I listed buildings in Bedfordshire.

All that you would expect to find on a particular image would be, e.g., depicts: St Andrew's church, Ampthill (Q17528295), perhaps with a qualifier such as shown with features: tower or shown with features: porch.

But the data on the image is not going to tell you (or perhaps rather: should not be telling you) that the building is in Ampthill, or that the building is a church, or that it is Grade I listed. Instead, that will be stored (as a variety of properties) on the Wikidata item Q17528295.

It's not a question of just adding the right "depicts" and the right qualifiers on the CommonsData item for the image, and being able to search for them -- that's not where this information lives.

Even the statements on the Wikidata item Q17528295 will not tell you directly whether the item fits the specification. Q17528295 will tell you that the item is in Ampthill, but it will be another item that tells you that Ampthill is in Central Bedfordshire, and yet another that tells you that Central Bedfordshire is in Bedfordshire. Similarly, Q17528295 will say that the item depicted is a church, but it is only a whole chain of further items that establishes that a church is a kind of building.
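
To make the chain-following concrete, here is a minimal sketch on a toy in-memory graph. The dictionary layout, the plain-text labels and the intermediate class "religious building" are assumptions made up for illustration; this is not how Wikidata actually stores or serves its statements.

```python
# Toy statements keyed by (item, property). All labels are illustrative only.
STATEMENTS = {
    ("St Andrew's church, Ampthill", "located in"): ["Ampthill"],
    ("Ampthill", "located in"): ["Central Bedfordshire"],
    ("Central Bedfordshire", "located in"): ["Bedfordshire"],
    ("St Andrew's church, Ampthill", "instance of"): ["church"],
    # Hypothetical intermediate class, standing in for "a whole chain of items".
    ("church", "subclass of"): ["religious building"],
    ("religious building", "subclass of"): ["building"],
}


def reachable(start, prop):
    """Return every item reachable from `start` by following `prop` one or more times."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in STATEMENTS.get((node, prop), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_a(item, cls):
    """True if `item` is an instance of `cls` or of any subclass of `cls`."""
    for direct in STATEMENTS.get((item, "instance of"), []):
        if direct == cls or cls in reachable(direct, "subclass of"):
            return True
    return False
```

Neither "Bedfordshire" nor "building" appears in any statement made directly about the church; both are only found by walking through other items.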

The expectation was that some kind of faceted search system would be needed for the search system to match the existing capability to drill down to categories like Category:Grade I listed buildings in Bedfordshire -- so that one might first specify that one was looking for a building, and be presented with up to 2000 images of buildings; then one would be offered the choice either to refine the type of building, or to choose from a list of the most common properties that items for buildings might have, and then to input, choose or refine the value for that property -- or mark it for exclusion -- and so on.
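
A rough sketch of that drill-down loop, assuming the current result set has already been resolved into flat property/value pairs. The item names, property names and values are hypothetical toy data, and `common_properties` and `refine` are names invented for this example.

```python
from collections import Counter

# Toy result set after an initial "building" search. Hypothetical data.
ITEMS = {
    "church A": {"type": "church", "heritage designation": "Grade I", "county": "Bedfordshire"},
    "church B": {"type": "church", "heritage designation": "Grade II", "county": "Bedfordshire"},
    "barn C": {"type": "barn", "county": "Kent"},
}


def common_properties(items, top_n=3):
    """List the properties used most often in the current result set."""
    counts = Counter(prop for props in items.values() for prop in props)
    return [prop for prop, _ in counts.most_common(top_n)]


def refine(items, prop, value, exclude=False):
    """Keep items whose `prop` equals `value`, or drop them if `exclude` is set."""
    return {
        name: props
        for name, props in items.items()
        if (props.get(prop) == value) != exclude
    }


# Drill down: buildings -> in Bedfordshire -> Grade I.
in_county = refine(ITEMS, "county", "Bedfordshire")
grade_one = refine(in_county, "heritage designation", "Grade I")
```

Each call to `refine` narrows the previous result, and `common_properties` supplies the list of facets to offer at the next step.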

In this way, given the information already on Q17528295, the single statement "depicts: Q17528295" on the image should be enough to make the image findable from a search for "depicts a Grade I listed building in Bedfordshire", refined through the interface.
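
As an end-to-end illustration, the sketch below finds an image that carries nothing but a "depicts" statement. The file names, the "village pump" item and the triple layout are invented for the example, and the church-to-building chain is shortened to one step for brevity.

```python
# (subject, property, value) triples. Every entry here is illustrative toy data.
TRIPLES = [
    ("image1.jpg", "depicts", "Q17528295"),
    ("image2.jpg", "depicts", "village pump"),  # hypothetical non-matching item
    ("Q17528295", "instance of", "church"),
    ("Q17528295", "heritage designation", "Grade I listed building"),
    ("Q17528295", "located in", "Ampthill"),
    ("Ampthill", "located in", "Central Bedfordshire"),
    ("Central Bedfordshire", "located in", "Bedfordshire"),
    ("church", "subclass of", "building"),  # real chain is longer
    ("village pump", "located in", "Ampthill"),
]


def values(subject, prop):
    """Direct values of `prop` on `subject`."""
    return [v for s, p, v in TRIPLES if s == subject and p == prop]


def closure(subject, prop):
    """Everything reachable from `subject` by following `prop` repeatedly."""
    found, todo = set(), [subject]
    while todo:
        for v in values(todo.pop(), prop):
            if v not in found:
                found.add(v)
                todo.append(v)
    return found


def search_images(cls, designation, place):
    """Images depicting an item of class `cls` with `designation`, located in `place`."""
    hits = []
    for image, prop, item in TRIPLES:
        if prop != "depicts":
            continue
        classes = set()
        for c in values(item, "instance of"):
            classes |= {c} | closure(c, "subclass of")
        if (cls in classes
                and designation in values(item, "heritage designation")
                and place in closure(item, "located in")):
            hits.append(image)
    return hits


result = search_images("building", "Grade I listed building", "Bedfordshire")
```

Every image that depicts anything has to be checked against several other items, which is where the cost questions come from.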

That was the pitch for structured data, anyway. But is it realistic? What level of demand might be anticipated? How long might such searches take? Are there particular caching, optimisation, pre-computation or indexing strategies that could help? Do they need to be designed in from the start? These are surely questions to have at least a back-of-an-envelope feel for now, well before the purely image data may start to become available in December.


TASK DETAIL
https://phabricator.wikimedia.org/T191633
