@Uwe: Apologies if I'm stating the obvious, but are you familiar with the Recoin tool [1]? It seems quite close to what you describe, though only for the data quality dimension of completeness (more precisely, *relative* completeness), and it could perhaps serve as a model for what you are considering. It is also a good example of a data quality tool that is directly useful to editors, as it often lets them identify and add missing statements on an item.
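To make this concrete: Recoin essentially relates the properties present on an item to the properties most frequently present on items of the same class. Here is a rough Python sketch of that kind of measure; the 50% threshold and the scoring formula are my own illustrative assumptions, not Recoin's actual algorithm:

    from collections import Counter

    def relative_completeness(item_properties, peer_properties):
        """Score an item against the property usage of its class peers.

        item_properties: set of property IDs on the item, e.g. {"P31", "P569"}
        peer_properties: list of such sets, one per item of the same class
        """
        if not peer_properties:
            return 1.0
        # Count how often each property occurs among the peers.
        freq = Counter(p for props in peer_properties for p in props)
        n = len(peer_properties)
        # Properties used by at least half of the peers count as "expected";
        # the 0.5 threshold is an arbitrary choice for illustration.
        expected = {p for p, c in freq.items() if c / n >= 0.5}
        if not expected:
            return 1.0
        return len(item_properties & expected) / len(expected)

    # Example: an item with one of the two "expected" properties scores 0.5.
    peers = [{"P31", "P569", "P570"}, {"P31", "P569"}, {"P31", "P569", "P106"}]
    print(relative_completeness({"P31", "P570"}, peers))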
Regards,

Ettore Rizza

[1] https://www.wikidata.org/wiki/Wikidata:Recoin

On Tue, 27 Aug 2019 at 21:49, Uwe Jung <jung....@gmail.com> wrote:

> Hello,
>
> many thanks for the answers to my contribution from 24 August. I think
> that all four opinions contain important things to consider.
>
> @David Abián
> I have read the article and agree that, in the end, the users decide
> which data is good for them or not.
>
> @GerardM
> It is true that in a possible implementation of the idea, the computing
> load must be taken into account right from the beginning.
>
> Please note that I have not given up on the idea yet. With regard to the
> acceptance of Wikidata, I consider a quality indicator of some kind to
> be absolutely necessary. There will be many ordinary users who would
> like to see something like this.
>
> At the same time, I completely agree with David: (almost) every chosen
> indicator is subject to a certain arbitrariness in the selection. There
> won't be one easy-to-understand super-indicator.
>
> So, let's approach things from the other side. Instead of a global
> indicator, a separate indicator should be developed for each quality
> dimension to be considered. For some dimensions this should be
> relatively easy; for others it could take years until we have agreed on
> an algorithm for their calculation.
>
> Furthermore, the indicators should not represent discrete values but a
> continuum of values. No traffic-light statements (i.e. good, medium,
> bad) should be made. Rather, when displaying the indicators, the value
> could be related to the values of all other items (e.g. the value for
> the current item in relation to the overall average for all items for
> this indicator). The advantage here is that the overall average can
> increase over time, meaning that the relative position of an individual
> item's value can also decrease over time.
>
> Another advantage: users can define the required quality level
> themselves. If, for example, you have high demands on accuracy but few
> demands on the completeness of the statements, you can express exactly
> that.
>
> However, it remains important that these indicators (i.e. the
> evaluation of the individual item) are stored together with the item and
> can be queried together with the data using SPARQL.
>
> Greetings,
>
> Uwe Jung
>
> On Sat, 24 Aug 2019 at 13:54, Uwe Jung <jung....@gmail.com> wrote:
>
>> Hello,
>>
>> As the importance of Wikidata increases, so do the demands on the
>> quality of the data. I would like to put the following proposal up for
>> discussion.
>>
>> Two basic ideas:
>>
>> 1. Each Wikidata page (item) is scored after each edit. This score
>> should express different dimensions of data quality in a quickly
>> manageable way.
>> 2. A property is created via which the item refers to the score value.
>> Qualifiers can be used for a more detailed description (e.g. time of
>> calculation, algorithm used to calculate the score value, etc.).
>>
>> The score value can be calculated either within Wikibase after each
>> data change or "externally" by a bot. Among other things, the
>> following can be used for the calculation: number of constraint
>> violations, completeness of references, degree of completeness in
>> relation to the underlying ontology, etc. There are already some
>> interesting discussions on the question of data quality that can be
>> used here (see https://www.wikidata.org/wiki/Wikidata:Item_quality and
>> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality).
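(As an aside on the "externally by a bot" option: some of these ingredients are cheap to prototype. A minimal Pywikibot sketch, assuming purely for illustration that the score is the fraction of statements carrying at least one reference; it needs a configured Pywikibot installation:

    import pywikibot

    # Connect to Wikidata (needs a user-config.py, as usual for Pywikibot).
    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, "Q42")
    item.get()

    total = 0
    referenced = 0
    for prop, claims in item.claims.items():
        for claim in claims:
            total += 1
            # claim.sources is the list of reference blocks on the statement.
            if claim.sources:
                referenced += 1

    # Illustrative score: share of referenced statements. A real indicator
    # would combine several dimensions, as discussed above.
    score = referenced / total if total else 0.0
    print(f"{item.id}: {referenced}/{total} referenced, score={score:.2f}")

Writing the result back as a statement with the qualifiers you describe would use the same API.)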
>> Advantages:
>>
>> - Users get a quick overview of the quality of a page (item).
>> - SPARQL can be used to query only those items that meet a certain
>> quality level (see the sketch below).
>> - The idea would probably be relatively easy to implement.
>>
>> Disadvantages:
>>
>> - In a way, the data model is abused by generating statements that no
>> longer describe the item itself, but rather the representation of that
>> item in Wikidata.
>> - Additional computing power must be provided for the regular
>> recalculation of all changed items.
>> - Only the quality of pages is flagged; if it is insufficient, the
>> changes still have to be made manually.
>>
>> I would now be interested in the following:
>>
>> 1. Is this idea suitable to effectively help solve existing quality
>> problems?
>> 2. Which quality dimensions should the score value represent?
>> 3. Which quality dimensions can be calculated with reasonable effort?
>> 4. How should they be calculated and represented?
>> 5. What is the most suitable way to further discuss and implement this
>> idea?
>>
>> Many thanks in advance.
>>
>> Uwe Jung (UJung <https://www.wikidata.org/wiki/User:UJung>)
>> www.archivfuehrer-kolonialzeit.de/thesaurus
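Regarding the second advantage above, the query side would look roughly like this, assuming a hypothetical score property (P9999 below does not exist; it stands in for whatever property would be created):

    from SPARQLWrapper import SPARQLWrapper, JSON

    # P9999 is a hypothetical "quality score" property, used for illustration.
    query = """
    SELECT ?item ?score WHERE {
      ?item wdt:P9999 ?score .      # hypothetical score statement
      ?item wdt:P31 wd:Q5 .         # e.g. restrict to humans
      FILTER(?score >= 0.8)         # user-defined quality threshold
    }
    LIMIT 100
    """

    sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                           agent="quality-score-demo/0.1")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["item"]["value"], row["score"]["value"])

Because the score would be stored as an ordinary statement, this kind of filtering needs no changes to the query service at all.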