@Uwe: Apologies if I'm stating the obvious, but are you familiar with the Recoin tool [1]? It seems quite close to what you describe, though only for the data quality dimension of completeness (more precisely, *relative* completeness), and it could perhaps serve as a model for what you are considering. It is also a good example of a data quality tool that is directly useful to editors, as it often lets them identify and add missing statements on an item.
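To make this concrete: Recoin essentially relates the properties present on an item to the properties most frequently present on items of the same class. Here is a rough Python sketch of that kind of measure; the 50% threshold and the scoring formula are my own illustrative assumptions, not Recoin's actual algorithm:

    from collections import Counter

    def relative_completeness(item_properties, peer_properties):
        """Score an item against the property usage of its class peers.

        item_properties: set of property IDs on the item, e.g. {"P31", "P569"}
        peer_properties: list of such sets, one per item of the same class
        """
        if not peer_properties:
            return 1.0
        # Count how often each property occurs among the peers.
        freq = Counter(p for props in peer_properties for p in props)
        n = len(peer_properties)
        # Properties used by at least half of the peers count as "expected";
        # the 0.5 threshold is an arbitrary choice for illustration.
        expected = {p for p, c in freq.items() if c / n >= 0.5}
        if not expected:
            return 1.0
        return len(item_properties & expected) / len(expected)

    # Example: an item with one of the two "expected" properties scores 0.5.
    peers = [{"P31", "P569", "P570"}, {"P31", "P569"}, {"P31", "P569", "P106"}]
    print(relative_completeness({"P31", "P570"}, peers))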
Regards,

Ettore Rizza

[1] https://www.wikidata.org/wiki/Wikidata:Recoin

On Tue, 27 Aug 2019 at 21:49, Uwe Jung <jung....@gmail.com> wrote:

> Hello,
>
> many thanks for the answers to my contribution from 24 August. I think
> that all four opinions contain important things to consider.
>
> @David Abián
> I have read the article and agree that, in the end, the users decide
> which data is good for them or not.
>
> @GerardM
> It is true that in a possible implementation of the idea, the computing
> load must be taken into account right from the beginning.
>
> Please note that I have not given up on the idea yet. With regard to the
> acceptance of Wikidata, I consider a quality indicator of some kind to
> be absolutely necessary. There will be many ordinary users who would
> like to see something like this.
>
> At the same time, I completely agree with David: (almost) every chosen
> indicator is subject to a certain arbitrariness in the selection. There
> won't be one easy-to-understand super-indicator.
>
> So, let's approach things from the other side. Instead of a global
> indicator, a separate indicator should be developed for each quality
> dimension to be considered. For some dimensions this should be
> relatively easy; for others it could take years until we have agreed on
> an algorithm for their calculation.
>
> Furthermore, the indicators should not represent discrete values but a
> continuum of values. No traffic-light statements (i.e. good, medium,
> bad) should be made. Rather, when displaying the indicators, the value
> could be related to the values of all other items (e.g. the value for
> the current item in relation to the overall average for all items for
> this indicator). The advantage here is that the overall average can
> increase over time, meaning that the relative position of an individual
> item's value can also decrease over time.
>
> Another advantage: users can define the required quality level
> themselves. If, for example, you have high demands on accuracy but few
> demands on the completeness of the statements, you can express exactly
> that.
>
> However, it remains important that these indicators (i.e. the
> evaluation of the individual item) are stored together with the item and
> can be queried together with the data using SPARQL.
>
> Greetings,
>
> Uwe Jung
>
> On Sat, 24 Aug 2019 at 13:54, Uwe Jung <jung....@gmail.com> wrote:
>
>> Hello,
>>
>> As the importance of Wikidata increases, so do the demands on the
>> quality of the data. I would like to put the following proposal up for
>> discussion.
>>
>> Two basic ideas:
>>
>> 1. Each Wikidata page (item) is scored after each edit. This score
>> should express different dimensions of data quality in a quickly
>> manageable way.
>> 2. A property is created via which the item refers to the score value.
>> Qualifiers can be used for a more detailed description (e.g. time of
>> calculation, algorithm used to calculate the score value, etc.).
>>
>> The score value can be calculated either within Wikibase after each
>> data change or "externally" by a bot. Among other things, the
>> following can be used for the calculation: number of constraint
>> violations, completeness of references, degree of completeness in
>> relation to the underlying ontology, etc. There are already some
>> interesting discussions on the question of data quality that can be
>> used here (see https://www.wikidata.org/wiki/Wikidata:Item_quality and
>> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality).
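(As an aside on the "externally by a bot" option: some of these ingredients are cheap to prototype. A minimal Pywikibot sketch, assuming purely for illustration that the score is the fraction of statements carrying at least one reference; it needs a configured Pywikibot installation:

    import pywikibot

    # Connect to Wikidata (needs a user-config.py, as usual for Pywikibot).
    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, "Q42")
    item.get()

    total = 0
    referenced = 0
    for prop, claims in item.claims.items():
        for claim in claims:
            total += 1
            # claim.sources is the list of reference blocks on the statement.
            if claim.sources:
                referenced += 1

    # Illustrative score: share of referenced statements. A real indicator
    # would combine several dimensions, as discussed above.
    score = referenced / total if total else 0.0
    print(f"{item.id}: {referenced}/{total} referenced, score={score:.2f}")

Writing the result back as a statement with the qualifiers you describe would use the same API.)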
>> Advantages:
>>
>> - Users get a quick overview of the quality of a page (item).
>> - SPARQL can be used to query only those items that meet a certain
>> quality level (see the sketch below).
>> - The idea would probably be relatively easy to implement.
>>
>> Disadvantages:
>>
>> - In a way, the data model is abused by generating statements that no
>> longer describe the item itself, but rather the representation of that
>> item in Wikidata.
>> - Additional computing power must be provided for the regular
>> recalculation of all changed items.
>> - Only the quality of pages is flagged; if it is insufficient, the
>> changes still have to be made manually.
>>
>> I would now be interested in the following:
>>
>> 1. Is this idea suitable to effectively help solve existing quality
>> problems?
>> 2. Which quality dimensions should the score value represent?
>> 3. Which quality dimensions can be calculated with reasonable effort?
>> 4. How should they be calculated and represented?
>> 5. What is the most suitable way to further discuss and implement this
>> idea?
>>
>> Many thanks in advance.
>>
>> Uwe Jung (UJung <https://www.wikidata.org/wiki/User:UJung>)
>> www.archivfuehrer-kolonialzeit.de/thesaurus
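Regarding the second advantage above, the query side would look roughly like this, assuming a hypothetical score property (P9999 below does not exist; it stands in for whatever property would be created):

    from SPARQLWrapper import SPARQLWrapper, JSON

    # P9999 is a hypothetical "quality score" property, used for illustration.
    query = """
    SELECT ?item ?score WHERE {
      ?item wdt:P9999 ?score .      # hypothetical score statement
      ?item wdt:P31 wd:Q5 .         # e.g. restrict to humans
      FILTER(?score >= 0.8)         # user-defined quality threshold
    }
    LIMIT 100
    """

    sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                           agent="quality-score-demo/0.1")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["item"]["value"], row["score"]["value"])

Because the score would be stored as an ordinary statement, this kind of filtering needs no changes to the query service at all.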