Hello,

Very interesting idea. Just to feed the discussion, here is a very recent
literature survey on data quality in Wikidata:
https://opensym.org/wp-content/uploads/2019/08/os19-paper-A17-piscopo.pdf

Cheers,

Ettore Rizza



On Sat, 24 Aug 2019 at 13:55, Uwe Jung <jung....@gmail.com> wrote:

> Hello,
>
> As the importance of Wikidata increases, so do the demands on the quality
> of the data. I would like to put the following proposal up for discussion.
>
> Two basic ideas:
>
>    1. Each Wikidata page (item) is scored after every edit. This score
>    should express different dimensions of data quality in a form that can
>    be grasped quickly.
>    2. A new property is created through which the item refers to its score
>    value. Qualifiers can provide a more detailed description (e.g. the time
>    of calculation, the algorithm used to calculate the score, etc.).
>
>
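As a minimal sketch of idea 2, the score statement and its qualifiers could be modelled as below. The property ID `P_QUALITY_SCORE` is a hypothetical placeholder (no such property exists), the qualifier keys are illustrative labels, and `to_claim` returns a simplified dict, not the actual Wikibase JSON format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical placeholder: no quality-score property exists in Wikidata.
QUALITY_SCORE_PROPERTY = "P_QUALITY_SCORE"


@dataclass(frozen=True)
class QualityScoreStatement:
    item_id: str           # the scored item, e.g. "Q42"
    score: float           # assumed to be normalised to [0, 1]
    computed_at: datetime  # qualifier: time of calculation
    algorithm: str         # qualifier: name/version of the scoring algorithm

    def __post_init__(self) -> None:
        # Reject scores outside the assumed [0, 1] range.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {self.score}")

    def to_claim(self) -> dict:
        """Return a simplified, Wikibase-like claim (not the real JSON format)."""
        return {
            "subject": self.item_id,
            "property": QUALITY_SCORE_PROPERTY,
            "value": self.score,
            "qualifiers": {
                "time of calculation": self.computed_at.isoformat(),
                "algorithm": self.algorithm,
            },
        }
```

Keeping the time and algorithm as qualifiers means a score can be recomputed with a newer algorithm without losing track of which version produced a given value.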
> The score value can be calculated either within Wikibase after each data
> change or "externally" by a bot. Among other things, the calculation could
> take into account the number of constraint violations, the completeness of
> references, and the degree of completeness relative to the underlying
> ontology. There are already some interesting discussions on data quality
> that can be used here (see https://www.wikidata.org/wiki/Wikidata:Item_quality,
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc.).
>
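A minimal sketch of how such a score might combine the three factors mentioned above, reading "number of constraints" as the number of constraint violations. It assumes each factor is normalised to [0, 1] and combined as a weighted average; the `ItemStats` fields, the default weights, and the handling of items without statements are assumptions for illustration, not an agreed algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class ItemStats:
    """Hypothetical per-item counts that a bot or Wikibase hook would collect."""
    statements: int
    referenced_statements: int
    constraint_violations: int
    expected_properties: frozenset = field(default_factory=frozenset)  # from the ontology
    present_properties: frozenset = field(default_factory=frozenset)


def quality_score(s: ItemStats, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted average of three sub-scores, each in [0, 1], rounded to 3 places."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w_constraints, w_references, w_ontology = weights

    if s.statements > 0:
        # Fewer violations per statement -> higher score (clamped at 0).
        constraints = max(0.0, 1.0 - s.constraint_violations / s.statements)
        # Share of statements that carry at least one reference.
        references = s.referenced_statements / s.statements
    else:
        constraints = references = 0.0  # an empty item earns nothing here

    # Share of the properties expected by the ontology that the item uses.
    if s.expected_properties:
        found = s.expected_properties & s.present_properties
        ontology = len(found) / len(s.expected_properties)
    else:
        ontology = 1.0  # nothing expected, so nothing missing

    return round(w_constraints * constraints
                 + w_references * references
                 + w_ontology * ontology, 3)
```

For an item with 10 statements, 6 of them referenced, 2 constraint violations, and 3 of 4 expected properties present, this gives 0.4 * 0.8 + 0.3 * 0.6 + 0.3 * 0.75 = 0.725.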
> Advantages:
>
>    - Users get a quick overview of the quality of a page (item).
>    - SPARQL can be used to query only those items that meet a certain
>    quality level.
>    - The idea would probably be relatively easy to implement.
>
>
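To illustrate the SPARQL advantage, a query that keeps only items above a quality threshold could be generated as below. `P_QUALITY_SCORE` is again a hypothetical placeholder to be replaced by the ID of whatever property is created; `wdt:` is the truthy-statement prefix that the Wikidata Query Service predefines.

```python
# Hypothetical placeholder: substitute the real property ID once one exists.
QUALITY_SCORE_PROPERTY = "P_QUALITY_SCORE"


def items_above_quality(threshold: float,
                        property_id: str = QUALITY_SCORE_PROPERTY,
                        limit: int = 100) -> str:
    """Build a SPARQL query for items whose quality score is >= threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    if limit <= 0:
        raise ValueError("limit must be positive")
    # wdt: (truthy statements) is predefined on the Wikidata Query Service.
    return (
        "SELECT ?item ?score WHERE {\n"
        f"  ?item wdt:{property_id} ?score .\n"
        f"  FILTER(?score >= {threshold})\n"
        "}\n"
        "ORDER BY DESC(?score)\n"
        f"LIMIT {int(limit)}"
    )
```

In practice the FILTER would be combined with the usual triple patterns of a query, so that, for example, only well-scored instances of a given class are returned.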
> Disadvantages:
>
>    - In a way, this abuses the data model: it generates statements that no
>    longer describe the item itself but rather its representation in
>    Wikidata.
>    - Additional computing power must be provided to regularly recalculate
>    the scores of all changed items.
>    - The score only describes the quality of a page. If the quality is
>    insufficient, the corrections still have to be made manually.
>
>
> I would now be interested in the following:
>
>    1. Is this idea suitable to effectively help solve existing quality
>    problems?
>    2. Which quality dimensions should the score value represent?
>    3. Which quality dimensions can be calculated with reasonable effort?
>    4. How to calculate and represent them?
>    5. What is the most suitable way to discuss and implement this idea
>    further?
>
>
> Many thanks in advance.
>
> Uwe Jung  (UJung <https://www.wikidata.org/wiki/User:UJung>)
> www.archivfuehrer-kolonialzeit.de/thesaurus
>
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>