TL;DR: it would be useful, but it is extremely hard to create rules for every
domain.

>4. How to calculate and represent them?

IMHO, it depends on the data domain.

For geodata (human settlements/rivers/mountains/...) with GPS coordinates,
my simple rules are:
- if it has a local-language Wikipedia page, or a page on any big-language
  Wikipedia (EN/FR/PT/ES/RU/...), then it is OK;
- if it is only on the Cebuano Wikipedia AND outside the "Cebuano" bounding
  box (BBOX), then it is lower quality;
- if it is only on {shwiki + srwiki} AND outside the "sh"/"sr" BBOX, then it
  is lower quality;
- if it is only on {huwiki} AND outside a Central Europe BBOX, then it is
  lower quality;
- geodata without a GPS coordinate -> ...
- ....
So my rules are based on Wikipedia pages and language areas, and I prefer
Wikidata items that have local Wikipedia pages (a rough sketch of these
rules as code is below).
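
A minimal Python sketch of the rules above (illustrative only: the bounding
boxes, the "big wiki" set and the function names are my own assumptions, and
the "local Wikipedia page" check is left out because it would need a
coordinate-to-language mapping):

# Rough quality label for a geodata item, following the rules above.
# `sitelinks`: set of Wikipedia site codes the item has, e.g. {"huwiki"}.
# `coord`: (lon, lat) tuple, or None if the item has no coordinate.

BIG_WIKIS = {"enwiki", "frwiki", "ptwiki", "eswiki", "ruwiki"}

# (min_lon, min_lat, max_lon, max_lat) - rough, illustrative boxes
BBOXES = {
    "cebwiki": (116.0, 4.0, 127.0, 21.0),   # roughly the Philippines
    "sh_sr":   (13.0, 40.0, 24.0, 47.0),    # roughly the sh/sr area
    "huwiki":  (9.0, 44.0, 27.0, 51.0),     # roughly Central Europe
}

def in_bbox(coord, bbox):
    lon, lat = coord
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def geo_quality(sitelinks, coord):
    if coord is None:
        return "missing-coordinate"
    if sitelinks & BIG_WIKIS:
        return "ok"
    if sitelinks == {"cebwiki"} and not in_bbox(coord, BBOXES["cebwiki"]):
        return "lower-quality"
    if sitelinks and sitelinks <= {"shwiki", "srwiki"} \
            and not in_bbox(coord, BBOXES["sh_sr"]):
        return "lower-quality"
    if sitelinks == {"huwiki"} and not in_bbox(coord, BBOXES["huwiki"]):
        return "lower-quality"
    return "unknown"

For example, geo_quality({"cebwiki"}, (19.0, 47.5)) returns "lower-quality"
(a cebwiki-only item located in Central Europe), while any item with an
enwiki sitelink is "ok".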

This is based on my experience adding Wikidata ID concordances to
Natural Earth (https://www.naturalearthdata.com/blog/).


>5. Which is the most suitable way to further discuss and implement this
idea?

IMHO: load the Wikidata dump into a local database, and create
- some "proof of concept" quality indicators,
- some "meta" rules,
- some "real" statistics,
so the community can decide whether it is useful or not (see the sketch
below).
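
A minimal sketch of such a proof of concept (assuming the standard Wikidata
JSON dump format, one entity per line; the file name and the two indicators
- sitelink count and share of referenced statements - are just illustrative):

import bz2
import json

DUMP = "latest-all.json.bz2"  # illustrative path to a local dump

def iter_entities(path):
    # The JSON dump is one big array: "[" on the first line, "]" on the
    # last, and one entity object per line in between, each ending with ",".
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")
            if line in ("[", "]", ""):
                continue
            yield json.loads(line)

def indicators(entity):
    sitelinks = len(entity.get("sitelinks", {}))
    statements = [s for group in entity.get("claims", {}).values()
                  for s in group]
    referenced = sum(1 for s in statements if s.get("references"))
    ref_share = referenced / len(statements) if statements else 0.0
    return sitelinks, ref_share

for e in iter_entities(DUMP):
    n_links, ref_share = indicators(e)
    print(e["id"], n_links, round(ref_share, 2))

From such per-item numbers, the "real" statistics (distributions per item
class, per project, etc.) can be aggregated, so the community can look at
concrete examples.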



Imre

Uwe Jung <jung....@gmail.com> wrote (on Sat, 24 Aug 2019, 14:55):

> Hello,
>
> As the importance of Wikidata increases, so do the demands on the quality
> of the data. I would like to put the following proposal up for discussion.
>
> Two basic ideas:
>
>    1. Each Wikidata page (item) is scored after each edit. This score
>    should express different dimensions of data quality in a quickly graspable
>    way.
>    2. A property is created via which the item refers to the score value.
>    Certain qualifiers can be used for a more detailed description (e.g. time
>    of calculation, algorithm used to calculate the score value, etc.).
>
>
> The score value can be calculated either within Wikibase after each data
> change or "externally" by a bot. Among other things, the calculation can
> use the number of constraints, the completeness of references, the degree
> of completeness in relation to the underlying ontology, etc. There are already
> some interesting discussions on the question of data quality which can be
> used here ( see  https://www.wikidata.org/wiki/Wikidata:Item_quality;
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc).
>
> Advantages
>
>    - Users get a quick overview of the quality of a page (item).
>    - SPARQL can be used to query only those items that meet a certain
>    quality level.
>    - The idea would probably be relatively easy to implement.
>
>
> Disadvantage:
>
>    - In a way, the data model is abused by generating statements that no
>    longer describe the item itself, but make statements about the
>    representation of this item in Wikidata.
>    - Additional computing power must be provided for the regular
>    calculation of all changed items.
>    - The score only refers to the quality of pages; if it is insufficient,
>    the changes still have to be made manually.
>
>
> I would now be interested in the following:
>
>    1. Is this idea suitable to effectively help solve existing quality
>    problems?
>    2. Which quality dimensions should the score value represent?
>    3. Which quality dimension can be calculated with reasonable effort?
>    4. How to calculate and represent them?
>    5. Which is the most suitable way to further discuss and implement
>    this idea?
>
>
> Many thanks in advance.
>
> Uwe Jung  (UJung <https://www.wikidata.org/wiki/User:UJung>)
> www.archivfuehrer-kolonialzeit.de/thesaurus
>
>
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
