TL;DR: it would be useful, but extremely hard to create rules for every domain.

> 4. How to calculate and represent them?

IMHO it depends on the data domain. For geodata (human settlements / rivers / mountains / ...) with GPS coordinates, my simple rules are:

- if it has a "local" Wikipedia page or a page in any big language ["EN/FR/PT/ES/RU/..."] -> it is OK;
- if it is only in "cebuano" AND outside of the "cebuano" BBOX -> lower quality;
- if it is only in {shwiki + srwiki} AND outside of the "sh" & "sr" BBOX -> lower quality;
- if it is only in {huwiki} AND outside of the Central Europe BBOX -> lower quality;
- geodata without a GPS coordinate -> ...;
- ...

So my rules are based on Wikipedia pages and language areas, and I prefer Wikidata items with local Wikipedia pages (a rough sketch is below). This is based on my experience adding Wikidata ID concordances to NaturalEarth ( https://www.naturalearthdata.com/blog/ ).
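Very roughly, something like this Python sketch; the BBOX values are only placeholders for illustration, real language-area boundaries would have to be curated, and the wiki codes are just the ones from my examples above:

```python
# Rough sketch of the geodata rules above.
# NOTE: the BBOX values are illustrative placeholders, not real language-area
# boundaries; a real check would need curated polygons per language.

BIG_WIKIS = {"enwiki", "frwiki", "ptwiki", "eswiki", "ruwiki"}

# (min_lon, min_lat, max_lon, max_lat) - placeholder boxes only
LANG_BBOX = {
    "cebwiki": (116.0, 4.0, 127.0, 21.0),   # roughly the Philippines
    "shwiki":  (13.0, 41.0, 24.0, 47.0),    # roughly the sh/sr area
    "srwiki":  (13.0, 41.0, 24.0, 47.0),
    "huwiki":  (9.0, 44.0, 29.0, 52.0),     # roughly Central Europe
}

def in_bbox(lon, lat, bbox):
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def geo_quality(sitelinks, coord):
    """sitelinks: set of wiki codes, e.g. {"cebwiki"}; coord: (lon, lat) or None."""
    if coord is None:
        return "low"                 # geodata without GPS coordinate
    if sitelinks & BIG_WIKIS:
        return "ok"                  # has a big-language Wikipedia page
    lon, lat = coord
    small = sitelinks & set(LANG_BBOX)
    if small:
        if any(in_bbox(lon, lat, LANG_BBOX[w]) for w in small):
            return "ok"              # has a "local" Wikipedia page for its area
        if sitelinks <= set(LANG_BBOX):
            return "low"             # only small-language pages, all outside their area
    return "unknown"

# example: a cebwiki-only "settlement" located in Central Europe -> "low"
print(geo_quality({"cebwiki"}, (19.0, 47.5)))
```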
> 5. Which is the most suitable way to further discuss and implement this idea?

IMHO: load the Wikidata dump into a local database and create
- some "proof of concept" quality indicators,
- some "meta" rules,
- some "real" statistics,
so that the community can decide whether it is useful or not.

Imre

Uwe Jung <jung....@gmail.com> wrote (on Sat, 24 Aug 2019, 14:55):

> Hello,
>
> As the importance of Wikidata increases, so do the demands on the quality
> of the data. I would like to put the following proposal up for discussion.
>
> Two basic ideas:
>
> 1. Each Wikidata page (item) is scored after each edit. This score should
> express different dimensions of data quality in a quickly manageable way.
> 2. A property is created via which the item refers to the score value.
> Certain qualifiers can be used for a more detailed description (e.g. time
> of calculation, algorithm used to calculate the score value, etc.).
>
> The score value can be calculated either within Wikibase after each data
> change or "externally" by a bot. Among other things, the number of
> constraints, the completeness of references, the degree of completeness in
> relation to the underlying ontology, etc. can be used for the calculation.
> There are already some interesting discussions on the question of data
> quality which can be used here (see
> https://www.wikidata.org/wiki/Wikidata:Item_quality;
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc.).
>
> Advantages:
>
> - Users get a quick overview of the quality of a page (item).
> - SPARQL can be used to query only those items that meet a certain
> quality level.
> - The idea would probably be relatively easy to implement.
>
> Disadvantages:
>
> - In a way, the data model is abused by generating statements that no
> longer describe the item itself but rather its representation in Wikidata.
> - Additional computing power must be provided for the regular
> recalculation of all changed items.
> - Only the quality of pages is assessed; if it is insufficient, the
> changes still have to be made manually.
>
> I would now be interested in the following:
>
> 1. Is this idea suitable to effectively help solve existing quality
> problems?
> 2. Which quality dimensions should the score value represent?
> 3. Which quality dimensions can be calculated with reasonable effort?
> 4. How to calculate and represent them?
> 5. Which is the most suitable way to further discuss and implement
> this idea?
>
> Many thanks in advance.
>
> Uwe Jung (UJung <https://www.wikidata.org/wiki/User:UJung>)
> www.archivfuehrer-kolonialzeit.de/thesaurus
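PS: regarding the "externally by a bot" variant and the reference-completeness dimension, one of the cheapest indicators to prototype is simply the share of statements that have at least one reference. A minimal sketch using the public Special:EntityData JSON (how to weight it or combine it with other dimensions is of course open):

```python
# Minimal "external bot" sketch: fetch one item's JSON and compute the share of
# statements that carry at least one reference. This is only one possible
# quality dimension - no constraint checks, no ontology completeness.
import requests

def reference_score(qid):
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    entity = requests.get(url, timeout=30).json()["entities"][qid]
    statements = [s for claims in entity.get("claims", {}).values() for s in claims]
    if not statements:
        return 0.0
    referenced = sum(1 for s in statements if s.get("references"))
    return referenced / len(statements)

print(reference_score("Q42"))   # fraction of referenced statements on Douglas Adams
```

Writing that value back as a statement on the item (your idea 2), or keeping it in a separate table while prototyping, would then also cover the "query only items above a certain quality level" use case in SPARQL.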