I think that's a fine threshold, but it will probably vary somewhat by the type of 
data. Our goal is ultimately that every claim will have a reference, so that 
we can pass the burden of accuracy on to the references. Wikipedia has 
grown well with that principle. Adding references will be much easier when we 
batch import data from sources dedicated to a particular type of data, as 
opposed to parsing information out of Wikipedia (IMDb has more consistency 
checks than Infobox film).

Date: Tue, 2 Apr 2013 10:39:22 -0400
From: tfmor...@gmail.com
To: wikidata-l@lists.wikimedia.org
Subject: Re: [Wikidata-l] Running "Infobox film" import script

On Tue, Apr 2, 2013 at 12:58 AM, Michael Hale <hale.michael...@live.com> wrote:

It will definitely have some errors, but I scanned the results for the first 
100 movies before I started importing them, and I think the value-add will be 
much greater than the number of errors.

Does Wikidata have a quality goal or error rate threshold?  For example, 
Freebase has a nominal quality goal of 99% accuracy, and this is the metric that 
new data loads are judged against (they also want to be within the 95% 
confidence interval, which determines how big a sample you need when doing 
evaluations).
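As a back-of-the-envelope illustration of how the confidence level drives the sample size, here is a small Python sketch using the standard normal approximation for a proportion. The ±0.5 percentage point margin of error is my assumption for the example, not a documented Freebase number:

```python
import math

def sample_size(p, margin, z=1.96):
    """Sample size needed to estimate a proportion p to within
    +/- margin at the confidence level implied by z (1.96 ~ 95%),
    via the normal approximation n = z^2 * p * (1 - p) / margin^2."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Auditing a nominal 99% accuracy goal (expected error rate p = 0.01)
# to within +/- 0.5 percentage points at 95% confidence:
print(sample_size(0.01, 0.005))  # -> 1522 records to review
```

So even a modest evaluation against a 99% goal means hand-checking on the order of a thousand or more records, which is why a develop/test/deploy cycle measured in hours looks tight.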

I haven't looked at this bot, but a develop/test/deploy cycle measured in hours 
seems, on the surface, to be very aggressive.
Tom 


_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l