Gregor Hagedorn <g.m.haged...@gmail.com> wrote:
>> So, please suggest terms to use for at least these two things:
>>
>> 1) value certainty (ideally, not using "digits", but something that is
>> independent of unit and rendering)
>
> Here we want to express that the true value lies, with a certain
> probability, within a given interval, something like: "2.3 +/- 0.2 µm"
>
> I am not too sure here myself. Different terms exist depending on
> whether you talk about an inherent measurement error of a single
> individual with a single true value, or whether you speak of
> statistical measures or estimates.
>
> Marco gives yet another example: "We want to specify the "limits of
> (possible) variation" of a value, which would be Engineering
> tolerance. E.g. the value of electrical resistances, capacitors, etc.
> are measured in Ω ± % or F ± %. We could also either use/allow/display
> absolute or relative values." -- In this case, it is actually not an
> uncertainty of the actual sample of resistors, but a design
> specification, i.e. the specification that resistors must be (all or
> only 95%?) within _at least_ these limits.
>
> So what to do here?
>
> List the different use cases of a value plus-minus other values?
> * measurement-method limited precision range of single measurements
>   (e.g. small structures in a light microscope, limited by the
>   resolution capability of blue light, approx. 0.2 µm)
> * measurement-method limited accuracy range (or accuracy plus
>   precision)
> * confidence interval for the mean (or other statistical parameters:
>   mode, variance, etc.) of the population as estimated from a sample
> * one of potentially several percentiles (incl. +/- s.d.)
>   measuring spread, but giving no information about the probability
>   that the true mean is between these values
> * engineering design specifications that a given (unknown) fraction of
>   individuals must be within these limits
> I believe for the moment you don't want to go into certainty in the
> sense that a number is an estimate of a
>
>
> All these different concepts rightly have different names. There can
> be:
> * precision +/- 0.2
> * accuracy +/- 0.2
> * tolerance +/- 0.2
> * error margin +/- 0.2
> * +/- 1 or 2 s.d. +/- 0.2
> * 95% confidence interval (CI) +/- 0.2
> * 10 to 90% percentile +/- 0.2
> * uncertainty (of what?) +/- 0.2
>
> (ASIDE: the +/- 2 s.d. interval defines roughly a 95% probability that
> the next value from a random sample is in the interval, the 95% CI that
> the true value of the mean is in that interval. These are completely
> different things -- for the same measurements you can validly report
> 100 +/- 50 for the first and 100 +/- 0.001 for the second. That is, with
> probability 95% the next randomly sampled measurement will be between
> 50 and 150, and with probability 95% it is known that the true mean is
> between 99.999 and 100.001. Semantics matter, not only the "pattern"
> of plus-minus a value.)
>
>
> Because of the widely varying use cases listed above, I believe we
> need very neutral labels for the plus-minus values if the data type
> shall simply provide two "variables" in a generic sense, the true
> semantics of which are then provided by qualifier information.
>
> I could think of something like:
> * lower range (lowerRange) and upper range (upperRange)
> * lower/upper interval value/endpoint
> but I don't very much like this because it would force people to
> abandon the plus/minus notation and calculate actual values.
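
To make the aside above concrete, here is a minimal Python sketch of how the two "plus-minus" values diverge for the same data. It is only an illustration under stated assumptions: values are roughly normally distributed, the s.d. is treated as known, and the 95% CI uses the normal approximation (z = 1.96). The function names `spread_half_width` and `ci_half_width` are made up for this example and are not proposed terms.

```python
import math


def spread_half_width(sd: float, k: float = 2.0) -> float:
    """Half-width of 'mean +/- k s.d.': describes where individual values fall."""
    return k * sd


def ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% CI for the mean (normal approximation)."""
    return z * sd / math.sqrt(n)


# Assumed example: s.d. of 25, so +/- 2 s.d. gives the "100 +/- 50" case.
sd = 25.0
for n in (10, 1_000, 100_000):
    # The spread stays the same; the CI of the mean shrinks with sample size.
    print(n, spread_half_width(sd), ci_half_width(sd, n))
```

The spread term does not depend on the sample size, while the CI term shrinks with the square root of n. Under these assumptions, a CI as narrow as +/- 0.001 would need on the order of 2.4 billion measurements, which is why the two values cannot share one unqualified label.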
>
> Better may be something like:
> * upwardsAbsolute
> * downwardsAbsolute
> * upwardsPercent
> * downwardsPercent
> or
> * plusValueAbsolute
> * minusValueAbsolute
> * plusValuePercent
> * minusValuePercent
>
> as neutral terms - but I would be glad if someone comes up with other
> neutral terms.
>
>
> However, I hope we start realizing that all of us seem to look at this
> primarily from only one of the use cases listed above (me included, I
> usually have cases with variance spread or CI of mean). We should stop
> using terms that are specific to one but not the other of the cases.
> The assumption "these things are all more or less the same" is not
> true. A confidence interval is neither a manufacturing tolerance nor a
> measurement precision. And precision is not accuracy, etc.
>
>
>> 2) output exactness (here, the number of digits is actually what we
>> want to talk about)
>
> xsd:totalDigits or Wikipedia: significantDigits or significantFigures
>
> That is one way to express value exactness, albeit a coarse one.
>
> Marco writes: "Everywhere in the realm of software development
> precision is used for this. Therefore also here the suggestion of
> precision was not that bad."
>
> -> In software development, the term is about the precision of the
> numeric data type, i.e. the precision of the storage mechanism. The
> term precision is correctly applied here. However, we talk about the
> actually significant digits of a measurement, which are part of the
> potential information on precision and accuracy of the value. The
> measured value with e.g. 6 digits may be stored in a data type which
> has a precision of 16 digits. I think applying "precision" to
> significant digits reflects and produces a fundamental misunderstanding
> of what precision is, see the Wikipedia topic on precision and accuracy.
>

Hm, the second one is only relevant for output. Why not use the term
outputformat as a pattern, just like Excel, OpenOffice, and LibreOffice do?
This could include the number of digits after the decimal separator, the
optional accuracy/whatever, and the unit. This will be fine for the API
and the MW syntax.

Cheers
Marco
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l