Re: [Wikidata-l] Data values

Marco Fleckinger Fri, 21 Dec 2012 02:24:58 -0800


Gregor Hagedorn <g.m.haged...@gmail.com> wrote:


>> So, please suggest terms to use for at least these two things:
>>
>> 1) value certainty (ideally, not using "digits", but something that is
>> independent of unit and rendering)
>
>Here we want to say that the true value lies, with a certain
>probability, within a given interval, something like: "2.3 +/- 0.2 µm"
>
>I am not too sure here myself. Different terms exist whether you talk
>about an inherent measurement error of a single individual with a
>single true value, or whether you speak of statistical measures or
>estimates.
>
>Marco gives yet another example: "We want to specify the "limits of
>(possible) variation" of a value, which would be Engineering
>tolerance. E.g. the value of electrical resistances, capacitors, etc.
>are measured in Ω ± % or F ± %. We could also either use/allow/display
>absolute or relative values." -- In this case, it is actually not an
>uncertainty of the actual sample of resistors, but a design
>specification, i.e. the specification that resistors must be (all or
>only 95%?) within _at least_ these limits.
>
>So what to do here?
>
>List the different use cases of a value plus-minus other values?
>* measurement-method limited precision range of single measurements
>(e.g. small structures in a light microscope, limited by resolution
>capability of blue light, approx. 0.2 µm)
>* measurement-method limited accuracy range (or accuracy plus
>precision)
>* Confidence interval for mean (or other statistical parameters: mode,
>variance, etc.) of the population as estimated based on a sample
>* one of potentially several percentiles (incl. +- s.d.) measuring
>spread, but giving no information about the probability that the true
>mean is between these values
>* engineering design specifications that a given (unknown) fraction of
>individuals must be within these limits
>I believe for the moment you don't want to go into certainty in the
>sense that a number is an estimate of a
>
>
>All these different concepts rightly have different names. There can
>be:
>* precision +/- 0.2
>* accuracy +/- 0.2
>* tolerance +/- 0.2
>* error margin +/- 0.2
>* +/- 1 or 2 s.d. +/- 0.2
>* 95% confidence interval (CI) +/- 0.2
>* 10 to 90% percentile  +/- 0.2
>* uncertainty (of what?) +/- 0.2
>
>(ASIDE: the +/- 2 s.d. defines roughly a 95% probability that the next
>value from a random sample is in the interval, the 95% CI that the
>true value of the mean is in that interval. These are completely
>different things -- for the same measurements you can report validly
>100 +/- 50 for the first and 100 +/- 0.001 for the second. That is, with
>probability 95% the next randomly sampled measurement will be between
>50 and 150, and with probability 95% it is known that the true mean is
>between 99.999 and 100.001. Semantics matter, not only the "pattern"
>of plus-minus a value.)
>
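To make the aside concrete, here is a minimal Python sketch, assuming roughly normally distributed measurements. The function names and the sample numbers (s.d. 25, various sample sizes) are illustrative assumptions, not taken from any real data set. It shows that the +/- 2 s.d. half-width describes spread and stays fixed, while the half-width of the 95% CI of the mean shrinks as the sample grows.

```python
import math

Z_95 = 1.96  # approximate two-sided 95% quantile of the standard normal


def spread_half_width(sd: float, k: float = 2.0) -> float:
    """Half-width of the +/- k s.d. interval (spread of single values)."""
    return k * sd


def ci_mean_half_width(sd: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation CI for the mean of n samples."""
    return z * sd / math.sqrt(n)


sd = 25.0
for n in (10, 1_000, 100_000):
    # Same "plus-minus" pattern, two very different meanings.
    print(n, spread_half_width(sd), round(ci_mean_half_width(sd, n), 3))
```

Both numbers would be written as "100 +/- x", which is why the label attached to x matters.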
>
>Because of the widely varying use cases listed above, I believe we
>need very neutral labels for the plus-minus values if the data type
>shall simply provide two "variables" in a generic sense, the true
>semantics of which are then provided by qualifier information.
>
>I could think of something:
>* lower range (lowerRange) and upper range (upperRange).
>* lower/upper interval value/endpoint
>but I don't very much like this because it would force people to
>abandon the plus/minus notation and calculate actual values.
>
>Better may be something like:
>* upwardsAbsolute
>* downwardsAbsolute
>* upwardsPercent
>* downwardsPercent
>or
>* plusValueAbsolute
>* minusValueAbsolute
>* plusValuePercent
>* minusValuePercent
>as neutral terms - but I would be glad if someone comes up with other
>neutral terms.
>
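As a rough sketch of the "two neutral variables plus qualifier" idea, something like the following could work. The class name, the snake_case field names and the `interpretation` qualifier are hypothetical and only mirror the terms proposed above; this is not an existing Wikidata data type.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlusMinusValue:
    """Hypothetical container: a value with neutral plus/minus offsets."""
    value: float
    plus_absolute: float   # cf. plusValueAbsolute
    minus_absolute: float  # cf. minusValueAbsolute
    # Qualifier carrying the real semantics, e.g. "tolerance" or "95% CI".
    interpretation: Optional[str] = None

    @classmethod
    def from_percent(cls, value: float, plus_percent: float,
                     minus_percent: float,
                     interpretation: Optional[str] = None) -> "PlusMinusValue":
        """Build from relative offsets (cf. plusValuePercent/minusValuePercent)."""
        scale = abs(value) / 100.0
        return cls(value, plus_percent * scale, minus_percent * scale,
                   interpretation)

    def bounds(self) -> Tuple[float, float]:
        """Derive (lower, upper) endpoints; the +/- form itself is kept."""
        return (self.value - self.minus_absolute,
                self.value + self.plus_absolute)


# A 1000 ohm resistor specified with a 5% tolerance (made-up example values).
resistor = PlusMinusValue.from_percent(1000.0, 5.0, 5.0, "tolerance")
print(resistor.bounds())
```

Storing the offsets rather than the endpoints means nobody has to abandon the plus/minus notation, and the endpoints can still be computed on demand.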
>
>However, I hope we start realizing that all of us seem to look at this
>primarily from only one of the use cases listed above (me included, I
>usually have cases with variance spread or CI of mean). We should stop
>using terms that are specific to one but not the other of the cases.
>The assumption "these things are all more or less the same" is not
>true. A confidence interval is neither a manufacturing tolerance nor a
>measurement precision. And precision is not accuracy, etc.
>
>
>
>
>
>> 2) output exactness (here, the number of digits is actually what we
>> want to talk about)
>
>xsd:totalDigits or Wikipedia: significantDigits or significantFigures
>
>That is one way to express value exactness, albeit a coarse one.
>
>Marco writes: "Everywhere in the realm of software development
>precision is used for this. Therefore also here the suggestion of
>precision was not that bad."
>
>-> In software development, the term is about the precision of the
>numeric data type, i.e. the precision of the storage mechanism. The
>term precision is correctly applied here. However, we talk about the
>actually significant digits of a measurement, which are part of the
>potential information on precision and accuracy of the value. The
>measured value with e.g. 6 digits may be stored in a data type which
>has a precision of 16 digits. I think applying "precision" to
>significant digits is, and produces, a fundamental misunderstanding of
>what precision is; see the Wikipedia topic on precision and accuracy.
>
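The difference between storage precision and significant digits can be illustrated with a small Python sketch. The helper `round_to_significant` and the sample value are assumptions made up for this example.

```python
import math
import sys


def round_to_significant(x: float, digits: int) -> float:
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


measured = 2.34567891234  # stored with the full precision of a float

# Decimal digits the storage type can reliably hold (a property of the type).
print(sys.float_info.dig)

# Digits the measurement is assumed to support (a property of the value).
print(round_to_significant(measured, 3))
```

The first number says something about the data type only; the second is the kind of information that belongs to the measurement itself.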
Hm, the second one is only relevant for output. Why not use the term
outputformat, as a pattern, just like Excel, OpenOffice, and LibreOffice do?
This could include the number of digits after the decimal separator, the
optional accuracy/whatever, and the unit. This would be fine for the API and
the MW syntax.
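
A minimal sketch of what such an output format could do, written in Python. The function `format_quantity` and its parameters are hypothetical, not an existing API; the point is only that rounding happens at rendering time and the stored value stays untouched.

```python
from typing import Optional


def format_quantity(value: float, decimals: int, unit: str = "",
                    plus_minus: Optional[float] = None) -> str:
    """Render a stored value; only the output is rounded, not the data."""
    text = f"{value:.{decimals}f}"
    if plus_minus is not None:
        # The optional accuracy/whatever part, using the same digit count.
        text += f" +/- {plus_minus:.{decimals}f}"
    if unit:
        text += f" {unit}"
    return text


print(format_quantity(2.3049, 1, "µm", 0.2))  # 2.3 +/- 0.2 µm
```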

Cheers

Marco



_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
