Hi,

On 2012-12-19 15:12, Gregor Hagedorn wrote:
In addition, there should be a storage option for the desired unit prefix
(this may be considered an original prefix, since re-users may naturally
wish to reformat it).

I see no point in storing the unit used for input.

I think you plan to store the unit (which would be meter), so you
don't want to store prefixes, correct?

Please argue why you don't see a point. Do you want to store the size of
the universe, the distance to New York and the size of the proton all in
"meter"? If not, with which algorithm will you restore the SI prefix, or
rather, recognize which SI prefix is usable? We do not use Mm in common
language, so we give the circumference of the earth as roughly
40 000 km and not as 40 Mm. We don't write 4*10^7 m either.

I assume there's a table of the usual units for different purposes. E.g. altitudes are displayed in m and ft. One of those is chosen according to the user's locale setting. My locale setting would be something like "metric system", so altitudes will be displayed in m in my Wikidata interface. On enwiki they will probably be displayed in ft.
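A minimal sketch of such a lookup table in Python, assuming a hypothetical `DISPLAY_UNITS` mapping from the kind of quantity and the unit system to a display unit (neither is an existing Wikidata setting). The stored value stays in metres; only the display changes:

```python
from decimal import Decimal

# Hypothetical table: customary display unit per kind of quantity and unit system.
# Neither the keys nor the system names are an existing Wikidata setting.
DISPLAY_UNITS = {
    "altitude": {"metric": "m", "imperial": "ft"},
}

# Length units expressed in metres (the international foot is exactly 0.3048 m).
METRES_PER_UNIT = {"m": Decimal("1"), "ft": Decimal("0.3048")}


def for_display(value_m: Decimal, kind: str, system: str) -> tuple[Decimal, str]:
    """Convert a value stored in metres into the unit the reader's system prefers."""
    unit = DISPLAY_UNITS[kind][system]
    return value_m / METRES_PER_UNIT[unit], unit


value, unit = for_display(Decimal("3048"), "altitude", "metric")
print(f"{value:f} {unit}")  # 3048 m
value, unit = for_display(Decimal("3048"), "altitude", "imperial")
print(f"{value:f} {unit}")  # 10000 ft
```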

it is probably necessary to store the number of
significant decimals.

That's how Denny proposed to calculate the default accuracy. If the accuracy is
given by a complex model (e.g. a gamma distribution), then it might be handy to
have a simple value that tells us the significant digits.

Hm... perhaps it's best to always express accuracy as "+/-n", and allow for more
detailed information (standard deviation, whatever) as *additional* information
about the accuracy (could be modelled as a qualifier internally).

I fear these are two separate levels of precision for expressing
measurement _precision_ (I believe "accuracy" is the wrong term here;
precision and accuracy are related but distinct concepts). So 4.10
means that the last digit is significant, i.e. the best estimate is at
least between 4.095 and 4.105 (but it may be better). 4.10 +/- 0.005
means it is precisely between 4.095 and 4.105, as opposed to
4.10 +/- 0.004, 4.10 +/- 0.003, 4.10 +/- 0.002 etc.
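The difference can be made concrete with Python's standard `decimal` module. This is only a sketch; `implied_interval` and `explicit_interval` are illustrative names. The digits of "4.10" imply half a unit in the last place, whereas an explicit "+/- 0.004" is additional, narrower information:

```python
from decimal import Decimal


def implied_interval(entered: str) -> tuple[Decimal, Decimal]:
    """Interval implied by the digits as written: +/- half a unit in the last place.

    The digits only say the value is known *at least* this well; this is not
    the same statement as an explicit "+/- 0.005".
    """
    d = Decimal(entered)
    half_ulp = Decimal(1).scaleb(d.as_tuple().exponent) / 2
    return d - half_ulp, d + half_ulp


def explicit_interval(value: str, plus_minus: str) -> tuple[Decimal, Decimal]:
    """Interval from an explicitly stated symmetric uncertainty."""
    v, pm = Decimal(value), Decimal(plus_minus)
    return v - pm, v + pm


print(implied_interval("4.10"))            # (Decimal('4.095'), Decimal('4.105'))
print(explicit_interval("4.10", "0.004"))  # (Decimal('4.096'), Decimal('4.104'))
```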

My suggestion would be:

* Somebody types in 4.10, so 4.10 will be saved. No accuracy is available, so n/a is saved for the accuracy; alternatively, the JavaScript way could be used, where the accuracy is simply undefined (because it is not mentioned). Retrieving this will result in 4.10 or {value:4.10}.

* Somebody types in 4.1 with an accuracy of 0.05. So 4.1 will be saved and an accuracy of 0.05. Anybody who wants to retrieve this will get 4.1 or {value:4.1, accuracy:0.05}. Retrieving this with precision 3 will result in 4.100 or {value:4.100, accuracy:0.05}.
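One way to sketch this suggestion in Python, assuming values are kept as decimals rather than IEEE floats and reading "precision 3" as three decimal places (both are assumptions). `StoredValue` is a hypothetical name, not a Wikidata API:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class StoredValue:
    """Hypothetical record following the suggestion above (not a Wikidata API)."""
    value: Decimal                      # Decimal keeps "4.10" distinct from "4.1"
    accuracy: Optional[Decimal] = None  # None stands in for "n/a" / undefined

    def retrieve(self, decimals: Optional[int] = None) -> dict:
        """Return {value[, accuracy]}, optionally padded/rounded to `decimals` places."""
        value = self.value
        if decimals is not None:
            value = value.quantize(Decimal(1).scaleb(-decimals))
        result = {"value": value}
        if self.accuracy is not None:  # omit the key entirely, as in JavaScript
            result["accuracy"] = self.accuracy
        return result


print(StoredValue(Decimal("4.10")).retrieve())
# {'value': Decimal('4.10')}
print(StoredValue(Decimal("4.1"), Decimal("0.05")).retrieve(decimals=3))
# {'value': Decimal('4.100'), 'accuracy': Decimal('0.05')}
```

Leaving the `accuracy` key out entirely mirrors the JavaScript "undefined" variant rather than storing an explicit n/a.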

Furthermore, a quantity may be given as 4.10-4.20-4.35. The precision
of measurement and the measure of variance and dispersion are
separate concepts.

Hm, in mechanical engineering, for instance, there are also ± values where the upper and lower tolerances differ. E.g. the value should be 11.2, but it may be as low as 11.1 or as high as 11.35.
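Such asymmetric tolerances could be stored as a nominal value with separate lower and upper deviations. A sketch in Python with an illustrative `AsymmetricTolerance` type (not a proposed Wikidata datatype), using the 11.2 example:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AsymmetricTolerance:
    """Nominal value with separate lower and upper tolerances (illustrative type)."""
    nominal: Decimal
    minus: Decimal  # allowed deviation below the nominal value
    plus: Decimal   # allowed deviation above the nominal value

    @classmethod
    def from_limits(cls, nominal: Decimal, low: Decimal, high: Decimal) -> "AsymmetricTolerance":
        return cls(nominal, nominal - low, high - nominal)

    def contains(self, x: Decimal) -> bool:
        return self.nominal - self.minus <= x <= self.nominal + self.plus

    def __str__(self) -> str:
        return f"{self.nominal} +{self.plus}/-{self.minus}"


part = AsymmetricTolerance.from_limits(Decimal("11.2"), Decimal("11.1"), Decimal("11.35"))
print(part)  # 11.2 +0.15/-0.1
```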


I believe in the user interface this need not be a visible setting;
the number of digits can simply be preserved. Without these it is
impossible to store and reproduce information like "10.20 nm"; it
would be returned as 1.02*10^-8 m.

No, it would return using whatever system of measurement the user has selected
in their preferences.

Then you have lost the information. There is no "user selection" for
this in science.

Lengths, distances, sizes, etc. are measured in meters; that's how science does it. Displaying them is an entirely separate matter.

Complex heuristics
may "guess" when to use the scientific SI prefixes instead. The
trailing zero, however, cannot be reproduced when relying entirely on
IEEE floating point.
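The loss of the trailing zero is easy to demonstrate with Python's standard library: an IEEE 754 double keeps only the numeric value, while a `decimal.Decimal` also keeps the exponent and thus the significant zero:

```python
from decimal import Decimal

entered = "10.20e-9"            # i.e. 10.20 nm written in metres

as_float = float(entered)       # IEEE 754 double
as_decimal = Decimal(entered)   # decimal with an explicit exponent

print(repr(as_float))   # '1.02e-08'  -> the significant trailing zero is gone
print(as_decimal)       # 1.020E-8    -> the trailing zero survives
```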

We'll need heuristics to pick the correct secondary unit (e.g. nm or km).

(I believe there is no such thing as a "secondary unit"; did you make
that term up? Only "m" is a unit of measurement, the n or k are
prefixes, see http://en.wikipedia.org/wiki/SI_prefix )

(Right, it's not a real unit; k is just a step of 1000. But let's call it a "secondary unit" internally, or maybe something more like a "unity".)

The general rule could be to pick a unit so that the actual value is between 1 and
10, with some additional rules for dealing with cultural specialities (the decimeter
is rarely used, the hectoliter however is pretty common, the decagram is commonly
used only in Austria, etc).
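A minimal sketch of such a rule in Python. The per-unit tables of customary prefixes are hypothetical, and the rule is a variant of "value between 1 and 10": with a restricted prefix set it picks the largest allowed prefix that keeps the displayed number at or above 1. Leaving out Mm reproduces the "40 000 km, not 40 Mm" example from earlier in the thread; the hard part remains curating the tables, not the code:

```python
from decimal import Decimal

# Hypothetical whitelist of customary prefixes per unit (symbol -> power of ten).
# Cultural rules from the discussion: no decimetre, but hectolitre is common.
# Mega-metre is left out on purpose, so the earth's circumference stays in km.
CUSTOMARY_PREFIXES = {
    "m": {"n": -9, "µ": -6, "m": -3, "": 0, "k": 3},
    "L": {"m": -3, "c": -2, "": 0, "h": 2},
}


def pick_prefix(value: Decimal, unit: str) -> tuple[Decimal, str]:
    """Choose the largest customary prefix that keeps the shown number >= 1."""
    by_size = sorted(CUSTOMARY_PREFIXES[unit].items(), key=lambda item: item[1])
    symbol, power = by_size[0]  # fallback for tiny values: smallest prefix
    for candidate, candidate_power in by_size:
        if abs(value) >= Decimal(10) ** candidate_power:
            symbol, power = candidate, candidate_power
    return value / Decimal(10) ** power, symbol + unit


for amount, unit in [("40000000", "m"), ("1.020E-8", "m"), ("250", "L")]:
    shown, symbol = pick_prefix(Decimal(amount), unit)
    print(shown, symbol)
# 40000 km
# 10.20 nm
# 2.5 hL
```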

You would also need to know which prefix is applicable to which unit
in which context. In a scientific context different prefixes are used
than in a lay context. In a lay context astronomical temperatures may
be given in degrees Celsius, in a scientific one in kelvin. This is not
just a user preference.

I agree that the system should allow explicit conversion in infoboxes.
I disagree that you should create an artificial intelligence system for
Wikidata that knows more about unit usage than the authors. To store
the wisdom of authors, storing both the unit and the original unit
prefix is necessary.
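A sketch of storing both the value in the base unit and the author's original prefix, assuming decimal rather than floating-point storage. `QuantityWithPrefix` and the prefix table are illustrative, not a proposed schema:

```python
from dataclasses import dataclass
from decimal import Decimal

# Illustrative subset of SI prefixes (symbol -> power of ten).
SI_PREFIX = {"n": -9, "µ": -6, "m": -3, "": 0, "k": 3, "M": 6}


@dataclass(frozen=True)
class QuantityWithPrefix:
    """Hypothetical record: value in the base unit plus the author's original prefix."""
    value: Decimal        # in the base unit, e.g. metres; Decimal keeps significant zeros
    unit: str             # base unit symbol, e.g. "m"
    original_prefix: str  # prefix the author chose, e.g. "n"

    @classmethod
    def from_input(cls, amount: str, prefix: str, unit: str) -> "QuantityWithPrefix":
        # scaleb only shifts the decimal exponent, so no digits are gained or lost
        return cls(Decimal(amount).scaleb(SI_PREFIX[prefix]), unit, prefix)

    def as_authored(self) -> str:
        shown = self.value.scaleb(-SI_PREFIX[self.original_prefix])
        return f"{shown} {self.original_prefix}{self.unit}"


q = QuantityWithPrefix.from_input("10.20", "n", "m")
print(q.value)          # 1.020E-8
print(q.as_authored())  # 10.20 nm
```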

If somebody enters 32 degrees Fahrenheit, it should be stored as 273.15 kelvin. In the German Wikipedia it will be displayed as 0 degrees Celsius, in the English one as 32 degrees Fahrenheit, and on Wikidata in whatever unit the user wants.
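A sketch of this store-once, display-per-site idea in Python. Unlike prefixes, temperature conversions are affine (an offset plus a scale), so a single factor per unit is not enough. The `DISPLAY` mapping per wiki is a hypothetical stand-in for whatever configuration would actually be used:

```python
from decimal import Decimal

# Kelvin is the stored (base) unit; conversions are affine, not just a scale factor.
ZERO_CELSIUS_IN_K = Decimal("273.15")


def fahrenheit_to_kelvin(f: Decimal) -> Decimal:
    return (f - 32) * 5 / 9 + ZERO_CELSIUS_IN_K


def kelvin_to_celsius(k: Decimal) -> Decimal:
    return k - ZERO_CELSIUS_IN_K


def kelvin_to_fahrenheit(k: Decimal) -> Decimal:
    return (k - ZERO_CELSIUS_IN_K) * 9 / 5 + 32


# Hypothetical per-site display preference; not an existing Wikidata setting.
DISPLAY = {"dewiki": kelvin_to_celsius, "enwiki": kelvin_to_fahrenheit}

stored = fahrenheit_to_kelvin(Decimal("32"))
print(stored)                     # 273.15
print(DISPLAY["dewiki"](stored))  # 0.00
print(DISPLAY["enwiki"](stored))  # 32.00
```

Note that converting 32 degrees Fahrenheit back yields 32.00, not 32: the two decimals of 273.15 leak into the result, so a conversion can silently add apparent precision.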

You write "The Precision can be derived from the accuracy and vice
versa, using appropriate heuristics."

I _very strongly_ doubt that. Can you give any proof of that? For
precision I can use statistics; for accuracy I need an indirect,
separate and precise method to estimate it. If you have a
laser distance measurement device, you can estimate its precision
yourself by repeated measurements at various times, temperatures, etc.
But unless you have an objective distance standard, you have no means
to determine whether the device is always off by 10 cm
because someone screwed up the software program inside it.
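The argument can be illustrated numerically. In this sketch the readings and the reference distance are made up for illustration: the spread (precision) follows from the readings alone, whereas the systematic offset (the accuracy problem) can only be computed once an independent reference value is supplied. Shifting every reading by the same 10 cm would leave the precision unchanged:

```python
import statistics

# Hypothetical repeated readings of one distance with a laser rangefinder (metres).
readings = [12.31, 12.30, 12.32, 12.31]


def precision(readings: list[float]) -> float:
    """Spread of repeated measurements: estimable with the device alone."""
    return statistics.stdev(readings)


def systematic_offset(readings: list[float], reference: float) -> float:
    """Bias against a known standard: needs an independent reference value."""
    return statistics.mean(readings) - reference


print(round(precision(readings), 4))                 # 0.0082
print(round(systematic_offset(readings, 12.21), 2))  # 0.1
```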

But they are not the same. IMHO, the accuracy should always be stored with the
value, the precision never.

I fear that is a view of how data in a perfect world should be known,
not a reflection of the kind of data that people need to store in
Wikidata. Very often only the precision will be known or available to
its authors, or worse, the source may not say which it is.

I think this is a matter of Wikidata's own definitions. For years now, "precision" has been used for the number of digits after the decimal separator. Now we need another word for expressing how accurate a value is. So: do we have a glossary?

Cheers

Marco

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
