Re: [Wikidata-l] Tree Of Life

2014-12-19 Thread Gregor Hagedorn
Thanks Lydia,

agreed, the tool absolutely is useful, and many thanks to you, Lucie and
your team for providing it.

However, it is difficult for me to say what needs to be resolved on the
tool level and what in the Wikidata content. What I observe looks to me (but
I may be wrong) like a mashup error that occurs when data from different
sources are combined uncritically, each source being internally consistent
and correct. This may be something that needs to be addressed by the tool.

The problem with different labels and singular/plural may or may not be a
tool problem in choosing the correct label.

One problem I have when using the tool to try to understand what is going
on: it variously shows Wikidata and Wikipedia pages, making it somewhat
difficult to follow what goes on. E.g. Biota is a Wikidata page, Bacteria
is a Wikipedia page...

So again thanks for doing this great work!

gregor




On 18 December 2014 at 20:52, Lydia Pintscher lydia.pintsc...@wikimedia.de
wrote:

 On Thu, Dec 18, 2014 at 8:38 PM, Gregor Hagedorn g.m.haged...@gmail.com
 wrote:
  https://tools.wmflabs.org/tree-of-life/
 
  looks problematic at first glance.
 
  Bacteria, prokaryotes, monera and eukaryotes as sister groups on the
  same level?
  Prokaryotes contain Archaea as their only subtaxon (but no bacteria)?
 
  Also, the mixed use of scientific and common names (variously as
  singular or plural) is rather confusing.
 
  sorry for the critique...

 Hey Gregor,

 That's exactly why Lucie made the tool ;-) It is only a reflection of
 the data in Wikidata. So if it is wrong in the tree it is wrong in
 Wikidata and should be fixed.


 Cheers
 Lydia

 --
 Lydia Pintscher - http://about.me/lydia.pintscher
 Product Manager for Wikidata

 Wikimedia Deutschland e.V.
 Tempelhofer Ufer 23-24
 10963 Berlin
 www.wikimedia.de

 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

 Registered in the register of associations of the Amtsgericht
 Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit
 organization by the Finanzamt für Körperschaften I Berlin, tax number
 27/681/51985.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l



-- 

---
Dr. Gregor Hagedorn
Head of Digital World and Information Science
Museum für Naturkunde Berlin
Leibniz-Institut für Evolutions- und Biodiversitätsforschung
Invalidenstrasse 43, 10115 Berlin
+49 (0)30 2093 8576 (work)
+49-(0)30-831 5785 (private)
gregor.haged...@mfn-berlin.de
http://www.naturkundemuseum-berlin.de
http://linkedin.com/in/gregorhagedorn

This communication, together with any attachments, is intended only for the
person(s) to whom it is addressed. Redistributing or publishing it without
permission may be a violation of copyright or privacy rights.

Halloween = Christmas? 31 Oct = 25 Dec!


Re: [Wikidata-l] Tree Of Life

2014-12-19 Thread Gregor Hagedorn

  may be wrong) a mashup-error occurring when uncritically combining data
  from different sources, each source being internally consistent and
  correct. This may be something that needs to be addressed by the tool.

 Good point, yes. Any ideas how?


Perhaps just help users to understand this better, by displaying the sources
for the relations shown.


  The problem with different labels and singular/plural may or may not be a
  tool problem in choosing the correct label

 It just takes the label of the item. There is a taxon name property.
 Would that be a useful alternative? Input from the experts needed.


That seems helpful. One way would be to always show the taxon name first,
followed by the (at the moment perhaps inconsistent) language label in
parentheses.
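
A minimal sketch of the "taxon name first, label in parentheses" display
described above. The function name display_name and the sample values are
hypothetical; this is not how the tool is known to work.

```python
from typing import Optional


def display_name(taxon_name: Optional[str], label: Optional[str]) -> str:
    """Build a display string: taxon name first, language label in parentheses."""
    if taxon_name and label and taxon_name.casefold() != label.casefold():
        return f"{taxon_name} ({label})"
    # Fall back to whichever value exists; do not repeat identical names.
    return taxon_name or label or "(unnamed item)"


# Hypothetical sample values, for illustration only.
print(display_name("Prokaryota", "prokaryotes"))
print(display_name("Bacteria", "bacteria"))
```

Comparing the two strings case-insensitively avoids output such as
"Bacteria (bacteria)" when the label merely repeats the taxon name.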


 Right. It shows the Wikipedia article if there is one in the language
 you selected and otherwise Wikidata. Better to always show Wikidata?


perhaps an iframe with the mobile view as it is, PLUS two normal (quick)
links below that open the normal-view pages, to conveniently analyse, read
etc.

... not sure. The situation at the moment is that when I tried to understand
the source of the mashup error, I could not do it with the information or
links displayed.

thanks again

gregor


Re: [Wikidata-l] Tree Of Life

2014-12-18 Thread Gregor Hagedorn
https://tools.wmflabs.org/tree-of-life/

looks problematic at first glance.

Bacteria, prokaryotes, monera and eukaryotes as sister groups on the same
level?
Prokaryotes contain Archaea as their only subtaxon (but no bacteria)?

Also, the mixed use of scientific and common names (variously as singular
or plural) is rather confusing.

sorry for the critique...

gregor

On 18 December 2014 at 17:35, Lucie Kaffee lucie.kaf...@wikimedia.de
wrote:

 Hey,

 Since the Tree of Life by Denny is outdated, we thought it would be nice
 to have a new one, to get an overview of the biological taxonomy on
 Wikidata: not only to have a nice-looking tree, but also to see where the
 errors are and to correct and update it. Right now, there are 660775 items
 in the tree.
 Changes of names can be seen instantly, because this is based on the API,
 but changes in the order need an update of the whole tree, because it is
 based on Wikidata dumps. (This one is based on the most recent dump, from
 15.12.2014.)

 Here you go, this is the new tree of life, made with a lot of love:
 https://tools.wmflabs.org/tree-of-life/

 If you have any corrections, additions or features you want to add, feel
 free to ping me, send me a mail, submit a patch or file an issue on github.
 The repo for the tree is on https://github.com/frimelle/tree-of-life

 Cheers,

 Lucie (frimelle)






Re: [Wikidata-l] entity vs Special:EntityData

2013-06-28 Thread Gregor Hagedorn
But the problem seems to be that http://www.wikidata.org/wiki/Q1000
actually seems to be an information resource, i.e. HTML content is
returned directly under this URI (rather than the request being
redirected with HTTP 303).

Also, checking with http://validator.linkeddata.org/vapour I received
an error about an invalid response (I am not sure whether this is a
problem with Vapour or Wikidata ...)
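
To make the distinction concrete, here is a small sketch that classifies the
answer to a request for an entity URI. It performs no network access: the
status code and Location header would have to come from an HTTP client with
redirect-following switched off. The function name and the returned strings
are assumptions made for illustration.

```python
from typing import Optional


def classify_entity_uri(status: int, location: Optional[str] = None) -> str:
    """Classify how a server answered a request for an entity URI."""
    if status == 303 and location:
        # "See Other": the URI names a thing; its description lives elsewhere.
        return f"303 redirect to description at {location}"
    if 200 <= status < 300:
        # Content returned directly: the URI behaves as an information resource.
        return "information resource (content returned directly)"
    return f"unexpected response ({status})"


# The behaviour described above corresponds to the second case.
print(classify_entity_uri(200))
# Hypothetical redirect target, for illustration only.
print(classify_entity_uri(303, "http://example.org/data/Q1000"))
```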

Gregor



Re: [Wikidata-l] Page history and properties

2013-04-06 Thread Gregor Hagedorn
 This is great, but the solution I saw (i.e.
 {{#property:population|current-value=30900}}) makes the whole
 Wikidata absolutely useless.

I asked Luca back about this, and perhaps one point is that the term
"current" is too easily misunderstood. The point is not that Wikidata
should have such a property, but that the value at the time of saving
a past version of a Wikipedia page is preserved. Perhaps

{{#property:population|value-when-saving-page=30900}}

would be less easily misunderstood. Nothing in Wikidata would be made
useless by this: it would work exactly like now when calling the
current page, and would only work differently when calling the
cite-this-version-of-a-page links. And it would allow a Wikipedia
community to structure their work such that Wikipedia editors can
still curate and see changes to the values.



However, as said above, this is just an example of a solution. It is
safe, has a very small processing overhead and a small storage overhead,
and scales well with load.

A more elegant solution would clearly be to do two things:

a) when creating a Wikipedia diff on the Wikipedia page version
history, to either show directly, or link to a Wikidata property diff
(reduced to the relevant parts as outlined in an earlier mail) in
addition to the wikitext diff of the page.

Note that it is not necessary to merge all Wikidata versions into the
Wikipedia version. When comparing two arbitrary Wikipedia page
versions, it is irrelevant whether one or multiple Wikidata changes are
included; all corresponding changes should be shown on request. The
only necessary item is a single Wikidata indicator (operating like a
special version line) on top for cases where Wikidata properties are
changed after the last Wikipedia edit.

b) expand the property function such that for all calls of specific
(citable) page versions, it retrieves the property rendering at that
point in time from wikidata.
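
Point b) amounts to a lookup "as of" a timestamp. The following is only a
sketch under assumptions: the history of a property is taken to be available
as a sorted list of (timestamp, value) pairs, and value_at is a hypothetical
helper, not an existing Wikidata function. The sample values are invented.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def value_at(history: List[Tuple[str, str]], page_timestamp: str) -> Optional[str]:
    """Return the property value that was in force at page_timestamp.

    history holds (timestamp, value) pairs sorted by timestamp. Timestamps
    are ISO 8601 strings in UTC, so plain string comparison orders them.
    """
    timestamps = [ts for ts, _ in history]
    index = bisect_right(timestamps, page_timestamp)
    if index == 0:
        return None  # the property had no value yet at that time
    return history[index - 1][1]


# Invented sample history, for illustration only.
history = [
    ("2013-01-01T00:00:00Z", "2348700"),
    ("2013-03-01T00:00:00Z", "2348732"),
]
print(value_at(history, "2013-02-15T12:00:00Z"))
```

A call for the current page would skip this lookup and use the latest value,
so only the citable past versions behave differently.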

Gregor



Re: [Wikidata-l] Page history and properties

2013-04-05 Thread Gregor Hagedorn
On 4 April 2013 22:23, Michael Hale hale.michael...@live.com wrote:
 trench, then I just want to update it on Wikidata, and then every article
 that references it will be updated. I don't want to have to update it on
 Wikidata and then go do a null edit on every article that uses that
 information.

You are correct, the current version would have to be an exception,
and display under the current time rules just as in the
implementation. My proposal only makes sense when versions from the
history are being displayed.

Gregor



Re: [Wikidata-l] Page history and properties

2013-04-05 Thread Gregor Hagedorn
My concern is that the Wikidata editors, those not with random
editing behavior, but those who are curators/caretakers of specific
pages, experience a disempowerment, because they lose control.

I view the decision to inform about Wikidata changes only in the
short-lived recentchanges, but not in the page history, as
problematic. Page editors will now be informed that the page has
changed, but this change is not recorded in the page history, and it
cannot be seen in the version diffs. This breaks a lot of
assumptions of trust. Wikipedia can be collaborative because of
this trust in the versioning system and because of the reasonable
accessibility of the version diffs (transparency).

Some editors will probably leave the Wikipedia project due to the
introduction of Wikidata, no matter how much Wikidata reaches out to
them. I feel that the number is much higher in the present
disempowerment implementation, which is why I try to argue here for
making content changes that come from Wikidata and affect Wikipedia
pages transparent on Wikipedia, not only Wikidata.

This discussion is about proposing potential elements and ideas; there
may be much better ideas. I am not convinced by the arguments against
the proposed means: I fear the thinking is a programmer's thinking, not
a content editor's thinking. Denny, I feel that your proposal of some
HTML-version archiving somewhere, which is not integrated into the
Wikipedia editing workflow, does not take sufficient care of the needs
of the editors, especially the need to be able to use the version
comparison, not just find rendered versions somewhere in isolation.

But neither of us can see into the future. I think Wikidata is a great
achievement as it is, and we all agree that it can be made better by
better integration into existing Wikipedia workflows. Let us focus on
the importance of this and try to find the best means that are
achievable with existing resources.

Gregor



Re: [Wikidata-l] Page history and properties

2013-04-05 Thread Gregor Hagedorn
On 5 April 2013 20:05, Michael Hale hale.michael...@live.com wrote:
 The thing to remember is that the history of a page is the history of the
 wiki markup for the page, not the history of the rendered HTML. It would be
 misleading if edits were shown in the markup history for an article each
 time a template or Wikidata item changed because reverting the markup to
 that version wouldn't actually revert the change. I think what curators with
 specific specialties want is the ability to automatically expand their
 watchlist to include all templates and data items that could affect their
 watched pages. Then a way to view the merged watchlists from multiple
 projects would be helpful. There is room for improvement in global account
 integration. For example, I just noticed that I need to set my timezone on
 Wikipedia and Wikidata independently.

I partly agree, the ideal situation is that
a) changes of wikidata (and perhaps templates, and perhaps images,
with decreasing necessity in practice) show in the page history
b) in the diff, such changes are shown separately from the changes of
the wikitext itself, but with the same action. This can be achieved by
showing the affected changes after a separation line below the
wikitext diff.

However, since this was rejected previously as undoable, the
expansion of {{#property: to include the current value would be a
work-around.

We perhaps disagree about the priorities. I believe Wikipedia editors
are not primarily keen on the technical definition of the diff as the
changes of the wikitext in the database. I believe they want and need
transparency about when and who changed a specific topic they care
about.

Gregor



Re: [Wikidata-l] Page history and properties

2013-04-05 Thread Gregor Hagedorn
On 5 April 2013 21:41, Michael Hale hale.michael...@live.com wrote:
 Well, I could make a view that shows the diff of a Wikipedia article stacked
 on top of the diff for the corresponding Wikidata item on my computer in a
 few minutes. But diffs can be very long sometimes, so there would be a lot
 of discussion about whether that view is more appropriate than just making
 it easier to find the link to the Wikidata item on the diff and edit pages
 for articles that include Wikidata properties. But the work-around you
 suggest is not a work-around. If the U.S. article currently says: The
 population of the U.S. is 309,000,000 people. And I change it to say The
 population of the U.S. is {{#property:population|current-value=30900}}
 people. Which scenarios does that improve?

It makes changes visible in the present wikitext diff and allows a
past version to be rendered easily with the correct value. It is not
meant as a perfect solution for everything, but as an option that lets
an editing community choose whether to keep the information visible and
traceable in the diff. I agree about the disadvantages of clutter, but
the point is that it does allow a community to decide that they don't
like clutter and don't need the history and diffs, and simply to put the
property functions inside the templates (with no tracking). The
proposal would indeed show the changes in the present diff, and it
would make it possible, several years into the future, to reasonably
understand (in wikitext) and render (as HTML) the states of Wikipedia
articles from several years in the past.

However, better solutions are certainly possible. Based on what you
write I can imagine an editing workflow diff similar to the stacking
of diffs, but actually reduced to a link pointing to Wikidata. The
features I view as important are:

1. In the history, Wikidata changes for a topic are made visible
during the display of Wikipedia wikitext changes. Ideally multiple
Wikidata edits could be merged into a single line if no wikitext
changes occur in between. There could be options to hide Wikidata
changes. The how is not so important, but I think the present
watchlist implementation is insufficient, because it generates an
attention message but makes it hard to follow up (which usually
happens in the page history + diff).

2. In the actual diff, instead of stacking the wikitext + Wikidata
property diffs, I could imagine a solution that says at the bottom:
"Associated Wikidata properties changed in the chosen period", where
"chosen period" is the period chosen for the diff by the editor and
where the whole line is linked to a Wikidata diff. Present Wikipedia
communication practice heavily relies on pointing to specific history
diffs (through links), but currently Wikidata changes are completely
invisible there. By automatically linking them, present practice
could continue smoothly and without disruption.

3. Ideally, the Wikidata diff link should have an option to hide the
Wikidata-internal properties like changes in item labels, and should
show language-specific changes only for a specific language, to keep
the attention of content editors focussed on the relevant changes
(most likely Wikidata will still show more properties than those used
on the Wikipedia page, but this could be acceptable).
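
Feature 1 (merging consecutive Wikidata edits into a single history line)
could look roughly like the sketch below. The event tuples, the source
markers "wikitext"/"wikidata" and the output wording are all assumptions
made for illustration; the list is built in chronological order, whereas a
real history page would show the newest entry first.

```python
from typing import List, Tuple

Event = Tuple[str, str, str]  # (ISO timestamp, source, summary)


def merged_history(wikitext: List[Event], wikidata: List[Event]) -> List[str]:
    """Interleave both histories; collapse runs of Wikidata edits into one line."""
    lines: List[str] = []
    run = 0  # Wikidata edits seen since the last wikitext edit
    for timestamp, source, summary in sorted(wikitext + wikidata):
        if source == "wikidata":
            run += 1
            continue
        if run:
            lines.append(f"[Wikidata] {run} change(s) before {timestamp}")
            run = 0
        lines.append(f"{timestamp} {summary}")
    if run:
        # Indicator line for Wikidata changes after the last Wikipedia edit.
        lines.append(f"[Wikidata] {run} change(s) since last page edit")
    return lines


# Invented sample data, for illustration only.
page_edits = [("2013-04-01T10:00:00Z", "wikitext", "fix typo"),
              ("2013-04-03T10:00:00Z", "wikitext", "add section")]
data_edits = [("2013-04-02T09:00:00Z", "wikidata", "population"),
              ("2013-04-02T11:00:00Z", "wikidata", "population"),
              ("2013-04-04T08:00:00Z", "wikidata", "area")]
for line in merged_history(page_edits, data_edits):
    print(line)
```

Each merged line would be the natural place for the link to the
corresponding Wikidata diff described under 2.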



The question of rendering the HTML for past versions is separate. You
seem to say that it is already easy to write the #property: function
such that it takes into account the edit timestamp of a Wikipedia page
version and evaluates the property as it was at that point in time
(with the current version, i.e. when called without a pageid, always
evaluating to now(), not the last editing timestamp).

ASIDE: I don't worry too much if a property is being referenced by
name or ID in a past version and has since been deleted, the resulting
error message is transparent rather than misleading (which the display
of wrong information is).

Gregor



Re: [Wikidata-l] Page history and properties

2013-04-05 Thread Gregor Hagedorn
On 5 April 2013 23:19, Michael Hale hale.michael...@live.com wrote:
 So you agree that it is more important to reduce clutter than to add
 functionality that very few people use?

No, I strongly disagree with this. I think the functionality of being
able to curate the pages Wikipedia editors care for should have highest
priority. I believe it is very important to make Wikidata palatable to
the people Wikipedia depends upon. Reducing clutter is nice, but
avoiding a loss of transparency and trust is essential.



Re: [Wikidata-l] Page history and properties

2013-04-05 Thread Gregor Hagedorn
On 5 April 2013 23:53, Michael Hale hale.michael...@live.com wrote:
 But you do agree that it is easier to curate articles by updating one value
 in a database than updating the value separately everywhere it appears?

Absolutely.

But in my experience Wikipedia editors care about the product: a
readable, intelligible, correct encyclopedic article that others enjoy
reading. People who are able to care about individual properties in
Wikidata are rare. Wikidata needs the coupling between Wikipedia
editors and Wikidata curation. The editors should be supported, not
alienated by giving them the feeling that it becomes unmanageable for
them to follow the changes (because of workflow separation, or because
of too many insignificant changes, like label changes in any number of
languages that the average editor is unable to read).

I view support of Wikipedia editor workflow, for which the implemented
change notification in recentchanges/watchlist is an important first
step, with support of change transparency discussed here a second
step, as an important piece in the whole puzzle.

(Not as the only important thing, don't get me wrong :-) )

Gregor



Re: [Wikidata-l] Page history and properties

2013-04-04 Thread Gregor Hagedorn
 when templates (or, in the case of wikidata, properties) get deleted or 
 renamed.
 Nobody has come up with a good solution yet.

I think we did discuss a simple, working solution: Saving the value
together with the Wikipedia page.

The major argument against that was: it is a waste of storage to
create a new Wikipedia page (perhaps daily) when property values
included in a page are changed in Wikidata. I personally value trust
and documentation of change much higher than disk storage, but even
then, there are ways to balance this. So perhaps a modified proposal
that matches the current development stage:

If an editor saves a page with {{#property:population}} the parser
looks up the current value and changes this to:
  {{#property:population|current value=2348732}}
and stores this wikitext version in the Wikipedia. The same would
apply to updating: saving
  {{#property:population|current value=2348732}}
may result in
  {{#property:population|current value=2348700}}
being saved.
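
As an illustration only, the save-time rewrite could be sketched with a
regular expression. This is not MediaWiki parser code: annotate_on_save and
the lookup callback are hypothetical names, and the pattern handles only the
simple call form shown above.

```python
import re
from typing import Callable

# Matches {{#property:NAME}} with an optional "|current value=..." argument.
PROPERTY_CALL = re.compile(r"\{\{#property:([^|{}]+)(?:\|current value=[^{}]*)?\}\}")


def annotate_on_save(wikitext: str, lookup: Callable[[str], str]) -> str:
    """Rewrite each property call so that it records the value at save time."""
    def replace(match):
        name = match.group(1).strip()
        # Doubled braces in the f-string produce literal {{ and }}.
        return f"{{{{#property:{name}|current value={lookup(name)}}}}}"
    return PROPERTY_CALL.sub(replace, wikitext)


# First save: the bare call is annotated with the value looked up.
saved = annotate_on_save("Population: {{#property:population}}.", lambda name: "2348732")
print(saved)
# Later save: an existing annotation is replaced by the new value.
print(annotate_on_save(saved, lambda name: "2348700"))
```

Because the optional argument is part of the match, saving an already
annotated page updates the stored value instead of adding a second one.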

This would mean no additional waste of storage for articles that are
regularly changed. For those that are not, one could imagine a
bot-based monthly update check to make past knowledge transparent.

I realize that this would require a pattern, where the
Wikidata-derived values would remain editable on the topic/article
pages, i.e. the property function would have to be inserted in the
template call, rather than in the template definition. Those wikidata
properties automatically called inside templates with a dynamic item
decided by the current template call would not be preserved. However,
both editing patterns would be available and it would be up to the
community of each Wikipedia to choose the preferred one.

(As I said previously: although similar to the issue of commons images
and templates, the issue at stake for Wikidata is different. Because
of the problems in preserving a transparent editing history, updates
to commons images are generally restricted to truly minor improvements
(contrast, cropping, better resolution, etc.). I am not aware of
cases, where commons images regularly are replaced with updated
content that is different in substance and thus automatically changes
all Wikipedia pages, representing different knowledge. I don't want to
exclude this, but even for changing company logos the usual solution
is to create a new name, preserving the old logo. Similarly, templates
may fail to work in old versions (big problem!), but I am not aware
that a template would render out-of-time information when viewing a
past revision. Thus, the problem of Wikidata with respect to
endangering the trust basis of Wikipedia, the version system, is
related, but different).

Gregor



Re: [Wikidata-l] Page history and properties

2013-04-04 Thread Gregor Hagedorn
 don't see what value we'd gain from storing that extra metadata. Every
 scenario I can think of where you care about past states of the database is
 already handled by the compare selected revisions feature.

If that is so simple, can the {{#property:xxx}} call in a Wikipedia page
simply resolve to the revision that was valid at the point in time
equivalent to a given revision? It seems like you say you already have
the code to do that when creating the Wikidata item description.

I disagree that this is an issue for MediaWiki core, since it is a
question of how the Wikidata-specific property function works.

Gregor


PS: I admit that Denny has found an example where an image seems to
be changing in content on Commons, but I still believe this is a rare
case. Is there any wiki-statistician who can supply exact numbers for
these cases?



Re: [Wikidata-l] some good news about the future of Wikidata

2013-02-20 Thread Gregor Hagedorn
2013/2/20 Lydia Pintscher lydia.pintsc...@wikimedia.de:
 http://blog.wikimedia.de/2013/02/20/the-future-of-wikidata/

Very good, congratulations!

With great respect for your work,

Gregor



Re: [Wikidata-l] Fwd: Todos for RDF export

2013-01-29 Thread Gregor Hagedorn
Some of our insights into the SMW RDF export (which we found to be
difficult to configure and use):

1. Probably most relevant: total lack of support for xml:lang, which would
have been essential to our purposes.

Wikidata should be planned with support for language in mind.
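
What "support for language" means for an export can be sketched with
language-tagged literals, here written as N-Triples lines (in RDF/XML the
same information is carried by xml:lang). The subject URI, the use of
rdfs:label and the function name are assumptions for illustration, and only
quotes and backslashes are escaped.

```python
from typing import Dict, List

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def label_triples(subject_uri: str, labels: Dict[str, str]) -> List[str]:
    """Serialize labels as N-Triples lines with language-tagged literals."""
    lines = []
    for lang, text in sorted(labels.items()):
        # Minimal escaping; control characters are not handled in this sketch.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{RDFS_LABEL}> "{escaped}"@{lang} .')
    return lines


# Hypothetical subject URI and labels, for illustration only.
for line in label_triples("http://example.org/item/1", {"en": "bacteria", "de": "Bakterien"}):
    print(line)
```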

2. We also found that we had serious problems with managing structure, e.g.
record and subobject. Due to the need to obtain this information
recursively by repeated calls, and because there is no control on the URI
created for these calls, some easy solutions like applying clean-up xslt
will not work. This may not be relevant for wikidata.

3. At first the lack of variable datatype (datatype is fixed per property)
is acceptable. However, we found this a major problem with respect to the
forced distinction between datatype:wiki-page and datatype:global URI
properties. Essentially, SMW forces one to introduce for a semantic
property (e.g. dc:creator) two distinct dummy properties:
property:creator_page and property:creator_uri. Since in RDF export the
artificial distinction between pages and URIs disappears, it would be
desirable to merge them, but only one of them can be set to an imported
vocabulary.

I think this may be relevant to wikidata, where a similar distinction
between properties pointing to a local wikidata item and a global resource
exists.

Gregor

(PS: If any of the problems above in reality does not exist in SMW and we
simply overlooked the solution, I am very happy for corrections, of course!)





Re: [Wikidata-l] Coordinate datatype -- update

2013-01-18 Thread Gregor Hagedorn
 In order not to lose the Dim-data that is already available from the
 Wikipedias, and to use this for scaling. It should really only describe the
 rough dimension. I would expect that a building would still have something
 like area or similar in its own property. Dimension is used for scaling
 and uncertainty.

Dimension of a building, locality, etc. is well understood, it is the
size of location, without respect to shape. Location uncertainty is
well understood. Confounding the two seems to introduce the inability
of interpreting the information afterwards for many downstream
processing cases. I would plead to support both dimension and
uncertainty or none. --gregor



Re: [Wikidata-l] Is the inclusion syntax powerful enough?

2013-01-10 Thread Gregor Hagedorn
I like it.

For multilingual wikis like Commons, a presumably fairly simple
extension might be valuable: to support, besides #property, also
#propertylabel and #itemlabel, with only the id and of parameters. I
think this does not really add much complexity.

Gregor



Re: [Wikidata-l] Update to time and space model

2013-01-08 Thread Gregor Hagedorn
ON COORDINATES:

a) what you describe is more specific than a geolocation (which may be
expressed by other means than coordinates). I suggest to give the data
type the more specific name:

geocoordinates

b) with respect to precision: I don't understand the reasoning to
stick this to degrees. Since we are describing locations on an
ellipsoid, the longitude to distance and latitude to distance
conversions are different, and they are different for different points
on earth. See example on en.wikipedia, a minute at equator is 1843
versus 1855 m.

In practice the potential location error will be given in a distance
measure. You want to convert it to degrees in a highly complex
conversion. Why? The back conversion will usually be non-ambiguous
(since the back conversion will always describe an ellipse rather than
a circle).
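
The size of the effect can be checked with a short calculation. This is a
sketch that assumes the WGS 84 ellipsoid constants and the standard formulas
for the meridional and prime-vertical radii of curvature; the function name
is made up for the example.

```python
import math
from typing import Tuple

WGS84_A = 6378137.0          # semi-major axis in metres
WGS84_E2 = 0.00669437999014  # first eccentricity squared


def metres_per_arcminute(latitude_deg: float) -> Tuple[float, float]:
    """Return the (north-south, east-west) ground length of one arcminute."""
    phi = math.radians(latitude_deg)
    w = 1.0 - WGS84_E2 * math.sin(phi) ** 2
    meridional = WGS84_A * (1.0 - WGS84_E2) / w ** 1.5  # radius along a meridian
    prime_vertical = WGS84_A / math.sqrt(w)             # radius across the meridian
    arcminute = math.radians(1.0 / 60.0)
    return meridional * arcminute, prime_vertical * math.cos(phi) * arcminute


for latitude in (0.0, 45.0, 60.0):
    north_south, east_west = metres_per_arcminute(latitude)
    print(f"{latitude:4.0f} deg: {north_south:7.1f} m (lat)  {east_west:7.1f} m (lon)")
```

At the equator this gives about 1843 m for a minute of latitude and about
1855 m for a minute of longitude, matching the figures quoted above; the
longitude value shrinks with the cosine of the latitude, so a single
precision in degrees corresponds to very different distances.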

c) Furthermore, as before, I believe that precision and accuracy will
usually both contribute to the error you are interested in and which
is typically described in geolocations having a +/- addition.

I suggest to replace precision with
errorradius
or
uncertaintyradius
or
uncertaintyInMeters

which would be the great circle distance. To somewhat simplify, the
unit could be fixed to m.

Here is some work done in our area (biodiversity):
http://code.google.com/p/darwincore/wiki/Location

The term there is http://terms.gbif.org/wiki/dwc:coordinateUncertaintyInMeters

d) the correct name for globe is Geodetic datum or geodetic
system (which is more than the globe). See
http://en.wikipedia.org/wiki/Geodetic_system or
http://terms.gbif.org/wiki/dwc:geodeticDatum. WGS 84 (as a wikidata
item) is a valid geodetic datum or system. Both terms are equally
correct. Globe is not correct.

Gregor



Re: [Wikidata-l] Update to time and space model

2013-01-08 Thread Gregor Hagedorn
 geocoordinates
 Yep, agreed. Or just coordinates.

yes, probably better without a "geo" if it shall work for the Moon or Mars as well.

However, http://en.wikipedia.org/wiki/Coordinate_system is a far broader
term. But I cannot find a correct superclass term for
Geographic/Selenographic/Martiographic(?) coordinates.
http://en.wikipedia.org/wiki/Spherical_coordinate_system may be close,
but it's beyond my competence to decide.

"Coordinates" should be pragmatic enough. Or "LocationCoordinates".

 In practice the value will be given as 44°15'. Then we know it is by the
 minute - and not that it is given by a nautical mile. I am not making a
 highly complex conversion -- I am just looking at the number and saying oh
 yeah, this seems to be given by the minute, and not by the second or by the
 degree.

 The reason why I prefer degrees on a given equator to meters is that it
 makes more sense on varying globes, like the Earth, Moon, Sun, Jupiter, and
 Phobos. What we need is the possibility to understand that 44°15' should not
 be displayed as 44°15'00.001 the next time the value is displayed. And by
 saying it is correct by the minute allows us to do so. Making the statement
 in meters would actually require us to make that complex calculation which
 would be based on the given geodetic system -- which is much more
 complicated than the current suggestion.

you try to solve the problem of reproducing the precision of the
number as entered. However, the proposed mechanism is a mechanism of
uncertainty, which is far more general, able to express the
uncertainty radius that is due to e.g. specific GPS technologies. When
reading the proposal, I did not even understand your narrower
intention in your proposal.

I believe it does not work to simply use an equator-based
distance-in-m to degree conversion. See
http://commons.wikimedia.org/wiki/Commons:Geocoding#Precision for
examples how this changes in moderate latitudes, not to speak of being
near the pole.
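
A minimal sketch of why a single equator-based factor fails, assuming a
spherical Earth with a mean radius of 6371 km (a simplification, not a
geodetic datum). The function names are hypothetical and only illustrate
the latitude dependence:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean radius of a spherical Earth


def metres_per_degree(latitude_deg: float) -> tuple[float, float]:
    """Return (metres per degree of latitude, metres per degree of longitude)."""
    per_degree_lat = math.pi * EARTH_RADIUS_M / 180.0
    # A degree of longitude shrinks with the cosine of the latitude.
    per_degree_lon = per_degree_lat * math.cos(math.radians(latitude_deg))
    return per_degree_lat, per_degree_lon


def radius_in_degrees(radius_m: float, latitude_deg: float) -> tuple[float, float]:
    """Express an uncertainty radius in metres as (degrees lat, degrees lon)."""
    per_lat, per_lon = metres_per_degree(latitude_deg)
    return radius_m / per_lat, radius_m / per_lon


if __name__ == "__main__":
    for lat in (0.0, 45.0, 60.0, 85.0):
        d_lat, d_lon = radius_in_degrees(300.0, lat)
        print(f"lat {lat:4.1f}: +/- 300 m = {d_lat:.5f} deg lat, {d_lon:.5f} deg lon")
```

Under this approximation the same 300 m radius needs about twice as many
degrees of longitude at 60° latitude as at the equator, so the factor is
not a constant.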

My Conclusions:

a) the model may be able to express the number of digits in
degree-minute-second, decimal degrees, and degree-decimal-minutes. I
believe, however, that it is as yet underdefined. The precision value
necessary to specify whether a value stored in decimal degrees is to be
reproduced as 44°15', 44°15'15'', 44°15'15.15'', 44°15.1515', or
44.151515° (which are different example values of
course, not just different precisions) is unclear to me.

Latitude and longitude to 4-5 significant digits mean different
precision in meters, but it is customary to give the same precision.


b) the model may unduly suggest that it can be used for arbitrary reasons
of precision. However, it cannot ALSO capture imprecision or
uncertainty expressed as
 00°15′00″S 78°35′00″W +/- 300 m, since this requires a conversion
which is different for longitude and latitude and longitudes at
different latitudes.

That is, a geocoordinate with an explicit +/- xx m uncertainty
cannot be entered in wikidata. This is an acceptable limitation, but
it should be understood and clearly stated.

In a later mail Denny writes "I still would prefer Arcdegree of the
equator of the given globe over Meter, as it allows to measure any
globe without having too much details about the globe. but otherwise
it seems like the same things. (And they can be transformed from one
to the other using a simple factor)". I think this is not correct.
There is no general and simple convertibility between error radius as
distance and number of significant digits in degrees.


c) if the goal is to store the number of significant digits/figures, I
suggest to store this more directly, although I admit that in the
presence of different representations (decimal degrees, DMS, etc.)
this is not trivial.

 d) the correct name for globe is Geodetic datum or geodetic
 system (which is more than the globe). See
 http://en.wikipedia.org/wiki/Geodetic_system or
 http://terms.gbif.org/wiki/dwc:geodeticDatum. WGS 84 (as a wikidata
 item) is a valid geodetic datum or system. Both terms are equally
 correct.

If it shall be applicable outside earth, geodetic datum/system may
actually be too narrow, I did not think of that.

I don't know the correct superterm then. Maybe just call it "system",
and explain that for Earth it defines the geodetic or similar datum
or system, and for other celestial bodies their analogues.

I am rather certain that Wikidata does not need to add a further
parameter for earth, moon, etc. as Jeroen suggests. I suggest to add
to the documentation: "The Geodetic or Spatial reference system must
be chosen in such a way that it automatically implies the celestial
body to which the coordinates apply (Earth, Moon, Mars, Venus, Sun,
etc.)". I almost believe that this will always be the case, since
these systems must define the shape of the ellipsoid, which is
different for different celestial bodies.

Gregor


Re: [Wikidata-l] Data values

2012-12-21 Thread Gregor Hagedorn
 Hm the second one is only relevant for output.

I think this is a fundamental misunderstanding: The original one is
not for output but is the primary value for interpretation, for
understanding whether a value in Wikidata is correct or fake, or a
software conversion error, or what. If I want to learn something in
Wikipedia, I have to have access to this information. Having access to
this I can understand whether two seemingly different values from the
same source are justified or an error (like in the example from
previous mail that 100 +/- 50 and 100 +/- 0.1 can both be valid for
the same quantity and the same source observations).

I view the normalized, converted version, produced roughly and
unreliably with lots of heuristics, as the secondary one. It has its
uses, but I would in fact put it second and show it only with a large
warning banner that this version contains lots of unwarranted
assumptions which may or may not hold. But I don't care which is
primary or secondary, I only want to encourage you not to forget the
data in Wikidata while implementing the essential search, retrieval,
conversion, etc. functionality.

:-)

In an ideal world all data would be in a fully convertible state and
no-one would simply use significant digits to express margins of
error, reliability, tolerance etc. But I have not encountered this
world yet.

 Why not using the Term outputformat as a pattern just like Excel, OpenOffice, 
 and LibreOffice do? This could include the number of digits behind the comma, 
 the optional accuracy/whatever and the unit. This will be fine for the API, 
 and the MW-Syntax.

I don't care how the information is encoded; if you develop your own
language to encode information in a string and provide a syntax for
that, that is fine. However, even within Microsoft products the
formatting strings are only similar, but not fully compatible, and I
have doubts that this is a good way for global interoperability.

Gregor



Re: [Wikidata-l] Data values

2012-12-21 Thread Gregor Hagedorn
 I don't like significant digits because it depends on the writing system 
 (base
 10). I'd much rather express this as absolute values.

Yes, I would like that too. What I argue is that in 99.9 % of cases
(not a researched number, of course) you simply don't know more than
that there is a given number of digits base 10. Whether that is
meaningful or just sloppy or even a wilful simplification (probably
the vast majority of quantities in current Wikipedia belong to the
latter category) is unknown.

 That means that the figure is not usable for query answering at all. If we 
 don't
 know the level of certainty, we cannot use the number.

That will usually be the case. Unless you know which kind of margin
the numbers reflect, you cannot use them for answering anyway. What do
you do with the two examples:
100 +/- 50
and
100 +/- 0.1
that are the results of the same dataset and precisely reflect the
same quantity? If you know that the first is a 95% measure of
dispersion, and the second a 95% CI for the mean, you can ask people
whether they look for the mean (best estimate) or for a single
observation.
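
A small sketch of how both margins can come from one dataset, assuming
normally distributed observations. The sample standard deviation and
sample size below are hypothetical numbers chosen only to reproduce the
two examples:

```python
import math


def dispersion_half_width(sd: float, z: float = 1.96) -> float:
    """Half-width of the range covering about 95% of single observations."""
    return z * sd


def mean_ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of the approximate 95% confidence interval for the mean."""
    return z * sd / math.sqrt(n)


if __name__ == "__main__":
    sd, n = 25.5, 250_000  # hypothetical sample statistics
    print(f"100 +/- {dispersion_half_width(sd):.0f}  (single observation)")
    print(f"100 +/- {mean_ci_half_width(sd, n):.1f}  (mean)")
```

Both intervals are correct; they only answer different questions, which
is why the kind of margin has to be stored with the numbers.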

 Make the interval-points an option. If explicitly entered: excellent
 information. If not: don't try to create (false) knowledge from void.

 Yes, it will be an option. Making the default unknown would be bad though, I
 think.

The default has to reflect reality. If you make it a complication to
enter the actual default situation, and automatically add a margin of
error/dispersion/tolerance or whatever, then people will simply allow it
to happen, start ignoring it, not understand it, and in the end
Wikidata will be known as a bunch of unreliably encoded information.

 However, we should probably store whether the level of certainty was given
 explicitly or estimated automatically based on the number of significant 
 digits
 - then we can still ignore automatic values when desired.

Which will force all re-users to understand this and to throw away
these values prior to any analysis...

Why so complicated?

Gregor



Re: [Wikidata-l] Data values

2012-12-21 Thread Gregor Hagedorn
On 21 December 2012 19:36,  jmccl...@hypergrove.com wrote:
 The xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive and
 xsd:maxExclusive facets are absolute expressions not relative +/-
 expressions, in order to accommodate fast queries. These four facets permit
 specification of ranges with an unspecified median and ranges with a
 specified mode, inclusive or exclusive of endpoints, a six-fer. For these
 reasons I believe the XSD approach is superior for specifying value set when
 compared to storing the dispersion factors themselves, eg the 3 of +/- 3.

yes, provided they are actually tied to the semantics of min. and
maximum, which the xsd examples are. As long as the semantics of the
proposed value bracketing in Wikidata is unknown, their use is
questionable if not impossible. If I know something is plus/minus 2
s.d. or plus minus 2 s.e. or 10 to 90 % percentile ... I again can use
them to the benefit of the query system. But not without.

Gregor



Re: [Wikidata-l] Data values

2012-12-20 Thread Gregor Hagedorn
I believe there are a lot of dangerous assumptions on
http://simia.net/valueparser/

First: there is no indication in a number that it is _not_ endlessly precise.
Apostles = 12
has no uncertainty, representing it as
12  ± 1 is wrong, but also 12  ± 0.5 is wrong.

The same applies to a number like 12.2. The data source and author MAY
desire to express significant digits, but we simply don't know.
Wikidata should keep this at the don't know level and not
force-convert a number of unknown measurement precision to a number
with explicitly stated (but potentially totally wrong) precision or
accuracy limits.

For example, in science it is quite common to give light microscopic
measurements to one decimal place behind the micrometer, even though the
precision is 0.2 µm. The latter is simply known and therefore not
constantly repeated, unless specific circumstances justify this.

As discussed above: plus minus 1 s.d. does not give you a confidence
interval for the mean, it gives you a measure of dispersion.

-

My proposal: make the default: plus-minus values unknown, only
significant digits known. The interpretation of significant digits is
not machine-available unless qualifiers say so. It can however be used
to result in an estimate of significant digits after conversion.

Make the interval-points an option. If explicitly entered: excellent
information. If not: don't try to create (false) knowledge from void.
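
The proposal could be sketched as a data structure like the following.
This is only an illustration under my own assumptions: the class and
field names are hypothetical and not part of any Wikidata design.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    value: Decimal
    significant_decimals: int               # digits after the point, as entered
    lower: Optional[Decimal] = None         # set only when explicitly entered
    upper: Optional[Decimal] = None         # set only when explicitly entered
    bound_semantics: Optional[str] = None   # e.g. "95% CI of mean", "+/- 2 s.d."


def parse_quantity(text: str) -> Quantity:
    """Keep the number and its digits; do not invent interval points."""
    value = Decimal(text)
    decimals = max(0, -value.as_tuple().exponent)
    return Quantity(value=value, significant_decimals=decimals)


if __name__ == "__main__":
    print(parse_quantity("12"))     # Apostles = 12, no bounds are created
    print(parse_quantity("12.20"))  # two decimals are remembered
```

The default stays at "unknown" (None) for both interval points; only the
digits as entered are recorded.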

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
 In addition to a storage option of the desired unit prefix (this may
 be considered a original-prefix, since naturally re-users may wish to
 reformat this).

 I see no point in storing the unit used for input.

I think you plan to store the unit (which would be meter), so you
don't want to store prefixes, correct?

Please argue why you don't see a point. Do you want to store the size
of the universe, the distance to New York, and the size of the proton
all in meters? If not, with which algorithm will you restore the SI
prefix, or rather, recognize which SI prefix is usable? We do not use
Mm in common language, so we do give the circumference of the earth as
roughly 40 000 km and not as 40 Mm. We don't write 4*10^7 m either.
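
To illustrate, here is a sketch of a purely numeric prefix rule next to a
lookup of a stored original prefix. The rule (mantissa between 1 and
1000) is my own assumption of what such a heuristic might look like, and
the function names are hypothetical:

```python
import math

SI_PREFIXES = {9: "G", 6: "M", 3: "k", 0: "", -3: "m", -6: "µ", -9: "n"}


def naive_prefix(value_m: float) -> str:
    """Pick the prefix so that the mantissa falls in [1, 1000)."""
    exponent = int(math.floor(math.log10(abs(value_m)) / 3) * 3)
    exponent = max(min(exponent, 9), -9)  # stay inside the small table above
    return f"{value_m / 10 ** exponent:g} {SI_PREFIXES[exponent]}m"


def with_stored_prefix(value_m: float, prefix_exponent: int) -> str:
    """Format using the prefix the author originally chose."""
    return f"{value_m / 10 ** prefix_exponent:g} {SI_PREFIXES[prefix_exponent]}m"


if __name__ == "__main__":
    circumference_m = 4e7
    print(naive_prefix(circumference_m))           # numeric rule
    print(with_stored_prefix(circumference_m, 3))  # author's choice: kilo
```

The numeric rule yields "40 Mm", which nobody writes; the stored prefix
yields "40000 km".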

 it is probably necessary to store the number of
 significant decimals.

 That's how Denny proposed to calculate the default accuracy. If the accuracy 
 is
 given by a complex model (e.g. a gamma distribution), then it might be handy 
 to
 have a simple value that tells us the significant digits.

 Hm... perhaps it's best to always express accuracy as +/-n, and allow for 
 more
 detailed information (standard deviation, whatever) as *additional* 
 information
 about the accuracy (could be modelled as a qualifier internally).

I fear these are two separate levels of precision in giving a measure of
measurement _precision_ (I believe accuracy is the wrong term here,
precision and accuracy are related but distinct concepts). So 4.10
means that the last digit is significant, i.e. the best estimate is at
least between 4.095 and 4.105 (but it may be better). 4.10 +/- 0.005
means it is precisely between 4.095 and 4.105, as opposed to 4.10 +/- 0.004,
4.10 +/- 0.003, 4.10 +/- 0.002 etc.
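
A minimal sketch of the interval implied by the digits alone, using
Python's decimal module so that the trailing zero is kept. The function
name is hypothetical:

```python
from decimal import Decimal


def implied_interval(text: str) -> tuple[Decimal, Decimal]:
    """Widest interval consistent with the digits as written (half a unit
    of the last digit on each side)."""
    value = Decimal(text)
    half_step = Decimal(1).scaleb(value.as_tuple().exponent) / 2
    return value - half_step, value + half_step


if __name__ == "__main__":
    print(implied_interval("4.10"))  # bounds derived from the digits only
    print(implied_interval("4.1"))   # one digit fewer, ten times wider
```

This interval is only what the digits allow at most. Once it is stored as
an explicit +/- 0.005, it can no longer be told apart from a margin that
an author stated deliberately.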

Furthermore, a quantity may be given as 4.10-4.20-4.35. The precision
of measurement and the measure of variance and dispersion are
separate concepts.


 I believe in the user interface this needs not
 be any visible setting, simply the number of digits can be preserved.
 Without these it is impossible to store and reproduce information like
 10.20 nm, it would be returned as 1.02 10^-8 m.

 No, it would return using whatever system of measurement the user has selected
 in their preferences.

then you have lost the information. There is no user selection in
this in science.

 Complex heuristic
 may guess when to use the scientific SI prefixes instead. The
 trailing zero cannot be reproduced however when completely relying on
 IEEE floating-point.

 We'll need heuristics to pick the correct secondary unit (e.g. nm or km). The

(I believe there is no such thing as a secondary unit, did you make
that term up? Only m is a unit of measurement, the n or k are
prefixes see http://en.wikipedia.org/wiki/SI_prefix )

 general rule could be to pick a unit so that the actual value is between 1 and
 10, with some additional rules for dealing with cultural specialities 
 (decimeter
 is rarely used, hectoliter however is pretty common. The decagram is commonly
 used in Austria only, etc).

You would need to also know which prefix is applicable to which unit
in which context. In a scientific context different prefixes are used
than in a lay context. In a lay context astronomical temperatures may
be given as degree celsius, in a scientific as kelvin. This is not
just a user preference.

I agree that the system should allow explicit conversion in infoboxes.
I disagree that you should create an artificial intelligence system for
wikidata that knows more about unit usage than the authors. To store
the wisdom of authors, storing both unit and original unit prefix is
necessary.


You write "The Precision can be derived from the accuracy and vice
versa, using appropriate heuristics."

I _very strongly_ doubt that. Can you give any proof of that? For
precision I can use statistics, for accuracy I need an indirect,
separate and precise method to estimate accuracy. If you have a
laser-distance measurement device, the precision can be estimated by
yourself by repeated measurements at various times, temperatures, etc.
But unless you have an objective distance standard, you have no means
to determine whether the accuracy of the device is always off by 10 cm
because someone screwed up the software program inside the device.
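
The laser example can be sketched as follows. The readings and the
reference distance are hypothetical numbers; the point is only that the
spread can be computed from the readings alone, while the offset cannot:

```python
import statistics


def precision(readings: list[float]) -> float:
    """Spread of repeated measurements; needs no external reference."""
    return statistics.stdev(readings)


def bias(readings: list[float], reference: float) -> float:
    """Systematic offset; can only be computed against an independent standard."""
    return statistics.mean(readings) - reference


if __name__ == "__main__":
    readings_m = [10.11, 10.09, 10.10, 10.12, 10.08]  # hypothetical device output
    print(f"precision (s.d.): {precision(readings_m):.3f} m")
    print(f"bias vs. 10.00 m standard: {bias(readings_m, 10.00):+.2f} m")
```

Here the device is precise to under 2 cm yet off by about 10 cm, and
nothing in the readings themselves reveals the latter.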

 But they are not the same. IMHO, the accuracy should always be stored with the
 value, the precision never.

I fear that is a view of how data in a perfect world should be known,
not a reflection of the kind of data that people need to store in
Wikidata. Very often only the precision will be known or available to
its authors, or worse, the source may not say which it is.

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 15:11, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
 If they measure the same dimension, they should be saved using the same unit
 (probably the SI base unit for that dimension). Saving values using different
 units would make it impossible to run efficient queries against these values,
 thereby defying one of the major reasons for Wikidata's existence. I don't see a
 way around this.

Daniel confirms (in separate mail) that Wikidata indeed intends to
convert any derived SI units to a common formula of base units.

Example: a quantity like 1013 hectopascal, the common unit for
meteorological barometric pressure (this used to be millibar), would be
stored and re-displayed as
1.013 10^5 kg⋅m−1⋅s−2

I see several problems with this approach:

1. Many base units are little known. kg⋅m2⋅s−3⋅A−2 for Ohm... It
breaks communication with humans curating data on wikidata. It will
make it very difficult to compare data entered into wikidata for
correctness, because the data displayed after saving will have little
relation with the data entered. This makes Wikidata inherently
unsuitable for an effort like Wikipedia with many authors and the
reliance on fact checking.

2. Even for standard base units, there is often a 1:n relation, e.g.
both gray and sievert have the same base unit. The base unit for lumen
is candela (because the steradian is not a unit, but part of the
derived unit applicability definition)
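
Both problems can be made concrete with a small table of base-unit
exponents. The table and function names are my own illustration, not a
proposed Wikidata structure:

```python
# Exponents of SI base units for a few derived units.
BASE_DIMENSIONS = {
    "pascal": {"kg": 1, "m": -1, "s": -2},
    "ohm": {"kg": 1, "m": 2, "s": -3, "A": -2},
    "gray": {"m": 2, "s": -2},
    "sievert": {"m": 2, "s": -2},
}


def units_sharing_base_form(unit: str) -> list[str]:
    """All known units whose base-unit expression equals that of `unit`."""
    target = BASE_DIMENSIONS[unit]
    return sorted(name for name, dims in BASE_DIMENSIONS.items() if dims == target)


def hectopascal_to_base(value_hpa: float) -> float:
    """1 hPa = 100 Pa = 100 kg⋅m−1⋅s−2."""
    return value_hpa * 100.0


if __name__ == "__main__":
    print(hectopascal_to_base(1013))           # what would be stored and shown
    print(units_sharing_base_form("sievert"))  # ambiguous on the way back
```

After normalization there is no way to tell from the stored base-unit
expression whether a value was entered in gray or in sievert.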

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
 These all pose the same problems, correct. At the moment, I'm very unsure 
 about
 how to accommodate these at all. Maybe we can have them as custom units, 
 which
 are fixed for a given property, and can not be converted.

I think the proposal to use wikidata items for the units (that is both
base and derived SI as well as Imperial units/US customary units) is
most sensible.

Let people use the units they need. Then write software that picks up
the units that people use (after verifying common and correct use) by
means of their Wikidata item ID. With successive versions of Wikidata,
pick up more and more of these and make them available for conversion.

This way Wikidata will become what is needed.

I fear the discussion presently is about anticipating the needs of the
next years and not allowing any data into wikidata that have not been
foreseen.

There may be a way that Wikidata can have enough artificial
intelligence to predict which unit prefixes are usable in common
topics versus scientific topics, which units shall be used. Where
Megaton is used (TNT of atomic bombs) and where 10^x ton are
preferred (shipping). And that the base unit for weight is kilogram,
but for gold in a certain value range ounce may be preferred and
gemstones and pearls in carat
(http://en.wikipedia.org/wiki/Carat_(unit) ).

But I believe forcing Wikidata to solve that problem first and
ignoring the wisdom of the users is the wrong path.

Modelling Wikidata on the feet versus meter and Fahrenheit versus
Celsius problem, where US citizens have a different personal
preference is misleading. The issue is much more complex.

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
 it is probably necessary to store the number of
 significant decimals.

 Yes, that *is* the accuracy value i mean.

Daniel, please use correct terms. Accuracy is a defined concept and
although by convention it may be roughly expressed by using the number
of significant figures, that is not the same concept. Without
additional information you cannot infer backwards whether usage of
significant figures expresses accuracy or precision. See
http://en.wikipedia.org/wiki/Accuracy_and_precision

 Ok, there's some terminology confusion here. I'm using accuracy to refer to
 the accuracy of measurement (e.g. standard deviation), and precision to 
 refer
 to the precision of presentation (e.g. significant digits). We need these two
 things at least, and words for them. I don't care much which words we use.

I do. And I think it is important for Wikidata to precisely express
what it wants to achieve.

Accuracy has nothing to do with s.d., which is a measure of
dispersion. You can have an accuracy of +/- 10 measured with a
precision of +/- 0.1 (and a standard deviation for the population of
objects that you have measured of 2).


-

 So 4.10
 means that the last digit is significant, i.e. the best estimate is at
 least between 4.095 and 4.105 (but it may be better). 4.10 +/- 0.005
 means it is precisely between 4.095 and 4.105, as opposed to 4.10 +/- 0.004,
 4.10 +/- 0.003,  4.10 +/- 0.002 etc.

 Yes, all this should be handled by the component responsible for parsing user
 input for quantity values.

But it cannot be, because you have lost the information. I don't know
whether +/- 0.005 indicates significant figures/digits or whether it
is an exact precision_or_accuracy interval.

I think this may become clearer if you consider a value entered in inches:

1.20 inches.
you convert:
1.20 +/- 0.05 in = 3.048 10^-2 m +/- 1.27 10^-3 m

If this is the only information stored, I have no information left
whether I should display 3.048 or 3.0480 and whether the information
+/- 1.27 10^-3 m is meaningful (no) or an artifact of conversion
(yes).
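
A sketch of the same effect in code, using Python's decimal module. The
helper names are hypothetical; the inch definition (0.0254 m) is exact:

```python
from decimal import Decimal

INCH_IN_M = Decimal("0.0254")  # exact by definition


def to_metres(text_in_inches: str) -> Decimal:
    """Convert a value entered in inches to metres."""
    return Decimal(text_in_inches) * INCH_IN_M


def round_significant(value: Decimal, digits: int) -> Decimal:
    """Round to a given number of significant digits."""
    decimals = digits - value.adjusted() - 1
    return round(value, decimals)


if __name__ == "__main__":
    # As plain numbers, "1.20 in" and "1.2 in" are indistinguishable after conversion.
    print(float(to_metres("1.20")) == float(to_metres("1.2")))
    # Only if the three entered digits were stored can the display be limited.
    print(round_significant(to_metres("1.20"), 3), "m")
```

Without the stored digit count the converted number alone does not say
whether to show 3.05, 3.048 or 3.0480 times 10^-2 m.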




 It can be stored as an auxiliary data point, that is, as a qualifier (measured
 in feet). It should not IMHO be part of the data value as such, because that
 would make it extremely hard to use the values in a database.

You are correct insofar that I propose you need to store two units:
the normalized one (SI units only, and no prefix - and even though the
SI base unit is kg I would store gram) and the original one plus the
original unit prefix.

If you do that, you can store the value in a single normalized unit,
provided you back-convert it prior to display in Wikidata.

I don't think the original unit is a meaningless qualifier, it is
vital information for context.

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 17:03, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
 I'd have thought that we'd have one such table per dimension (such as length
 or weight). It may make sense to override that on a per-property basis, so
 2300m elevation isn't shown as 2.3km. Or that can be done in the template that
 renders the value.

here and in the entire discussion I fear that the need to support data
curation on Wikidata data for correctness is not sufficiently in the
focus.

If someone enters the height of a mountain in feet and I see the
converted value in meter in my wikidata preferences-converted view, I
will correct the seemingly senseless and unjustified precision to
three digits after the meter. Only if we understand in which unit the
data were originally valid, we will be able to successfully
communicate and collaborate.

Yes, Wikidata shall store a normalized version of the value, but it
also needs to store an original one. Whether it needs to store the
value twice I am not sure, I believe not. If it stores the original
prefix, original unit and original significant digits, it can
generally recreate the original form. I know that there are some
pitfalls with IEEE numbers in this, and it may be safer to store the
original number as well initially (and perhaps drop it later when
enough data are available to test the effects).
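
A minimal sketch of this idea, assuming decimal arithmetic rather than
IEEE floats to sidestep the pitfalls mentioned. The class, the field
names and the example height are hypothetical; the unit prefix is left
out for brevity:

```python
from dataclasses import dataclass
from decimal import Decimal

FACTORS_TO_METRE = {"ft": Decimal("0.3048"), "m": Decimal("1")}  # exact definitions


@dataclass(frozen=True)
class StoredLength:
    metres: Decimal          # normalized value, usable for queries
    original_unit: str       # unit in which the data were entered
    original_decimals: int   # digits after the point, as entered


def store(text: str, unit: str) -> StoredLength:
    value = Decimal(text)
    decimals = max(0, -value.as_tuple().exponent)
    return StoredLength(value * FACTORS_TO_METRE[unit], unit, decimals)


def original_form(stored: StoredLength) -> str:
    """Back-convert to the form a curator originally entered."""
    value = stored.metres / FACTORS_TO_METRE[stored.original_unit]
    return f"{value:.{stored.original_decimals}f} {stored.original_unit}"


if __name__ == "__main__":
    height = store("14505", "ft")   # hypothetical mountain height
    print(height.metres, "m")       # normalized view, looks overly precise
    print(original_form(height))    # what the author actually entered
```

The normalized value shows three digits after the meter; the original
form makes clear that the source was given to the foot.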

Of course, Wikipedias can use the API to display the value in any
other form, just as they like, but that does not solve the problem of
data curation on wikidata (which includes the data curation by
wikipedia authors).

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
Martynas,

I think you misinterpret the thread. There is no discussion not to
build on the datatypes defined in http://www.w3.org/TR/xmlschema-2/

What we are doing is discussing compositions of elements, all typed to
xml datatypes, that shall be able to express scientific and
engineering requirements as to statistics, significant digits (except
perhaps for duration, none of the data types in
http://www.w3.org/TR/xmlschema-2/ supports that), as well as means to
express uncertainty and confidence intervals.

Many existing xml schemata define such compositions, all squarely
built on http://www.w3.org/TR/xmlschema-2/ - wikidata is certainly not
unique in this effort. If you can point the team to further well
reviewed solutions, this would be very useful.

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 20:01,  jmccl...@hypergrove.com wrote:
 Hi Gregor - the root of the misconception I likely have about significant
 digits and the like, is that such is one example of a rendering parameter
 not a semantic property.

It is about semantics, not formatting.

In science and engineering, the number of significant digits is not
used to right align numbers, but to semantically indicate the order of
magnitude of the accuracy and/or precision of a measurement or
quantity. Thus, the weight of a machine can be given as 1.2 t (exact
to +/- 50 kg), 1200 kg  (+/- 1 kg), or 1200.000 g.

This is not part of IEEE floating point numbers, which always have the
same type-dependent precision or number of significant digits,
regardless of whether this is semantically justified or not. IEEE 754
standard double always has about 16 decimal significant digits, i.e.
the value 1.2 tons will always be given as 1.200 tons.
This is good for calculations, but lacks the information for final
rounding.
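
This can be checked in a few lines of Python, whose float type is an
IEEE 754 double. The helper name is hypothetical:

```python
from decimal import Decimal


def survives_as_float(text: str) -> bool:
    """True if the written form comes back unchanged from a binary double."""
    return repr(float(text)) == text


if __name__ == "__main__":
    print(float("1.2") == float("1.200"))  # same double, digit count is gone
    print(survives_as_float("1.20"))       # trailing zero is not reproduced
    print(Decimal("1.20"))                 # a decimal type keeps it
```

The double is fine for calculation, but the digit count has to be stored
separately if the final rounding is to follow the source.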

Gregor



Re: [Wikidata-l] Data values

2012-12-18 Thread Gregor Hagedorn
 It would be possible and very flexible, and certainly more powerful than the
 current system. But we would lose the convenience of having one date, which
 we need for query answering (or we could default to the lower or upper
 bound, or the middle, but all of these are a bit arbitrary).

I believe it would be more profitable to build a query system which
always queries for the range. This would work for interval-only values
(see my comment on the wiki page) as well as for value with interval.

I don't see this as a big overhead. It is more a problem for ordering,
but internally, wikidata could store a midpoint value for intervals
where no explicit central value is given, and use these for ordering
purposes.
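
As a rough sketch of what I mean, with hypothetical names: queries test
for overlap with a range, and ordering falls back to a midpoint only when
no explicit central value was given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntervalValue:
    lower: float
    upper: float
    central: Optional[float] = None  # explicit central value, if any

    def sort_key(self) -> float:
        """Central value if given, otherwise the midpoint (for ordering only)."""
        if self.central is not None:
            return self.central
        return (self.lower + self.upper) / 2


def overlaps(value: IntervalValue, query_lower: float, query_upper: float) -> bool:
    """Range query: does the stored interval intersect the queried range?"""
    return value.lower <= query_upper and value.upper >= query_lower


if __name__ == "__main__":
    values = [IntervalValue(18, 25), IntervalValue(9.9, 10.1, central=10.0)]
    print([v for v in values if overlaps(v, 20, 30)])
    print(sorted(values, key=IntervalValue.sort_key))
```

A point value is simply the case where lower and upper coincide, so the
same query works for quantities, dates and coordinates.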

I think it would be great if the system is consistent for quantities,
dates, geographical longitude/latitude, etc.

Gregor



Re: [Wikidata-l] Data values

2012-12-18 Thread Gregor Hagedorn
 Now, I don't think we need or want ranges as a data type at all (better have
 separate properties for the beginning and end).

I am afraid this will then put a heavy burden on users to enter,
proofread, and output values. Data input becomes dispersed, because
the value 18-25 cm length has to be split and entered separately.
You have to write a custom output for each property then, and do all
the query logic (> lower, < upper) for each property in each Wikipedia
client.

I believe this is something that is healthy to do centrally.
I believe the concept of intervals exists because of that.

Gregor



Re: [Wikidata-l] Data values

2012-12-18 Thread Gregor Hagedorn
 I don't see this as a big overhead. It is more a problem for ordering,
 but internally, wikidata could store a midpoint value for intervals
 where no explicit central value is given, and use these for ordering
 purposes.

 Well, I would call that mid point simply the value, and the range would be
 the accuracy. There's an important conceptual distinction here to having ranges
 as actual values.

Can this conceptually distinguish between a meaningful midpoint value,
and one that is useful for ordering, but has no meaning and should not
be displayed as a result value? See the examples on

https://meta.wikimedia.org/wiki/Talk:Wikidata/Development/Representing_values#Missing_central_value

Gregor

PS: With accuracy you introduce a new concept here which was not in
the representing values paper (see
http://en.wikipedia.org/wiki/Accuracy_and_precision). This is
different from confidence interval (uncertainty) where it is not yet
decided whether the value indicates accuracy or dispersion. Confidence
interval is a measure of Accuracy only if the sample measurements are
normally distributed and if no systematic bias exists.  --- I believe
it is important that wikidata is flexible enough so it can capture
both, especially because in many cases dispersion is used as a rough
estimate for otherwise unknown accuracy, and since in many cases there
is no true single value and the dispersion is systematic (see e.g.
car model length example).



Re: [Wikidata-l] Data values

2012-12-18 Thread Gregor Hagedorn
 (ASIDE: Regarding presentation: it is not always algorithmically easy
 to decide whether to present 0.00000000000001 m as 1 * 10^-14 m or as
 10 fm = 10 * 10^-15 m. In a scientific context, only the SI steps should
 be used, in another context the closest decimal may be appropriate.)

 But floating point numbers are handled by the implementation of [[IEEE
 floating-point standard]].

 Displaying the numbers is another question. There I have to agree that it
 always makes sense to also store a typical used unit for that type of data.

I agree. What I propose is that the user interface supports entering
and proofreading 10.6 nm as 10.6 plus n (= nano) plus meter.
How the value is stored in the data property, whether as 10.6 floating
point or as 1.06e-8 is a second issue -- the latter is probably
preferable. I only intend to show that scientific values are not
always trivial to reverse engineer from a floating point value to the
intended value.

In addition to a storage option of the desired unit prefix (this may
be considered a original-prefix, since naturally re-users may wish to
reformat this) it is probably necessary to store the number of
significant decimals. I believe in the user interface this needs not
be any visible setting, simply the number of digits can be preserved.
Without these it is impossible to store and reproduce information like
10.20 nm, it would be returned as 1.02 10^-8 m. Complex heuristics
may guess when to use the scientific SI prefixes instead. The
trailing zero cannot be reproduced however when completely relying on
IEEE floating-point.
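
A short sketch of the round trip with the two extra pieces of
information. The function name and the prefix table are hypothetical;
decimal arithmetic is used only to keep the example exact:

```python
from decimal import Decimal

PREFIX_EXPONENTS = {"k": 3, "": 0, "m": -3, "µ": -6, "n": -9}


def format_with_prefix(value_in_base_unit: Decimal, prefix: str,
                       decimals: int, unit: str) -> str:
    """Rebuild the entered form from base-unit value, prefix and digit count."""
    scaled = value_in_base_unit.scaleb(-PREFIX_EXPONENTS[prefix])
    return f"{scaled:.{decimals}f} {prefix}{unit}"


if __name__ == "__main__":
    print(repr(1.02e-8), "m")  # what a bare float gives back
    print(format_with_prefix(Decimal("1.02E-8"), "n", 2, "m"))
```

With the prefix "n" and two decimals stored next to the value, "10.20 nm"
comes back exactly as entered.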

Gregor



Re: [Wikidata-l] Wikidata license (was Introduction and some questions on Wikidata)

2012-12-10 Thread Gregor Hagedorn
Marco wrote:
 So I assume that single facts (or database items) are not copyrightable just
 like single words. Only the database (or even a view?) as a selection and
 arrangement of various items is copyrightable.

Yes, a database may be copyrightable, if the creativity in selecting
and arranging information is copyrightable. This possibility exists in
many countries and is completely different from database rights.

To which extent case law provides a copyright protection and which
level of creativity is required varies in each state. In the EU, the
database rights directive addresses also database copyrights and tries
to harmonize it among EU member states.

However, the database right is a completely separate right. It exists
only in the EU, and it has nothing to do with the argument above.

Gregor



Re: [Wikidata-l] Suggestions

2012-12-09 Thread Gregor Hagedorn
 If I'm not mistaken this is what we plan to do. The current system we're
 using for the language links can certainly handle it, as it's not WP
 specific.

that would be great!

On top of my wish list are the Wiktionaries, since most of the
entities needed to use Wikidata for descriptive knowledge are provided
only on summary pages (dozens of terms in one page) on Wikipedias,
whereas the Wiktionaries define them as pages.

And I believe it will strengthen Wikidata if it can be open to open
data initiatives outside of the Wikimedia Foundation. Clearly there
needs to be control over which initiatives are accepted as valid
authoritative sources of identifiers, but I wonder whether the
interwiki list is not already a good mechanism for this? If the
interwiki list could be supplemented with a generic definition of how
to make an ajax identifier-lookup call, to present the user with a
picklist, this could be a huge long-term benefit (i.e. it could be used
by Wikidata, but also in any Wikipedia when using a more powerful
visual editor).

Gregor



Re: [Wikidata-l] Wikidata-l Digest, Vol 12, Issue 10

2012-11-20 Thread Gregor Hagedorn
Hi Michael,

(I cannot speak on behalf of WMF, this is not official)

The plans for Wikidata are to carefully attribute sources of data,
which comes much closer to what you need, i.e. 3rd party data are
separable. For a start the CC0 license has been chosen, which gives
you unlimited re-use rights, but other licenses may be supported later
if needed. The discussion was to decide this only when the use cases
start to come in.

The real data use is still to come, so far wikidata can really only do
the interlanguage linking of wikipedia pages. The rest is being worked
on right now.

Best

Gregor



Re: [Wikidata-l] Wikidata license (was Introduction and some questions on Wikidata)

2012-11-16 Thread Gregor Hagedorn
 Just to clarify, my concern is about externally made databases, regardless
 of whether these are imported directly into Wikidata, or have been
 incorporated into Wikipedia first and imported into Wikidata from there. For
 example, the population data in Wikipedia's list of ceremonial English
 counties
 (http://en.wikipedia.org/wiki/List_of_ceremonial_counties_of_England), which
 also features in the infoboxes of the articles on each county, would I think
 be covered by database right under U.K law. Like other ONS material, it has
 been made available under the OGL, which does impose some obligations on
 re-users (somewhat similar to CC-BY).

This is an interesting case. However, while the database right gives
you certain rights, it does not give you a copyright (i.e. the content
may be legally problematic, but it cannot be covered by CC BY-SA).
Thus, the use on Wikipedia is either exclusively licensed with an
obligation to prevent re-use by third parties (which is not the case,
WMF does not do this), or it is illegal, or acceptance of open re-use
is an implicit waiver of database rights.

I believe you cannot allow it on Wikipedia but then NOT allow further reuse.

However, to clarify:

1. It is much preferable to add such data to Wikidata and include
their source in a structured way. Whether OGL or other licenses need
to be explicitly supported by Wikidata in the future will have to be a
separate discussion, on Wikidata.org.

2. My goal in participating in this discussion is to avoid the
impression that re-use of Wikipedia content is not possible at all
without looking at each individual data element and record.

3. Wikidata plans to support a hierarchy of multiple data for the same
statement (multiple values from different sources for a single
property in a single item). This makes it possible (although not
required) to mix poorly sourced, Wikipedia-harvested information
with clean, well-sourced data.

4. Not harvesting from Wikipedia would require verifying that almost
all information from Wikipedia is in Wikidata, but cleanly sourced,
before it is possible to migrate a class of infoboxes to Wikidata. I
believe this is an impossible task, making some import of
Wikipedia-harvested data necessary. Where better-sourced information
exists, it would take precedence.
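
Points 3 and 4 can be pictured with a small Python sketch. This is
only an illustration under my own assumptions: the names SourcedValue
and preferred are hypothetical, the values are placeholders, and the
rule "any sourced value beats an unsourced one" is a simplification of
whatever ranking Wikidata eventually adopts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourcedValue:
    value: str
    source: Optional[str]  # None = harvested from Wikipedia without a citation


def preferred(values: List[SourcedValue]) -> Optional[SourcedValue]:
    """Pick the first sourced value; fall back to harvested data if none exists."""
    if not values:
        return None
    sourced = [v for v in values if v.source is not None]
    return (sourced or values)[0]


# Placeholder values for one property of one item.
population = [
    SourcedValue("value harvested from an infobox", None),
    SourcedValue("value from an external dataset", "hypothetical source"),
]
print(preferred(population).value)
```

Both values stay stored; precedence only decides which one is shown by
default.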

Gregor



Re: [Wikidata-l] Introduction and some questions on Wikidata

2012-11-14 Thread Gregor Hagedorn
 First of all the priority lies on data already present on Wikipedia. Wikidata
 should not be a data storage for everything structured in the world, so we
 should first start to transfer data already present on Wikipedia to Wikidata.
 External data sources will surely be of interest as well, but the purpose
 of Wikidata is still centralizing what we already have.

A side remark, because I believe this needs some further discussion. I
agree that Wikidata should be focusing on the kind of data Wikipedia
has or which are suitable for Wikipedia. However, since the Wikipedia
data are not systematically sourced (they may be unsourced or the
source is only available in edit comments, talk page etc.), it will be
very valuable to import relevant, sourced data that have the right
scope.

Gregor



Re: [Wikidata-l] wikidata.org is live (with some caveats)

2012-10-31 Thread Gregor Hagedorn
 Changing the language does not really work, the title of the item
 pages remain in English.
 http://www.wikidata.org/wiki/Special:RecentChanges?setlang=de
 Did it have a German label or just a language link to dewp?

Probably it did not have a manually entered label at the time. After
my post it now has. From what you write and what I tried today I
assume that you don't use the Wikipedia page title in a given
language as the default for the top-line label of the Wikidata page?

I realize that sometimes the Wikipedia title may not be the best final
label but it seems an excellent default:
a) a display default e.g. in recent changes. Presently many pages in
recentchanges just show a number, although they are connected to many
wp articles already.
b) as a gray editable default when editing a page.

Presently, when in editing mode, the label on top seems to be visible
only in a single language. If a label in English has already been
entered, there is no access to it, neither read nor write. So when I
go to a data page, I usually see NOTHING at the top when my language
is set to German. The only way to guess the label of a data item is in
fact to use the Wikipedia article titles.
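
A minimal Python sketch of the default I have in mind. The item
layout (a dict with "labels" and "sitelinks") and the function name
display_label are assumptions made for illustration only, not the
actual Wikidata storage format.

```python
def display_label(item, language):
    """Return (text, is_default) for showing an item in the given language."""
    label = item["labels"].get(language)
    if label:
        return label, False            # manually entered label
    title = item["sitelinks"].get(language + "wiki")
    if title:
        return title, True             # default taken from the Wikipedia title
    return item["id"], True            # last resort: the bare number


q34 = {
    "id": "Q34",
    "labels": {"en": "Sweden"},
    "sitelinks": {"dewiki": "Schweden"},
}
print(display_label(q34, "de"))
print(display_label(q34, "en"))
```

The second element of the result could drive the gray, editable
rendering suggested under b).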

Gregor



Re: [Wikidata-l] wikidata.org is live (with some caveats)

2012-10-30 Thread Gregor Hagedorn
Great work, my congratulations!


---
Some first impressions:

Changing the language does not really work, the title of the item
pages remain in English.

http://www.wikidata.org/wiki/Special:RecentChanges?setlang=de
-
(Unterschied | Versionen) . . Sweden (Q34); 17:48 . . (+32) . .
Aplasia (Diskussion | Beiträge) (Fügte websitespezifischen [itwiki]
Link hinzu:  Paesi Bassi)

although http://www.wikidata.org/wiki/Q34 does list de: Schweden.

I am not sure whether this is a bug or by design?

-

In German, the translation of "Item" as "Datenelement" (= data element)
seems odd; a data element is usually something much smaller and
atomic. See http://de.wikipedia.org/wiki/Datenelement

Proposal: "Artikel" or "Datenobjekt"

-

Change http://www.wikidata.org/ to http://wikidata.org/ ?



I think the logo image (upper left corner) looks a bit lost, 10%
larger perhaps? It is not even left aligned with the menu text below,
which has ample margin



Finally: The Q### will become the public face of Wikidata, wherever
it is re-used. I think this brand should be less cryptic and use:

http://wikidata.org/wiki/WD34 or http://wikidata.org/wiki/W34

(providing a mnemonic link to Wikidata) instead of the current

http://wikidata.org/wiki/Q34


Gregor



Re: [Wikidata-l] changing wikidata-item properties with multilingual labels

2012-08-17 Thread Gregor Hagedorn
 You are right, I mixed them up (that comes from not checking).

 The usecase for monolingual text are a bit rare, and I am thinking of
 things like official motto (which is usually not translated),

I think if it is only "usually not", but sometimes indeed translated,
using multilingual for the property would be a better choice. If only
one language is available, the language fallback would always end with
this.

 etymological annotations, or the official name of a company (also,
 usually not translated),

usually. Companies sometimes do run under local names (or variations):
de: Sanyo Denki K.K.
ja: San’yō Denki Kabushiki-gaisha,
en: SANYO Electric Co. Ltd.

 carefully about how delineate them from each other in the entry forms,
 or otherwise it might end up a bit messy.

I think when adding the option for non-linguistic content (= ISO 639 zxx)
for language-neutral entities (e.g. for scientific species names, post
codes), this type is the least needed.

(If anything, it may be more valuable to add a default flag to
indicate a primary name that should be used prior to the first in a
language fallback. This would be valuable in mixed cases, where a
string is translated in a few cases, but not in the majority of
languages (the "usually not" case). Otherwise, in a rare border case
where a German company provides translations to Japanese and Chinese,
but not to English, a language fallback chain that does contain German
may accidentally end up with Chinese. This solves a border case within
the multilingual type which I believe cannot be solved with
monolingual text.)
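
A short Python sketch of the border case, under my own assumptions:
the function name resolve, the "primary" parameter standing in for the
default flag, and the company name are all hypothetical, and the real
fallback rules may differ.

```python
def resolve(values, chain, primary=None):
    """values: language code -> text; chain: requested language first, then fallbacks."""
    requested = chain[0]
    if requested in values:
        return values[requested]
    # Proposed default flag: use the primary name before walking the fallbacks.
    if primary is not None and primary in values:
        return values[primary]
    for lang in chain[1:]:
        if lang in values:
            return values[lang]
    return None


# Hypothetical German company with Japanese and Chinese names, but no English one.
names = {"de": "Beispiel GmbH", "ja": "name in Japanese", "zh": "name in Chinese"}
chain = ["fr", "en", "zh", "de"]

print(resolve(names, chain))                # plain fallback reaches Chinese first
print(resolve(names, chain, primary="de"))  # the default flag yields the German name
```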

Gregor

(Often wrong but never in doubt :-) )



Re: [Wikidata-l] watching Wikidata changes that affect my wiki

2012-08-14 Thread Gregor Hagedorn
I think the topic is relevant for the Wikidata editing UI.

At the hackathon in Berlin we had discussions about a chain of
fallback languages. I have reworked this and added some potential
user-interface behaviour to

http://meta.wikimedia.org/wiki/Wikidata/Notes/Language_fallback

Gregor



[Wikidata-l] changing wikidata-item properties with multilingual labels

2012-08-14 Thread Gregor Hagedorn
A city has a Wikipedia page and a corresponding Wikidata-item-page.
One of the item properties is Property:City_mayor.

If the mayor changes, and both have their own pages/items
(http://de.wikipedia.org/wiki/Eberhard_Diepgen to
http://de.wikipedia.org/wiki/Klaus_Wowereit for
http://de.wikipedia.org/wiki/Berlin), changing the mayor would mean
disconnecting/replacing the item-to-item property. The change would be
clean and logical with respect to translated labels.

However, where the city mayor is not a well known person (smaller
cities), the City_mayor property is most likely a string literal.

Replacing the string (name) for the mayor in this case would require
emptying ALL translations/transliterations in all other languages.
Unfortunately, the system cannot really know whether an update of a
translated label is the result of a correction (person did not change)
or the result of replacing the value (person did change).

The design of the UI should make this situation as transparent to
editors as possible. It may help to provide two edit-buttons for
language-sensitive string literals:

[edit translations]
[edit new value] (or [replace value] ?)

In the second case, all existing translations would be blanked.
Probably more or better ideas can be found... :-)
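
To make the difference between the two buttons concrete, a minimal
Python sketch. The class MultilingualString and its method names are
hypothetical, as are the person names; it only illustrates the
intended behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MultilingualString:
    translations: Dict[str, str] = field(default_factory=dict)

    def edit_translation(self, lang, text):
        """[edit translations]: correct one language, keep all others."""
        self.translations[lang] = text

    def replace_value(self, lang, text):
        """[replace value]: a new value, so all existing translations are blanked."""
        self.translations = {lang: text}


mayor = MultilingualString({"de": "Hans Muller", "ru": "Ханс Мюллер"})
mayor.edit_translation("de", "Hans Müller")    # same person, spelling corrected
print(sorted(mayor.translations))
mayor.replace_value("de", "Petra Schmidt")     # new mayor
print(mayor.translations)
```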

Gregor



Re: [Wikidata-l] changing wikidata-item properties with multilingual labels

2012-08-14 Thread Gregor Hagedorn
 But humans (and other entities) should not be represented by strings
 in the system, but by items.

I wonder whether this would not be too inflexible. It would burden the
use of wikidata with the responsibility to determine entity-identity
in all cases where only a name-string is known.

In the example of the mayor: Assume that the new mayor of a city is
named John Smith.  Wikidata already has 500 items for persons named
John Smith. The Wikipedia-Wikidata editor must now determine whether
it is good practice to simply create wikidata-item 501, not knowing
whether it is one of these or not.

I fear that the practice is even more problematic in the reverse case.
If in a large percentage of cases there is little doubt about
identity, this could lead to the practice of always connecting to a
wikidata-item for a person, should there be a person of this name.
Henceforth, Wikidata would claim that the mayor of Erewhon previously
was councilor in Owd-Negrin, even if there is only a chance identity
of a name. Wikipedia disambiguation pages know how many homonymic
highly notable persons exist - Wikidata will deal with the non- or
less-notable ones as well.

A well known example is that it is not a good idea for scientific
reference management to treat authors as person entities, since the
reverse engineering of author identity from the n:m relation between
person and name-string is normally not feasible.

I would prefer that the decision whether entity-identity is known, or
whether only a name-string or other label is known, be left to
the Wikidata editor community and not be prescribed by the software.
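
In data-model terms this amounts to letting a property value be either
an item reference or a plain string. A minimal Python sketch; the
names ItemRef, PropertyValue and describe, and the item id, are
hypothetical illustrations only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ItemRef:
    item_id: str  # the editor asserts that the identity is known


# The editor chooses: a reference to an item, or only a name string.
PropertyValue = Union[ItemRef, str]


def describe(value: PropertyValue) -> str:
    if isinstance(value, ItemRef):
        return f"item {value.item_id}"
    return f"name string '{value}' (identity not asserted)"


print(describe(ItemRef("Q501")))   # hypothetical item id
print(describe("John Smith"))
```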

Gregor



Re: [Wikidata-l] DBpedia usage in the bbc

2012-07-05 Thread Gregor Hagedorn
 I don't mean to spin this out into a tangent about Drupal.

Me neither, my discussion point here is: There are advantages for both
opaque (like http:something.org/node123456) and non-opaque
(http:something.org/Bonn,_Northrhine-Westfalia,_Germany) URI/IRI
identifiers.

In the light of the use-case of interlinking discussed here: which is
right for Wikidata? Does Wikidata need both in parallel (I believe
this is the current plan)?

Gregor



Re: [Wikidata-l] reworked storyboard for linking Wikipedia articles

2012-06-20 Thread Gregor Hagedorn
I added some comments on

http://meta.wikimedia.org/wiki/Talk:Wikidata/Development/Storyboard_for_linking_Wikipedia_articles_v0.2

Gregor



Re: [Wikidata-l] Wikidata Transclusions

2012-06-14 Thread Gregor Hagedorn
While I agree that it is desirable to support simple, preformatted
infoboxes that can, with minimal effort, be re-used in a large number
of language versions of Wikipedia, I strongly disagree with the demand
to make this the only choice.

I think the present Wikidata approach to allow local Wikipedias to
customize their infoboxes by accessing wikidata properties
property-by-property is the right path.

The large Wikipedias with many editors have invested considerable
creative energy into making quite a large number of infoboxes
elaborate information containers. That includes formatting, images and
hand-crafted links in both the field name and the field value
side. Some values are expressed through svg graphics, other values
expressed through background color coding, etc.

Limiting the usability of Wikidata to plain vanilla infoboxes could
cause considerable resistance in these communities. And although small
Wikipedias will profit a lot from Wikidata, without the engagement of
editors from the large Wikipedias in curating Wikidata content, the
increased synergies will not happen.

Another issue is that (I believe that) Wikidata does not have a notion
of ordering properties. Correct? This is no issue for the present
Wikidata approach because infoboxes remain curated in each local
Wikipedia. However, in a centralized "one size fits all" approach,
replacing existing infoboxes where information is presented in a
logical order with an alphabetical property order would create huge
resistance (and would be a complex issue that Wikidata would have to
deal with, allowing property ordering and filtering).
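
The ordering point can be sketched in a few lines of Python. It
assumes the central store hands back an unordered mapping and the
local infobox supplies its own curated order; the function name
ordered_rows, the property names and the values are placeholders.

```python
def ordered_rows(properties, local_order):
    """Return (name, value) rows in the order curated by the local wiki."""
    return [(name, properties[name]) for name in local_order if name in properties]


# Unordered central data (placeholder values) and a locally curated order.
central = {"weight": "w", "engine": "e", "length": "l"}
local_order = ["length", "weight", "engine", "colour"]

print(ordered_rows(central, local_order))  # logical order chosen by the local wiki
print(sorted(central))                     # what a naive alphabetical order would give
```

The local list also acts as a filter: properties it does not name are
not shown, and names without data are skipped.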

I believe that Wikidata correctly aims to provide a smooth transition
path, where it is possible to obtain only part of an infobox from
wikidata and inject wikidata content into existing infobox layouts.

That said: I would encourage a third party contributor to try to
create a default Wikidata infobox generator in a way (extension
installable in multiple Wikipedias) that enables a wikipedia to
autocreate a good looking, plain vanilla infobox with minimal effort.

Gregor



Re: [Wikidata-l] Wikidata Transclusions

2012-06-14 Thread Gregor Hagedorn
On 14 June 2012 12:33, Gerard Meijssen gerard.meijs...@gmail.com wrote:
 Finally, when Wikidata provides data and info boxes, it does not mean that
 any project is compelled to use it. As Wikidata matures, it will become
 increasingly clear that it is not the best practice.

That may be, but Wikidata needs a path to get there. I think the
ability to integrate Wikidata into the existing infobox consensus of a
Wikipedia community is essential for adoption. Over time, centrally
provided infoboxes with ever increasing customization functionality
(order, selection, arrangement, linking properties to Wikipedia pages
explaining them, etc.) are desirable, and at some point the evolution
of Wikidata may lead to the conclusion that this becomes the preferred
method.

Gregor



Re: [Wikidata-l] Wikidata Transclusions

2012-06-14 Thread Gregor Hagedorn
 Gregor, I'm a bit confused -- are you talking about the transclusion design
 approach in this statement?

Yes, in the sense that it demands to be the only access to Wikidata
content in a Wikipedia.

 because, if so, I'd think there'd be a number of
 infobox styles that can be selected by an author on the wikidata platform
 when 'building' the infobox page. The author can transclude any number/any
 specific infobox(es) on their wikipedia page, eg

 {{wikidata:en:Infobox:Some topic/some custom imfobox}}

As I say, I look forward to seeing an infobox builder being developed,
but this is a serious challenge.

See, e.g., http://en.wikipedia.org/wiki/Tiger and take a look at:
a) the hierarchical arrangement of properties and their formatting;
b) the linking of them: headings link to concept explanations on the
same-language Wikipedia, with the link being different from the
display text. "Early Pleistocene – Recent" may be a time range, but
the value is "Early Pleistocene" and the link is "Pleistocene".
Similarly, each taxonomic author (here only one is present, Linnaeus)
should link to en.wikipedia on en.wikipedia and to de.wikipedia on
de.wikipedia;
c) some information being expressed with graphics, see "Endangered
(IUCN 3.1)[1]";
d) properties or property values containing footnotes;
e) the fact that a subspecies is extinct being abbreviated with a
symbol (†P. t. virgata).

Note that the latter case is actually a nesting: it is a list of
subspecies, with each subspecies having multiple properties like
Scientific Name, Wikipedia Page name, extinction status - I am not
sure Wikidata plans to model such data in Phase 2 already.

My bottom line: Keep the Wikidata project manageable and doable with
the available resources. Offer a method for Wikipedians to pick up
Wikidata content within the existing template infrastructure.

But, desirable: ask for a white paper on which additional work would
be required to create centralized, plain vanilla infobox rendering as
well.

Would you be willing to create such a whitepaper? How much of the
above-shown Tiger example can be created centrally with a limited set
of facilities? How feature rich must the customization become?

Or are you proposing to simply use the existing template programming,
with the only difference being that Wikidata is the only MediaWiki
where the properties can be accessed within templates? Much of my
argument assumes that you are looking for a non-template based infobox
renderer; I may be wrong there.

Gregor



Re: [Wikidata-l] [wikidata-intern] Re: Request for comments: syntax for including data on client wikis (aka how to make infoboxes)

2012-05-24 Thread Gregor Hagedorn
 This seems to be everyone's preference, even though it feels kind of icky to 
 me.
 Oh, well :) I'll rework the draft on that basis soon.

I look forward to it. Maybe it runs against some wall, but then we
have a better basis for comparison.



Re: [Wikidata-l] [wikidata-intern] Re: Request for comments: syntax for including data on client wikis (aka how to make infoboxes)

2012-05-23 Thread Gregor Hagedorn
On 23 May 2012 13:19, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
 On 23.05.2012 13:14, Nikola Smolenski wrote:
 If we assume that in practice #data-template is usually going to be wrapped 
 into
 a template, what's the point of having it at all? Do you see any technical
 reasons for it?

 How else do you pass a complex object to a template and make its properties 
 show
 up as template parameters?

I think I might have addressed that in my comment on the wiki. See
there, but essentially I believe it is technically equally valid, and
from a usability and community adoption standpoint far preferable, to
simply support a syntax to address properties of the complex object,
and have the resolver of this syntax automatically pull the entire
complex Wikidata object (of which the property is a part) into a
cache, so that subsequent calls to properties are returned from the
cached object.
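
A minimal Python sketch of this resolver idea. The class EntityCache,
the fetch function and the entity layout are hypothetical stand-ins;
the real parser integration would look different.

```python
class EntityCache:
    """Resolve single properties, fetching each whole entity only once."""

    def __init__(self, fetch_entity):
        self._fetch = fetch_entity   # function: entity id -> dict of properties
        self._entities = {}
        self.fetch_count = 0         # only here to make the caching visible

    def get_property(self, entity_id, property_name):
        if entity_id not in self._entities:
            # First access: pull the entire object into the cache.
            self._entities[entity_id] = self._fetch(entity_id)
            self.fetch_count += 1
        return self._entities[entity_id].get(property_name)


def fake_fetch(entity_id):
    # Stand-in for the call to the central repository.
    return {"mayor": "Klaus Wowereit", "country": "Germany"}


cache = EntityCache(fake_fetch)
print(cache.get_property("Berlin", "mayor"))
print(cache.get_property("Berlin", "country"))
print(cache.fetch_count)  # both lookups were served by one fetch
```

From the template author's point of view each property is addressed on
its own; the cost of fetching the complex object is paid once per page.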

I look forward to having this analyzed by Daniel. Obviously there are
some extra things that need to be added, but also other things simply
go away painlessly... Can you write an advantage/disadvantage
comparison on the wiki, Daniel, to be commented upon?

Gregor



Re: [Wikidata-l] Request for comments: syntax for including data on client wikis (aka how to make infoboxes)

2012-05-22 Thread Gregor Hagedorn
I added some comments on the wiki



Re: [Wikidata-l] Data_model: Metamodel: Wikipedialink

2012-04-04 Thread Gregor Hagedorn
 Wikidata can (and probably will) store information about each moon of
 Uranus, e.g., its mass. It does probably not make sense to store the mass of
 Moons of Uranus if there is such an article. It does not help to know that
 the article Moons on Uranus also talks (among other things) about some
 moon that has a particular mass: you need to know what *exactly* you are
 talking about to exploit this data. An article on Moons of Uranus could
 still (eventually) embed Wikidata data to improve its display, but this data
 must refer to individual moons, not to the article as a whole.

The problem I see is that you have no definition to which real object
the data are tied. We agree that the problem is not the interwiki
links per se. It is what results from it. How do we tie data to a
wikidata page when we don't know what it is about?



[Wikidata-l] SNAK - assertion?

2012-04-04 Thread Gregor Hagedorn
Would the word "assertion" be a possible replacement for the neonym "Snak"?



Re: [Wikidata-l] Data_model: Metamodel: Wikipedialink

2012-04-01 Thread Gregor Hagedorn
On 1 April 2012 13:04, Markus Krötzsch markus.kroetz...@cs.ox.ac.uk wrote:
 This is a valid point. It is intended to address this as follows:
 * Wikidata items (our content pages) will be in *exact* correspondence to
 (zero or more) Wikipedia articles in different languages.
 * Differences in scope will lead to different Wikidata items.
 * Relationships such as broader or narrower can be expressed as
 relations between these items, if desired.

This is a technically valid solution. Socially, I fear it would lead
to endless uncertainty about which mechanism to use. Few abstract
entities will have exactly the same delimitation/width, but where
should one switch from one method of linking (one wikidata page with
several more or less closely matching wikipedia pages) to the other
(several wikidata pages, one for each wikipedia page in each language)?

Also, importing data will be a nightmare, because the concepts used in
imported data will have to be compared with all wikipedias. One
Wikipedia-language-version has the post-WWII extent of Russia as well
as the current and another Wikipedia-language-version has them
separated. It may not have mattered before and only one Wikidata page
links to both language-versions. However at some point historical data
are imported and suddenly Wikidata needs to be reorganized to have
two pages. ... Just thinking aloud - this may be unavoidable perhaps...

However, my gut feeling is that if you plan to avoid relations between
Wikidata and Wikipedia, it might be a more comprehensible model to
then always use only one method, i.e. have a 0 to 1 or 1 to 1
relation between Wikidata page and Wikipedia page only, and express
everything else in Wikidata to Wikidata page relations. These
relations are then easily traceable and updateable, just as the
broadness or narrowness of a page in a given Wikipedia develops over
time.

 In general, Wikidata will not be able to replace all interwiki links: it
 will remain possible to define additional links in each Wikipedia to cover
 cases where the relationship between articles is not exact.

This worries me. It means that there will be forever conflicting
systems of editing interwiki links. If everything can be achieved with
Wikipedia, but only a subset with Wikidata, it spells social adoption
danger.



[Wikidata-l] Data_model: Metamodel: Statement

2012-03-31 Thread Gregor Hagedorn
Some initial ideas on the statement. I realize that this is not
priority in the first phase, but perhaps on the wiki a place could be
created to collect some thoughts like those below?

http://meta.wikimedia.org/wiki/Wikidata/Notes/Data_model#The_Metamodel explains:

Statement = (StatementID, Property, Value, Qualifier*, Reference*)
The StatementID is a unique identifier for the given statement and
used only internally and for export.
A Property is defined on a property page. This definition includes a type.
The structure of the Value is given by the type of the property.
Simple types could demand an EntityID or a number. More complex types
could demand dates, date ranges, numbers with units, a geocoordinate,
a geo shape, etc.
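
For readers who prefer code, the quoted tuple can be pictured as a
Python dataclass. This is only my reading of the notes page, not an
official schema: the field names, the use of plain strings and tuples,
and the example identifiers are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Statement:
    statement_id: str                  # internal / export identifier
    property_id: str                   # defined on a property page, which fixes the type
    value: Any                         # structure depends on the property's type
    qualifiers: List[Tuple[str, Any]] = field(default_factory=list)  # Qualifier*
    references: List[str] = field(default_factory=list)              # Reference*


# Hypothetical statement for the item "Plantago lanceolata".
leaf_shape = Statement(
    statement_id="S1",
    property_id="has_leaf_shape",
    value="lanceolate",
    references=["hypothetical source"],
)
print(leaf_shape)
```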



1. Will Property include information on
observation/recording/measurement methodology?

2. What happens if the main entity has variants or parts for which
values are recorded?

An example is car models, where typically the revisions sold under the
same name are subsumed in one Wikipedia article. Example:

http://en.wikipedia.org/wiki/Renault_Kangoo
with different length / weight in the subclass info boxes for
first/second generation.

Cars also easily serve as an example for variable parts, see in
http://en.wikipedia.org/wiki/%C5%A0koda_Roomster the list of engine
specifications. The engines are not Wikipedia entities in their own
right.

3. Values may be well defined RDF-resources, but not available in Wikipedia.

In my work, many statements I would like to express in a future
Wikidata are not allowed as Wikipedia articles at all. You can express
Wowereit is_mayor_of Berlin, Germany
but not
Plantago lanceolata has_leaf_shape lanceolate
because
http://en.wikipedia.org/wiki/Lanceolate
is a redirect to
http://en.wikipedia.org/wiki/Leaf_shape

I personally would love to have illustrated definitions of things
people want to learn about being allowed on Wikipedia, but the
argument is generally that Wikipedia is not Wiktionary.

I believe Wikidata should right from the start be defined to allow
references to Wiktionary as well as Wikipedia. And while we are at it,
references to Commons as well (semantic image annotation...)

This would change
Wikipedialink = (Title, LanguageId, Badge?)
to
Link = (Project, LanguageId, Title, Badge?)
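
A minimal Python sketch of the generalized link. The class layout and
the url helper are my own illustration; the URL pattern is an
assumption that only fits language-specific projects such as Wikipedia
and Wiktionary (Commons has no language subdomain and would need its
own rule).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    project: str               # e.g. "wikipedia" or "wiktionary"
    language_id: str           # e.g. "en"
    title: str
    badge: Optional[str] = None

    def url(self) -> str:
        # Assumed pattern for language-specific projects only.
        page = self.title.replace(" ", "_")
        return f"http://{self.language_id}.{self.project}.org/wiki/{page}"


print(Link("wikipedia", "en", "Leaf shape").url())
print(Link("wiktionary", "en", "lanceolate").url())
```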

---

Gregor
