[Wikidata] Re: units

2023-01-24 Thread John Erling Blad
SPARQL is like the secret language of the Necromongers: it is completely
incomprehensible to the uninitiated who haven't been through the Gates of
the Underworld.

It is perhaps the single most difficult thing to grasp for users of
Wikidata.

On Wed, Jan 25, 2023, 00:32 Marco Neumann wrote:

> Enjoy
>
> Best,
> Marco
>
> On Tue, Jan 24, 2023 at 11:30 PM Olaf Simons <
> olaf.sim...@pierre-marteau.com> wrote:
>
>> ...everything I did was, once again, 30 times more complicated,
>>
>> thank you very much!
>> Olaf
>>
>>
>> > Marco Neumann wrote on 24.01.2023 at 23:48 CET:
>> >
>> >
>> > https://tinyurl.com/2nbqnavq
>> > ___
>> > Wikidata mailing list -- wikidata@lists.wikimedia.org
>> > Public archives at
>> https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/NW3SGOW6LHX3XAYMMWJZ72LDAWSJ73MU/
>> > To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>>
>> Dr. Olaf Simons
>> Forschungszentrum Gotha der Universität Erfurt
>> Am Schlossberg 2
>> 99867 Gotha
>> Office: +49-361-737-1722
>> Mobile: +49-179-5196880
>> Home: Hauptmarkt 17b, 99867 Gotha
>> ___
>> Wikidata mailing list -- wikidata@lists.wikimedia.org
>> Public archives at
>> https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/KXGWRI3N3FYLTLSOGJ3FJ2JVKWVOHV52/
>> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>>
>
>
> --
>
>
> ---
> Marco Neumann
>
>
> ___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/BSJQRJOJBG3ON747K2GNMHFSEPILOXBH/
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/3EQ3E3FZSZYCYVEPJFVAGRQHQSGIIOYB/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: History of some original Wikidata design decisions?

2021-07-24 Thread John Erling Blad
Just to clarify, “Wikidata – The Movie” was (once upon a time) a standing
joke on the original team, with wild guesses about who would play the
different characters.

But now, off to thinking Deep Thoughts.

On Sat, Jul 24, 2021 at 11:54 PM John Erling Blad  wrote:

> > A Wikidata book would be most excellent,
>
> So, what about “Wikidata – The Movie”? Who will cast Denny? Would it be
> Anthony Hopkins?
>
> John will now go to bed! (I'm not here, etc…)
>
> On Fri, Jul 23, 2021 at 8:10 PM Ed Summers  wrote:
>
>>
>> > On Thu, Jul 22, 2021 at 6:56 PM Denny Vrandečić <
>> > dvrande...@wikimedia.org> wrote:
>> > >
>> > > I hope that helps with the historical deep dive :) Lydia and I
>> > > really should write that book!
>>
>> A Wikidata book would be most excellent, especially one by both of you!
>> If there's anything interested people can do to help make it happen (a
>> little crowdfunding or what have you) please let us know.
>>
>> //Ed
>> ___
>> Wikidata mailing list -- wikidata@lists.wikimedia.org
>> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>>
>
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: History of some original Wikidata design decisions?

2021-07-24 Thread John Erling Blad
> A Wikidata book would be most excellent,

So, what about “Wikidata – The Movie”? Who will cast Denny? Would it be
Anthony Hopkins?

John will now go to bed! (I'm not here, etc…)

On Fri, Jul 23, 2021 at 8:10 PM Ed Summers  wrote:

>
> > On Thu, Jul 22, 2021 at 6:56 PM Denny Vrandečić <
> > dvrande...@wikimedia.org> wrote:
> > >
> > > I hope that helps with the historical deep dive :) Lydia and I
> > > really should write that book!
>
> A Wikidata book would be most excellent, especially one by both of you!
> If there's anything interested people can do to help make it happen (a
> little crowdfunding or what have you) please let us know.
>
> //Ed
> ___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


Re: [Wikidata] Big numbers

2019-10-07 Thread John Erling Blad
In the DecimalMath class you escalate the scale to the combined length of
the fractional parts during multiplication. That is what is taught in
school, but it leads to false precision. It can be argued that this is both
wrong and right. Is there any particular reason (use case) for doing that?
I don't escalate the scale in the Lua library.
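
To make the difference concrete, here is a minimal Lua sketch (illustrative
only, not the actual DecimalMath or Lua library code; the helper name is
made up) contrasting school-style scale escalation with limiting the product
to the significant digits of the least precise operand:

    -- Round a positive number to n significant digits (this sketch
    -- ignores negative values for brevity).
    local function roundSig( x, n )
        if x == 0 then
            return 0
        end
        local magnitude = math.floor( math.log( x ) / math.log( 10 ) )
        local factor = 10 ^ ( n - 1 - magnitude )
        return math.floor( x * factor + 0.5 ) / factor
    end

    -- School-style multiplication: the scale of the product is the sum
    -- of the operands' scales, so 1.5 * 2.25 keeps three fractional digits.
    local schoolStyle = 1.5 * 2.25              -- 3.375

    -- Significant-digit multiplication: the product keeps only as many
    -- significant digits as the least precise operand (two here).
    local sigStyle = roundSig( 1.5 * 2.25, 2 )  -- 3.4

    print( schoolStyle, sigStyle )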

I see you have bumped into the problem of whether precision is a last-digit
resolution or a count of significant digits. There are a couple of
(several!) different definitions, and I'm not sure which one is right.
3 ± 0.5 meters is comparable to 120 ± 10 inches, but interpreting "3 meters"
as having a default precision of ±0.5 meters is problematic. It is easier to
see the problem if you compare with a prefix: what is the precision of
3000 meters vs 3 km? And when do you count significant digits? Is zero (0) a
significant digit?
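
As a worked example of the 3 ± 0.5 meter case, here is a small Lua sketch
(illustrative only; the helper is invented, not Wikibase code) that converts
a value together with its uncertainty and rounds the result so that no
digits beyond the uncertainty are reported:

    -- Round x to the decimal position of the most significant digit of
    -- the uncertainty, dropping the insignificant digits.
    local function roundToUncertainty( x, uncertainty )
        local magnitude = math.floor( math.log( uncertainty ) / math.log( 10 ) )
        local step = 10 ^ magnitude
        return math.floor( x / step + 0.5 ) * step, step
    end

    local metersToInches = 1 / 0.0254

    local value = 3 * metersToInches     -- 118.11... inches
    local uncert = 0.5 * metersToInches  -- 19.68... inches

    -- Prints 120 and 10: the converted uncertainty spans whole tens of
    -- inches, so reporting 118.11 would be false precision.
    print( roundToUncertainty( value, uncert ) )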

Otherwise I find this extremely amusing. When I first mentioned this we got
into a fierce discussion, and the conclusion was that we should definitely
not use big numbers. Now we do. :D

On Mon, Oct 7, 2019 at 10:44 AM Daniel Kinzler 
wrote:

> Am 07.10.19 um 09:50 schrieb John Erling Blad:
> > Found a few references to bcmath, but some weirdness made me wonder if
> it really
> > was bcmath after all. I wonder if the weirdness is the juggling with
> double when
> > bcmath is missing.
>
> I haven't looked at the code in five years or so, but when I wrote it,
> Number
> was indeed bcmath with fallback to float. The limit of 127 characters
> sounds
> right, though I'm not sure without looking at the code.
>
> Quantity is based on Number, with quite a bit of added complexity for
> converting
> between units while considering the value's precision. e.g. "3 meters"
> should
> not turn into "118,11 inch", but "118 inch" or even "120 inch", if it's the
> default +/- 0.5 meter = 19,685 inch, which means the last digit is
> insignificant. Had lots of fun and confusion with that. I also implemented
> rounding on decimal strings for that. And initially screwed up some edge
> cases,
> which I only realized when helping my daughter with her homework ;)
>
> --
> Daniel Kinzler
> Principal Software Engineer, Core Platform
> Wikimedia Foundation
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Big numbers

2019-10-07 Thread John Erling Blad
I found a few references to bcmath, but some weirdness made me wonder
whether it really was bcmath after all. I wonder if the weirdness comes from
the fallback to doubles when bcmath is missing.



On Mon, Oct 7, 2019 at 9:18 AM Jeroen De Dauw 
wrote:

> Hey John,
>
> I'm not aware of any documentation, though there probably is some
> somewhere. What I can point you to is the code dealing with numbers:
> https://github.com/wmde/Number
>
> Cheers
>
> --
> Jeroen De Dauw | www.EntropyWins.wtf  |
> www.Professional.Wiki 
> Entrepreneur | Software Crafter | Speaker | Open Source and Wikimedia
> contributor
> ~=[,,_,,]:3
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Big numbers

2019-10-07 Thread John Erling Blad
I'm not sure; I haven't tried to figure out why, but in some places it
seems like the limit hits around 95 characters.

On Mon, Oct 7, 2019 at 9:17 AM Nicolas VIGNERON 
wrote:

> Hi,
>
> The offical documentation is
> https://www.wikidata.org/wiki/Help:Data_type#quantity but indeed there is
> no indication of the limit.
>
> From what I tested, apparently you can enter any number which is less than
> 127 characters long (https://www.wikidata.org/wiki/Q4115189#P1104).
>
> Cheers,
> ~nicolas
>
> Le lun. 7 oct. 2019 à 08:58, John Erling Blad  a écrit :
>
>> Is there any documentation of the number format used by the quantity
>> type? I bumped into this and had to implement the BCmath extension to
>> handle the numbers. The reason I did it (apart from it being fun) is to
>> handle some weird unit conversions. By inspection I found that there are
>> numbers on Wikidata that clearly cannot be represented as doubles, and
>> after a little testing I found that they have to be handled as some kind
>> of big numbers. Lua does not have big numbers, and using the numbers from
>> a quantity as a plain number type is a disaster waiting to happen.
>>
>> So, is there any documentation for the quantity format anywhere? I have
>> not found any. I posted a task about it; please add information there if
>> you know where some can be found. I suspect the format just happens to be
>> the same as bcmath's, and nobody really checked whether the formats are
>> compatible, or…?
>>
>> The BCmath extension can be found at
>> - https://www.mediawiki.org/wiki/Extension:BCmath
>> - https://github.com/jeblad/BCmath
>>
>> There is a Vagrant role if anyone likes to test it out.
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Big numbers

2019-10-06 Thread John Erling Blad
Is there any documentation of the number format used by the quantity type?
I bumped into this and had to implement the BCmath extension to handle the
numbers. The reason I did it (apart from it being fun) is to handle some
weird unit conversions. By inspection I found that there are numbers on
Wikidata that clearly cannot be represented as doubles, and after a little
testing I found that they have to be handled as some kind of big numbers.
Lua does not have big numbers, and using the numbers from a quantity as a
plain number type is a disaster waiting to happen.

So, is there any documentation for the quantity format anywhere? I have not
found any. I posted a task about it; please add information there if you
know where some can be found. I suspect the format just happens to be the
same as bcmath's, and nobody really checked whether the formats are
compatible, or…?

The BCmath extension can be found at
- https://www.mediawiki.org/wiki/Extension:BCmath
- https://github.com/jeblad/BCmath

There is a Vagrant role if anyone would like to test it out.
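
To illustrate why treating quantity amounts as plain Lua numbers goes wrong,
here is a small sketch; the amount string is a made-up example, not a
specific Wikidata value, and the point is only the silent rounding:

    -- A quantity amount as it appears in the JSON: a signed decimal string.
    local amount = '+123456789012345678901234567890'

    -- Coercing it to a plain Lua number silently rounds it to double
    -- precision, i.e. roughly 15-16 significant digits.
    local asNumber = tonumber( amount )

    -- The trailing digits of the printed number no longer match the
    -- original string.
    print( amount )
    print( string.format( '%.0f', asNumber ) )

    -- Safe alternatives: keep the amount as a string for exact comparison
    -- and display, or hand it to an arbitrary-precision library such as
    -- the BCmath extension above.
    print( amount == '+123456789012345678901234567890' )  -- true, exact
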
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikipedia-l] Fwd: [Wikimedia-l] Wikipedia in an abstract language

2019-01-18 Thread John Erling Blad
I have tried a couple of times to rewrite this, but it grows out of bounds
anyhow. It seems to have a life of its own.

There is a book from 2000 by Robert Dale and Ehud Reiter, Building Natural
Language Generation Systems, ISBN 978-0-521-02451-8.

Wikibase items can be rebuilt as Plans from the type statement (top-down)
or as Constituents from the other statements (bottom-up). The two models do
not necessarily agree. This is, however, only the overall document structure
and the organization of the data; it leaves out the really hard part – the
language-specific realization.

You could probably redefine Plans and Constituents as entities (I have
toyed around with them as Lua classes) and put them into Wikidata. The
easiest way to reuse them locally would be a lookup structure for fully or
partly canned text, with rules for agreement and inflection defined as part
of those texts. Piecing together canned text is hard, but easier than
building full prose from the bottom up. It is possible to define a very
low-level realization for some languages, but that is a lot harder.

The idea for canned-text lookup is to use the text that covers most of the
available statements, while still allowing most of the remaining statements
to be covered. That is, some canned text might not support a specific
agreement rule, so some other canned text cannot reference it, and less
coverage is achieved. For example, if the direction to the sea cannot be
expressed in a canned text for Finnish, then the distance cannot reference
the direction.

To get around this I prioritized Plans and Constituents, with those having
higher priority being put first: what a person is known for should go in
front of their other work. I ordered the Plans and Constituents
chronologically to maintain causality; this can also be called sorting.
Priority tends to influence Plans, and order influences Constituents. Then
there is grouping, which keeps some statements together. Length, width, and
height are typically a group.

A lake can be described with individual canned texts for length, width,
and height, but those are given low priority. Then a canned text can be
made for length and height, with somewhat higher priority. An even higher
priority can be given to a canned text for all three. If all three
statements are available, the composite canned text covering all of them
will be used; if only some of them exist, a lower-priority canned text will
be used.
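
A minimal Lua sketch of that selection rule (the table layout, priorities,
and placeholder syntax are invented for illustration): each canned text
declares which statements it covers and a priority, and the highest-priority
text whose required statements are all present is chosen.

    -- Canned texts for a lake, each declaring coverage and priority.
    local cannedTexts = {
        { covers = { 'length' }, priority = 1,
          text = 'The lake is $length long.' },
        { covers = { 'height' }, priority = 1,
          text = 'The lake lies at $height.' },
        { covers = { 'length', 'height' }, priority = 2,
          text = 'The lake is $length long and lies at $height.' },
        { covers = { 'length', 'width', 'height' }, priority = 3,
          text = 'The lake is $length long, $width wide and lies at $height.' },
    }

    -- Pick the highest-priority canned text whose covered statements exist.
    local function selectCanned( available )
        local best
        for _, candidate in ipairs( cannedTexts ) do
            local usable = true
            for _, key in ipairs( candidate.covers ) do
                if available[key] == nil then
                    usable = false
                    break
                end
            end
            if usable and ( not best or candidate.priority > best.priority ) then
                best = candidate
            end
        end
        return best and best.text
    end

    -- Only length and height are available, so the pair text wins.
    print( selectCanned( { length = '3 km', height = '450 m' } ) )

In a real module the covered statements would then be removed and the
selection repeated, so the remaining statements still get their own,
lower-priority, canned texts.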

Note that the book uses "canned text" a little differently.

Also note that the canned texts can be translated as ordinary message
strings. They could also be defined as a kind of entity in Wikidata. As
ordinary message strings they need additional data, but that comes naturally
if they are entities in Wikidata. My doodling put them inside each
Wikipedia, as that would make them easier to reuse from Lua modules. (And
yes, you can then override part of the ArticlePlaceholder to show the text
on the special page.)

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikipedia-l] Fwd: [Wikimedia-l] Wikipedia in an abstract language

2019-01-14 Thread John Erling Blad
An additional note: what Wikipedia urgently needs is a way to create and
reuse canned text (aka "templates"), and a way to adapt that text to data
from Wikidata. That is mostly just inflection rules, but in some cases it
involves grammar rules. Creating larger pieces of text is much harder,
especially if the text is supposed to be readable. Jumbling sentences
together, as is commonly done by various bot scripts, does not work very
well; or rather, it does not work at all.
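
As a tiny illustration of the inflection part, here is a Lua sketch (the
message table and placeholder syntax are invented for the example) where the
canned text picks the grammatical form that agrees with a number before the
number is inserted:

    -- Minimal plural rules; real modules would carry per-language
    -- agreement and inflection rules alongside the canned texts.
    local forms = {
        en = { one = '$1 inhabitant', other = '$1 inhabitants' },
        nb = { one = '$1 innbygger', other = '$1 innbyggere' },
    }

    -- Choose the form that agrees with the count, then fill it in.
    local function inflect( lang, count )
        local form = ( count == 1 ) and forms[lang].one or forms[lang].other
        return ( form:gsub( '%$1', tostring( count ) ) )
    end

    print( inflect( 'en', 1 ) )     -- 1 inhabitant
    print( inflect( 'nb', 5231 ) )  -- 5231 innbyggere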

On Mon, Jan 14, 2019 at 11:44 AM John Erling Blad  wrote:
>
> Using an abstract language as a basis for translations has been tried
> before, and it is almost as hard as translating between two common
> languages.
>
> There are two really hard problems: the implied references and the
> cultural context. An artificial language can get rid of the implied
> references, but it tends to create very weird and unnatural expressions.
> If the cultural context is removed, it can be extremely hard to put it
> back in, and without any cultural context it can be hard to explain
> anything.
>
> But yes, you can make an abstract language, but it won't give you any
> high-quality prose.
>
> On Mon, Jan 14, 2019 at 8:09 AM Felipe Schenone  wrote:
> >
> > This is quite an awesome idea. But thinking about it, wouldn't it be 
> > possible to use structured data in wikidata to generate articles? Can't we 
> > skip the need of learning an abstract language by using wikidata?
> >
> > Also, is there discussion about this idea anywhere in the Wikimedia wikis? 
> > I haven't found any...
> >
> > On Sat, Sep 29, 2018 at 3:44 PM Pine W  wrote:
> >>
> >> Forwarding because this (ambitious!) proposal may be of interest to people
> >> on other lists. I'm not endorsing the proposal at this time, but I'm
> >> curious about it.
> >>
> >> Pine
> >> ( https://meta.wikimedia.org/wiki/User:Pine )
> >>
> >>
> >> -- Forwarded message -
> >> From: Denny Vrandečić 
> >> Date: Sat, Sep 29, 2018 at 6:32 PM
> >> Subject: [Wikimedia-l] Wikipedia in an abstract language
> >> To: Wikimedia Mailing List 
> >>
> >>
> >> Semantic Web languages allow to express ontologies and knowledge bases in a
> >> way meant to be particularly amenable to the Web. Ontologies formalize the
> >> shared understanding of a domain. But the most expressive and widespread
> >> languages that we know of are human natural languages, and the largest
> >> knowledge base we have is the wealth of text written in human languages.
> >>
> >> We look for a path to bridge the gap between knowledge representation
> >> languages such as OWL and human natural languages such as English. We
> >> propose a project to simultaneously expose that gap, allow to collaborate
> >> on closing it, make progress widely visible, and is highly attractive and
> >> valuable in its own right: a Wikipedia written in an abstract language to
> >> be rendered into any natural language on request. This would make current
> >> Wikipedia editors about 100x more productive, and increase the content of
> >> Wikipedia by 10x. For billions of users this will unlock knowledge they
> >> currently do not have access to.
> >>
> >> My first talk on this topic will be on October 10, 2018, 16:45-17:00, at
> >> the Asilomar in Monterey, CA during the Blue Sky track of ISWC. My second,
> >> longer talk on the topic will be at the DL workshop in Tempe, AZ, October
> >> 27-29. Comments are very welcome as I prepare the slides and the talk.
> >>
> >> Link to the paper: http://simia.net/download/abstractwikipedia.pdf
> >>
> >> Cheers,
> >> Denny
> >> ___
> >> Wikimedia-l mailing list, guidelines at:
> >> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> >> https://meta.wikimedia.org/wiki/Wikimedia-l
> >> New messages to: wikimedi...@lists.wikimedia.org
> >> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> >> <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>
> >> ___
> >> Wikipedia-l mailing list
> >> wikipedi...@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikipedia-l] Fwd: [Wikimedia-l] Wikipedia in an abstract language

2019-01-14 Thread John Erling Blad
Using an abstract language as a basis for translations has been tried
before, and it is almost as hard as translating between two common
languages.

There are two really hard problems: the implied references and the
cultural context. An artificial language can get rid of the implied
references, but it tends to create very weird and unnatural expressions.
If the cultural context is removed, it can be extremely hard to put it
back in, and without any cultural context it can be hard to explain
anything.

But yes, you can make an abstract language, but it won't give you any
high-quality prose.

On Mon, Jan 14, 2019 at 8:09 AM Felipe Schenone  wrote:
>
> This is quite an awesome idea. But thinking about it, wouldn't it be possible 
> to use structured data in wikidata to generate articles? Can't we skip the 
> need of learning an abstract language by using wikidata?
>
> Also, is there discussion about this idea anywhere in the Wikimedia wikis? I 
> haven't found any...
>
> On Sat, Sep 29, 2018 at 3:44 PM Pine W  wrote:
>>
>> Forwarding because this (ambitious!) proposal may be of interest to people
>> on other lists. I'm not endorsing the proposal at this time, but I'm
>> curious about it.
>>
>> Pine
>> ( https://meta.wikimedia.org/wiki/User:Pine )
>>
>>
>> -- Forwarded message -
>> From: Denny Vrandečić 
>> Date: Sat, Sep 29, 2018 at 6:32 PM
>> Subject: [Wikimedia-l] Wikipedia in an abstract language
>> To: Wikimedia Mailing List 
>>
>>
>> Semantic Web languages allow to express ontologies and knowledge bases in a
>> way meant to be particularly amenable to the Web. Ontologies formalize the
>> shared understanding of a domain. But the most expressive and widespread
>> languages that we know of are human natural languages, and the largest
>> knowledge base we have is the wealth of text written in human languages.
>>
>> We look for a path to bridge the gap between knowledge representation
>> languages such as OWL and human natural languages such as English. We
>> propose a project to simultaneously expose that gap, allow to collaborate
>> on closing it, make progress widely visible, and is highly attractive and
>> valuable in its own right: a Wikipedia written in an abstract language to
>> be rendered into any natural language on request. This would make current
>> Wikipedia editors about 100x more productive, and increase the content of
>> Wikipedia by 10x. For billions of users this will unlock knowledge they
>> currently do not have access to.
>>
>> My first talk on this topic will be on October 10, 2018, 16:45-17:00, at
>> the Asilomar in Monterey, CA during the Blue Sky track of ISWC. My second,
>> longer talk on the topic will be at the DL workshop in Tempe, AZ, October
>> 27-29. Comments are very welcome as I prepare the slides and the talk.
>>
>> Link to the paper: http://simia.net/download/abstractwikipedia.pdf
>>
>> Cheers,
>> Denny
>> ___
>> Wikimedia-l mailing list, guidelines at:
>> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>> https://meta.wikimedia.org/wiki/Wikimedia-l
>> New messages to: wikimedi...@lists.wikimedia.org
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
>> 
>> ___
>> Wikipedia-l mailing list
>> wikipedi...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Imperative programming in Lua, do we really want it?

2017-12-07 Thread John Erling Blad
There are some really weird modules out there; I'm not sure pointing them
out would make for a good discussion environment.

My wild guess is that the modules drift into an imperative style because
the libraries (including Wikibase) return fragments of large table
structures. To process the fragments you iterate over them, extracting
different subparts of those tables and keeping state for whatever you infer
from those calls. This creates a lot of extracted state, and often that
state lives inside calls you can't test. Usually you can't even get to
those calls without breaking the interface *somehow*.

Perhaps something can be done by writing a few example pages on
mediawiki.org, but my experience is that developers on Wikipedia (aka
script kiddies like me) do not check pages on mediawiki.org; they just
assume they do it TheRightWay™. Writing a set of programming manuals can
thus easily become a complete waste of time.

No, I don't have any easy solutions.
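
Not a solution either, but here is a sketch of what a more chainable, less
stateful style could look like on top of plain statement tables; the table
layout, example values, and method names are invented for illustration, not
the Scribunto Wikibase API:

    -- A tiny immutable wrapper around a list of statements that allows
    -- chained filtering instead of ad-hoc loops with shared state.
    local Statements = {}
    Statements.__index = Statements

    local function wrap( list )
        return setmetatable( { list = list }, Statements )
    end

    function Statements:filter( predicate )
        local result = {}
        for _, statement in ipairs( self.list ) do
            if predicate( statement ) then
                result[#result + 1] = statement
            end
        end
        return wrap( result )  -- a new wrapper; the original is untouched
    end

    function Statements:values()
        local result = {}
        for _, statement in ipairs( self.list ) do
            result[#result + 1] = statement.value
        end
        return result
    end

    -- Example data standing in for whatever the client library returns.
    local statements = wrap( {
        { property = 'P1082', rank = 'preferred', value = 697010 },
        { property = 'P1082', rank = 'normal', value = 673469 },
        { property = 'P2046', rank = 'normal', value = 454 },
    } )

    local population = statements
        :filter( function ( s ) return s.property == 'P1082' end )
        :filter( function ( s ) return s.rank == 'preferred' end )
        :values()[1]

    print( population )  -- 697010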



On Wed, Dec 6, 2017 at 11:53 PM, Jeroen De Dauw 
wrote:

> Hey,
>
> While I am not up to speed with the Lua surrounding Wikidata or MediaWiki,
> I support the call for avoiding overly imperative code where possible.
>
> Most Lua code I have seen in the past (which has nothing to do with
> MediaWiki) was very imperative, procedural and statefull. Those are things
> you want to avoid if you want your code to be maintainable, easy to
> understand and testable. Since Lua supports OO and functional styles, the
> language is not an excuse for throwing well establishes software
> development practices out of the window.
>
> If the code is currently procedural, I would recommend establishing that
> new code should not be procedural and have automated tests unless there is
> very good reason to make an exception. If some of this code is written by
> people not familiar with software development, it is also important to
> create good examples for them and provide guidance so they do not
> unknowingly copy and adopt poor practices/styles.
>
> John, perhaps you can link the code that caused you to start this thread
> so that there is something more concrete to discuss?
>
> (This is just my personal opinion, not some official statement from
> Wikimedia Deutschland)
>
> PS: I just noticed this is the Wikidata mailing list and not the
> Wikidata-tech one :(
>
> Cheers
>
> --
> Jeroen De Dauw | https://entropywins.wtf | https://keybase.io/jeroendedauw
> Software craftsmanship advocate | Developer at Wikimedia Germany
> ~=[,,_,,]:3
>
> On 6 December 2017 at 23:31, John Erling Blad  wrote:
>
>> With the current Lua environment we have ended up with an imperative
>> programming style in the modules. That invites stateful objects, which
>> do not make for easily testable libraries.
>>
>> Do we have some ideas on how to avoid this, or is it simply the way
>> things are in Lua? I would really like functional programming with
>> chainable calls, but others might want something different?
>>
>> John
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Imperative programming in Lua, do we really want it?

2017-12-06 Thread John Erling Blad
With the current Lua environment we have ended up with an imperative
programming style in the modules. That invites stateful objects, which do
not make for easily testable libraries.

Do we have some ideas on how to avoid this, or is it simply the way things
are in Lua? I would really like functional programming with chainable
calls, but others might want something different?

John
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
efore all other arguments is a reckless way of
> maximising effect, and such rhetoric can damage our movement beyond this
> thread or topic. Our main strength is not our content but our community,
> and I am glad to see that many have already responded to you in such a
> measured and polite way.
>
> Peace,
>
> Markus
>
>
> On 30.11.2017 09:55, John Erling Blad wrote:
> > Licensing was discussed at the start of the project, as in the start
> > of developing code for the project, and as I recall it the arguments
> > for CC0 were valid and sound. That was long before Denny started
> > working for Google.
> >
> > As I recall it was mentioned during the first week of the project
> > (first week of April), and the discussion re-emerged during the first
> > week of development. That must have been week 4 or 5 (first week of
> > May), as the delivery of the laptop was delayed. I was against CC0 as I
> > expected problems with the reuse of external data. The arguments for
> > CC0 convinced me.
> >
> > And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and
> > Jens did too.
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
This was added to the wrong email, sorry for that.

On Thu, Nov 30, 2017 at 11:45 AM, Luca Martinelli 
wrote:

> On 30 Nov 2017 at 09:55, "John Erling Blad" wrote:
>
> Please keep this civil and on topic!
>
> I was just pointing out that CC0 wasn't forced down our throat by Silicon
> Valley's Fifth Column supposed embodiment, that we actually discussed
> several alternatives (ODbL included, which I saw was mentioned in the
> original message of this thread) and that that several of the objections
> made here were actually founded, as several other discussions happened
> outside this ML confirmed.
>
> I'm sorry if it appeared I wanted to start a brawl, it wasn't the case.
> For this misunderstanding, I'm sorry.
>
> L.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
Sorry for the sprelling errojs, my post was written on a cellphone set to
Norwegian.

On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad  wrote:

> Please keep this civil and on topic!
>
> Licensing was discussed at the start of the project, as in the start of
> developing code for the project, and as I recall it the arguments for CC0
> were valid and sound. That was long before Denny started working for
> Google.
>
> As I recall it was mentioned during the first week of the project (first
> week of April), and the discussion re-emerged during the first week of
> development. That must have been week 4 or 5 (first week of May), as the
> delivery of the laptop was delayed. I was against CC0 as I expected
> problems with the reuse of external data. The arguments for CC0 convinced
> me.
>
> And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and
> Jens did too.
>
> The argument is pretty simple: party A has some data A and claims license
> A; party B has some data B and claims license B. Both license A and
> license B are sticky, thus later data C that uses an aggregation of A and
> B must satisfy both license A and license B. That is not viable.
>
> Moving forward to a safe, non-sticky license seems to be the only viable
> solution, and this leads to CC0.
>
> Feel free to discuss the merits of our choice, but do not use personal
> attacks. Thank you.
>
> On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <
> martinellil...@gmail.com> wrote:
>
>> Oh, and by the way, ODbL was considered as a potential license, but I
>> recall that that license could have been incompatible for reuse with CC
>> BY-SA 3.0. It was actually a point of discussion with the Italian
>> OpenStreetMap community back in 2013, when I first presented at the OSM-IT
>> meeting the possibility of a collaboration between WD and OSM.
>>
>> L.
>>
>> On 30 Nov 2017 at 08:57, "Luca Martinelli" wrote:
>>
>>> I basically stopped reading this email after the first attack to Denny.
>>>
>>> I was there since the beginning, and I do recall the *extensive*
>>> discussion about what license to use. CC0 was chosen, among other things,
>>> because of the moronic EU rule about database rights, that CC 3.0 licenses
>>> didn't allow us to counter - please remember that 4.0 were still under
>>> discussion, and we couldn't afford the luxury of waiting for 4.0 to come
>>> out before publishing Wikidata.
>>>
>>> And possibly next time provide a TL;DR version of your email at the top.
>>>
>>> Cheers,
>>>
>>> L.
>>>
>>>
>>> On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz" <
>>> psychosl...@culture-libre.org> wrote:
>>>
>>>> Saluton ĉiuj,
>>>>
>>>> I forward here the message I initially posted on the Meta Tremendous
>>>> Wiktionary User Group talk page
>>>> <https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0>,
>>>> because I'm interested to have a wider feedback of the community on this
>>>> point. Whether you think that my view is completely misguided or that I
>>>> might have a few relevant points, I'm extremely interested to know it, so
>>>> please be bold.
>>>>
>>>> Before you consider digging further in this reading, keep in mind that
>>>> I stay convinced that Wikidata is a wonderful project and I wish it a
>>>> bright future full of even more amazing things than what it has already brought
>>>> so far. My sole concern is really a license issue.
>>>>
>>>> Below is a copy/paste of the above linked message:
>>>>
>>>> Thank you Lydia Pintscher
>>>> <https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29> for
>>>> taking the time to answer. Unfortunately this answer
>>>> <https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0>
>>>> misses too many important points to solve all the concerns which have been
>>>> raised.
>>>>
>>>> Notably, there is still no beginning of hint in it about where the
>>>> decision of using CC0 exclusively for Wikidata came from. But as this
>>>> inquiry on the topic
>>>> <https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive>
>>>> advance, an answer is emerging from it. It se

Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
A per-property licensing scheme would allow storage of the data, but it
might or might not allow reuse of the licensed data together with other
data. Remember that every entry on the servers might be part of a mashup
with all other entries.

On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad  wrote:

> Please keep this civil and on topic!
>
> Licensing was discussed at the start of the project, as in the start of
> developing code for the project, and as I recall it the arguments for CC0
> were valid and sound. That was long before Denny started working for
> Google.
>
> As I recall it was mentioned during the first week of the project (first
> week of April), and the discussion re-emerged during the first week of
> development. That must have been week 4 or 5 (first week of May), as the
> delivery of the laptop was delayed. I was against CC0 as I expected
> problems with the reuse of external data. The arguments for CC0 convinced
> me.
>
> And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and
> Jens did too.
>
> The argument is pretty simple: party A has some data A and claims license
> A; party B has some data B and claims license B. Both license A and
> license B are sticky, thus later data C that uses an aggregation of A and
> B must satisfy both license A and license B. That is not viable.
>
> Moving forward to a safe, non-sticky license seems to be the only viable
> solution, and this leads to CC0.
>
> Feel free to discuss the merits of our choice, but do not use personal
> attacks. Thank you.
>
> On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <
> martinellil...@gmail.com> wrote:
>
>> Oh, and by the way, ODbL was considered as a potential license, but I
>> recall that that license could have been incompatible for reuse with CC
>> BY-SA 3.0. It was actually a point of discussion with the Italian
>> OpenStreetMap community back in 2013, when I first presented at the OSM-IT
>> meeting the possibility of a collaboration between WD and OSM.
>>
>> L.
>>
>> On 30 Nov 2017 at 08:57, "Luca Martinelli" wrote:
>>
>>> I basically stopped reading this email after the first attack to Denny.
>>>
>>> I was there since the beginning, and I do recall the *extensive*
>>> discussion about what license to use. CC0 was chosen, among other things,
>>> because of the moronic EU rule about database rights, that CC 3.0 licenses
>>> didn't allow us to counter - please remember that 4.0 were still under
>>> discussion, and we couldn't afford the luxury of waiting for 4.0 to come
>>> out before publishing Wikidata.
>>>
>>> And possibly next time provide a TL;DR version of your email at the top.
>>>
>>> Cheers,
>>>
>>> L.
>>>
>>>
>>> On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz" <
>>> psychosl...@culture-libre.org> wrote:
>>>
>>>> Saluton ĉiuj,
>>>>
>>>> I forward here the message I initially posted on the Meta Tremendous
>>>> Wiktionary User Group talk page
>>>> <https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0>,
>>>> because I'm interested to have a wider feedback of the community on this
>>>> point. Whether you think that my view is completely misguided or that I
>>>> might have a few relevant points, I'm extremely interested to know it, so
>>>> please be bold.
>>>>
>>>> Before you consider digging further in this reading, keep in mind that
>>>> I stay convinced that Wikidata is a wonderful project and I wish it a
>>>> bright future full of even more amazing things than what it has already brought
>>>> so far. My sole concern is really a license issue.
>>>>
>>>> Below is a copy/paste of the above linked message:
>>>>
>>>> Thank you Lydia Pintscher
>>>> <https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29> for
>>>> taking the time to answer. Unfortunately this answer
>>>> <https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0>
>>>> misses too many important points to solve all the concerns which have been
>>>> raised.
>>>>
>>>> Notably, there is still no beginning of hint in it about where the
>>>> decision of using CC0 exclusively for Wikidata came from. But as this
>>>> inquiry on the topic
>>>> <https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections

Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
Please keep this civil and on topic!

Licensing was discussed at the start of the project, as in the start of
developing code for the project, and as I recall it the arguments for CC0
were valid and sound. That was long before Denny started working for Google.

As I recall it was mentioned during the first week of the project (first
week of April), and the discussion re-emerged during the first week of
development. That must have been week 4 or 5 (first week of May), as the
delivery of the laptop was delayed. I was against CC0 as I expected problems
with the reuse of external data. The arguments for CC0 convinced me.

And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens
did too.

The argument is pretty simple: party A has some data A and claims license A;
party B has some data B and claims license B. Both license A and license B
are sticky, thus later data C that uses an aggregation of A and B must
satisfy both license A and license B. That is not viable.

Moving forward to a safe, non-sticky license seems to be the only viable
solution, and this leads to CC0.

Feel free to discuss the merits of our choice, but do not use personal
attacks. Thank you.

On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <
martinellil...@gmail.com> wrote:

> Oh, and by the way, ODbL was considered as a potential license, but I
> recall that that license could have been incompatible for reuse with CC
> BY-SA 3.0. It was actually a point of discussion with the Italian
> OpenStreetMap community back in 2013, when I first presented at the OSM-IT
> meeting the possibility of a collaboration between WD and OSM.
>
> L.
>
> On 30 Nov 2017 at 08:57, "Luca Martinelli" wrote:
>
>> I basically stopped reading this email after the first attack to Denny.
>>
>> I was there since the beginning, and I do recall the *extensive*
>> discussion about what license to use. CC0 was chosen, among other things,
>> because of the moronic EU rule about database rights, that CC 3.0 licenses
>> didn't allow us to counter - please remember that 4.0 were still under
>> discussion, and we couldn't afford the luxury of waiting for 4.0 to come
>> out before publishing Wikidata.
>>
>> And possibly next time provide a TL;DR version of your email at the top.
>>
>> Cheers,
>>
>> L.
>>
>>
>> On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz" <
>> psychosl...@culture-libre.org> wrote:
>>
>>> Saluton ĉiuj,
>>>
>>> I forward here the message I initially posted on the Meta Tremendous
>>> Wiktionary User Group talk page
>>> ,
>>> because I'm interested to have a wider feedback of the community on this
>>> point. Whether you think that my view is completely misguided or that I
>>> might have a few relevant points, I'm extremely interested to know it, so
>>> please be bold.
>>>
>>> Before you consider digging further in this reading, keep in mind that I
>>> stay convinced that Wikidata is a wonderful project and I wish it a bright
>>> future full of even more amazing things than what it has already brought so far.
>>> My sole concern is really a license issue.
>>>
>>> Below is a copy/paste of the above linked message:
>>>
>>> Thank you Lydia Pintscher
>>>  for
>>> taking the time to answer. Unfortunately this answer
>>> 
>>> misses too many important points to solve all the concerns which have been raised.
>>>
>>> Notably, there is still no beginning of hint in it about where the
>>> decision of using CC0 exclusively for Wikidata came from. But as this
>>> inquiry on the topic
>>> 
>>> advance, an answer is emerging from it. It seems that Wikidata choice
>>> toward CC0 was heavily influenced by Denny Vrandečić, who – to make it
>>> short – is now working in the Google Knowledge Graph team. Also it worth
>>> noting that Google funded a quarter of the initial development work.
>>> Another quarter came from the Gordon and Betty Moore Foundation,
>>> established by Intel co-founder. And half the money came from Microsoft
>>> co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1]
>>> .
>>> To state it shortly in a conspirational fashion, Wikidata is the puppet
>>> trojan horse of big tech hegemonic companies into the realm of Wikimedia.
>>> For a less tragic, more argumentative version, please see the research
>>> project (work in progress, only chapter 1 is in good enough shape, and it's
>>> only available in French so far). Some proofs that this claim is completely
>>> wrong are welcome, as it would be great that 

[Wikidata] Renaming of labels, copy to alias

2017-11-27 Thread John Erling Blad
Would it be possible to make an implicit copy of a label to an alias before
the label is changed? It would then be possible to keep on using the label
as an identifier in Lua code, as long as it doesn't conflict with other
identifiers within the same item, thus lowering the maintenance load. More
importantly, it would lessen the number of violations during a name change.
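
For illustration, a plain-Lua sketch of why the copy would help (the lookup
table and item data are invented; real code would go through the Wikibase
client): a resolver that checks labels first and falls back to aliases keeps
working after a rename, provided the old label is kept as an alias.

    -- Invented item data: after a rename, the old label 'cat' has been
    -- kept as an alias, so old lookups still resolve.
    local items = {
        Q146 = { label = 'house cat', aliases = { 'cat', 'domestic cat' } },
    }

    local function resolveByName( name )
        for id, item in pairs( items ) do
            if item.label == name then
                return id
            end
            for _, alias in ipairs( item.aliases ) do
                if alias == name then
                    return id
                end
            end
        end
        return nil
    end

    print( resolveByName( 'cat' ) )        -- Q146 (old label, now an alias)
    print( resolveByName( 'house cat' ) )  -- Q146
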
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [wikicite-discuss] Cleaning up bibliographic collections in Wikidata

2017-11-24 Thread John Erling Blad
Implicit heterogeneous unordered containers where the members see a
homogeneous parent. The member properties should be transitive to avoid the
maintenance burden, like a "tracking property", and also to make the parent
item manageable.

I can't see anything here that needs any kind of special structure at the
entity level. I'm not even sure whether we need a new container for this;
claims are already unordered containers.

On Sat, Nov 25, 2017 at 1:25 AM, Andy Mabbett 
wrote:

> On 24 November 2017 at 23:30, Dario Taraborelli
>  wrote:
>
> > I'd like to propose a fairly simple solution and hear your feedback on
> > whether it makes sense to implement it as is or with some modifications.
> >
> > create a Wikidata class called "Wikidata item collection" [Q-X]
>
> This sounds like Wikimedia categories, as used on Wikipedia and
> Wikimedia Commons.
>
> --
> Andy Mabbett
> @pigsonthewing
> http://pigsonthewing.org.uk
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Next IRC office hour on November 14th (today!)

2017-11-15 Thread John Erling Blad
Not sure why, but I usually get these emails several days after the meeting
has taken place. Lucas's notice came in yesterday, the 14th, and describes a
meeting that was supposed to happen on the 11th. The final note in the
email, "See you there in 40 minutes", is nice, but my time travel device is
broken.

On Tue, Nov 14, 2017 at 8:18 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Tue, Nov 14, 2017 at 6:18 PM, Lucas Werkmeister
>  wrote:
> > Hello,
> >
> > our next Wikidata IRC office hour will take place on November 11th, 18:00
> > UTC (19:00 in Berlin), on the channel #wikimedia-office (connect).
> >
> > During one hour, you’ll be able to chat with the development team about
> the
> > past, current and future projects, and ask any question you want.
> >
> > (Sorry for the short notice! It looks like we forgot to send this email
> > earlier.)
> >
> > See you there in 40 minutes!
>
> For everyone who couldn't make it here is the log:
> https://tools.wmflabs.org/meetbot/wikimedia-office/2017/
> wikimedia-office.2017-11-14-18.00.log.html
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikimedian in Residence position at University of Virginia

2017-11-10 Thread John Erling Blad
They have some darn interesting research projects!
No, I can't apply… :(

On Fri, Nov 10, 2017 at 4:40 AM, Daniel Mietchen <
daniel.mietc...@googlemail.com> wrote:

> Dear all,
>
> I'm happy to announce that a one-year position for a Wikimedian in
> Residence is open in Charlottesville at the Data Science Institute
> (DSI) at the University of Virginia (UVA).
>
> It is aimed at fostering the interaction between the university -
> students, researchers, librarians, research administrators and others
> - and the Wikimedia communities and platforms. As such, the project
> will work across Wikimedia projects and UVA subdivisions, and
> experience in such contexts will be valued.
>
> More details about the position via
> - https://careers.insidehighered.com/job/1471604/wikimedian-in-residence/
> - http://www.my.jobs/charlottesville-va/wikimedian-in-residence/
> 29f03442637b4cc3846be0d033afb665/job/
> .
>
> For more details about the institute, see
> http://dsi.virginia.edu/ .
>
> I am working for the DSI (as a researcher) and shall be happy to
> address any questions or suggestions on the matter (including
> collaboration with other Wikimedian in Residence projects), preferably
> on-wiki or via my work email (in CC).
>
> Please feel free to pass this on to your networks.
>
> Thanks and cheers,
>
> Daniel
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Coordinate precision in Wikidata, RDF & query service

2017-11-05 Thread John Erling Blad
Not sure if I would go for it, but…

"Precision for the location of the center should be one percent of the
square root of the area covered."

Oslo covers nearly 1000 km²; that would give 1 % of roughly 32 km, i.e.
about 300 meters, or about 10 arc seconds of latitude.
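
A quick Lua sketch of that rule (illustrative only; the 111,320
meters-per-degree figure is the usual approximation for one degree of
latitude):

    -- Suggested coordinate precision: one percent of the square root of
    -- the covered area, expressed in meters and in degrees of latitude.
    local METERS_PER_DEGREE = 111320

    local function suggestedPrecision( areaKm2 )
        local meters = 0.01 * math.sqrt( areaKm2 * 1e6 )  -- 1 % of sqrt(area)
        return meters, meters / METERS_PER_DEGREE
    end

    local meters, degrees = suggestedPrecision( 1000 )  -- roughly Oslo
    print( meters )   -- ~316 meters
    print( degrees )  -- ~0.0028 degrees, i.e. about 10 arc seconds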

On Mon, Nov 6, 2017 at 2:50 AM, John Erling Blad  wrote:

> Glittertinden, a mountain in Norway, has the geoposition 61.651222 N,
> 8.557492 E; the alternate position is 6835406.62, 476558.22 (EU89, UTM32).
>
> Some of the mountains are measured to within a millimeter in elevation.
> For example Ørneflag is measured to be at 1242.808 meters, with position
> 6705530.826, 537607.272 (EU89, UTM32), alternate position 6717133.02,
> 208055.24 (EU89, UTM33). This is a bolt on the top of the mountain. There
> is an ongoing project to map the country within 1x1 meter laterally and
> about 0.2 meter in elevation.
>
> One degree of latitude is about 111 km, so five digits after the decimal
> point of a degree give about one meter of lateral precision.
>
> Geopositions aren't fixed: there are quite large tidal movements, and
> modelling and estimating them is an important research field. The
> movements can be as large as ~0.3 meter. (This is from long ago; ask
> someone working on this.) Estimation of where we are is good to less than
> 1 cm, but I have heard better numbers.
>
> All geopositions should have a reference datum; without one, the position
> is pretty useless when the precision is high. An easy fix could be to use
> standard profiles with easy-to-recognize names, like "GPS", and limit the
> precision in that case to two digits after the decimal point of an arc
> second.
>
> Note that the precision in longitude depends on the actual latitude.
>
> On Fri, Sep 1, 2017 at 9:43 PM, Peter F. Patel-Schneider <
> pfpschnei...@gmail.com> wrote:
>
>> The GPS unit on my boat regularly claims an estimated position error of 4
>> feet after it has acquired its full complement of satellites.  This is a
>> fairly new mid-price GPS unit using up to nine satellites and WAAS.  So my
>> recreational GPS supposedly obtains fifth-decimal-place accuracy.  It was
>> running under an unobstructed sky, which is common when boating.  Careful
>> use of a good GPS unit should be able to achieve this level of accuracy on
>> land as well.
>>
>> From http://www.gps.gov/systems/gps/performance/accuracy/ the raw
>> accuracy
>> of the positioning information from a satellite is less than 2.4 feet 95%
>> of
>> the time.  The accuracy reported by a GPS unit is degraded by atmospheric
>> conditions; false signals, e.g., bounces; and the need to determine
>> position
>> by intersecting the raw data from several satellites.  Accuracy can be
>> improved by using more satellites and multiple frequencies and by
>> comparing to a signal from a receiver at a known location.
>>
>> The web page above claims that accuracy can be improved to a few
>> centimeters
>> in real time and down to the millimeter level if a device is left in the
>> same place for a long period of time.  I think that these last two
>> accuracies require a close-by receiver at a known location and correspond
>> to what is said in [4].
>>
>> peter
>>
>>
>>
>> On 08/30/2017 06:53 PM, Nick Wilson (Quiddity) wrote:
>> > On Tue, Aug 29, 2017 at 2:13 PM, Stas Malyshev 
>> wrote:
>> >> [...] Would four decimals
>> >> after the dot be enough? According to [4] this is what commercial GPS
>> >> device can provide. If not, why and which accuracy would be
>> appropriate?
>> >>
>> >
>> > I think that should be 5 decimals for commercial GPS, per that link?
>> > It also suggests that "The sixth decimal place is worth up to 0.11 m:
>> > you can use this for laying out structures in detail, for designing
>> > landscapes, building roads. It should be more than good enough for
>> > tracking movements of glaciers and rivers. This can be achieved by
>> > taking painstaking measures with GPS, such as differentially corrected
>> > GPS."
>> >
>> > Do we hope to store datasets around glacier movement? It seems
>> > possible. (We don't seem to currently
>> > https://www.wikidata.org/wiki/Q770424 )
>> >
>> > I skimmed a few search results, and found 7 (or 15) decimals given in
>> > one standard, but the details are beyond my understanding:
>> > http://resources.esri.com/help/9.3/arcgisengine/java/gp_tool
>> ref/geoprocessing_environments/about_coverage_precision.htm
>> > https://stackoverflow.com/questions/1947481/how-many-signifi
>> cant-digits-should-i-store-in-my-database-for-a-gps-coordinate
>> > https://stackoverflow.com/questions/7167604/how-accurately-
>> should-i-store-latitude-and-longitude
>> >
>> >> [4]
>> >> https://gis.stackexchange.com/questions/8650/measuring-accur
>> acy-of-latitude-and-longitude
>> >
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Coordinate precision in Wikidata, RDF & query service

2017-11-05 Thread John Erling Blad
Glittertinden, a mountain in Norway, has the geoposition 61.651222 N,
8.557492 E; the alternate position is 6835406.62, 476558.22 (EU89, UTM32).

Some of the mountains are measured to within a millimeter in elevation.
For example Ørneflag is measured to be at 1242.808 meters, with position
6705530.826, 537607.272 (EU89, UTM32), alternate position 6717133.02,
208055.24 (EU89, UTM33). This is a bolt on the top of the mountain. There
is an ongoing project to map the country within 1x1 meter laterally and
about 0.2 meter in elevation.

One degree of latitude is about 111 km, so five digits after the decimal
point of a degree give about one meter of lateral precision.

Geopositions aren't fixed: there are quite large tidal movements, and
modelling and estimating them is an important research field. The movements
can be as large as ~0.3 meter. (This is from long ago; ask someone working
on this.) Estimation of where we are is good to less than 1 cm, but I have
heard better numbers.

All geopositions should have a reference datum; without one, the position
is pretty useless when the precision is high. An easy fix could be to use
standard profiles with easy-to-recognize names, like "GPS", and limit the
precision in that case to two digits after the decimal point of an arc
second.

Note that the precision in longitude depends on the actual latitude.

On Fri, Sep 1, 2017 at 9:43 PM, Peter F. Patel-Schneider <
pfpschnei...@gmail.com> wrote:

> The GPS unit on my boat regularly claims an estimated position error of 4
> feet after it has acquired its full complement of satellites.  This is a
> fairly new mid-price GPS unit using up to nine satellites and WAAS.  So my
> recreational GPS supposedly obtains fifth-decimal-place accuracy.  It was
> running under an unobstructed sky, which is common when boating.  Careful
> use of a good GPS unit should be able to achieve this level of accuracy on
> land as well.
>
> From http://www.gps.gov/systems/gps/performance/accuracy/ the raw accuracy
> of the positioning information from a satellite is less than 2.4 feet 95%
> of
> the time.  The accuracy reported by a GPS unit is degraded by atmospheric
> conditions; false signals, e.g., bounces; and the need to determine
> position
> by intersecting the raw data from several satellites.  Accuracy can be
> improved by using more satellites and multiple frequencies and by
> comparing to a signal from a receiver at a known location.
>
> The web page above claims that accuracy can be improved to a few
> centimeters
> in real time and down to the millimeter level if a device is left in the
> same place for a long period of time.  I think that these last two
> accuracies require a close-by receiver at a known location and correspond
> to what is said in [4].
>
> peter
>
>
>
> On 08/30/2017 06:53 PM, Nick Wilson (Quiddity) wrote:
> > On Tue, Aug 29, 2017 at 2:13 PM, Stas Malyshev 
> wrote:
> >> [...] Would four decimals
> >> after the dot be enough? According to [4] this is what commercial GPS
> >> device can provide. If not, why and which accuracy would be appropriate?
> >>
> >
> > I think that should be 5 decimals for commercial GPS, per that link?
> > It also suggests that "The sixth decimal place is worth up to 0.11 m:
> > you can use this for laying out structures in detail, for designing
> > landscapes, building roads. It should be more than good enough for
> > tracking movements of glaciers and rivers. This can be achieved by
> > taking painstaking measures with GPS, such as differentially corrected
> > GPS."
> >
> > Do we hope to store datasets around glacier movement? It seems
> > possible. (We don't seem to currently
> > https://www.wikidata.org/wiki/Q770424 )
> >
> > I skimmed a few search results, and found 7 (or 15) decimals given in
> > one standard, but the details are beyond my understanding:
> > http://resources.esri.com/help/9.3/arcgisengine/java/gp_
> toolref/geoprocessing_environments/about_coverage_precision.htm
> > https://stackoverflow.com/questions/1947481/how-many-
> significant-digits-should-i-store-in-my-database-for-a-gps-coordinate
> > https://stackoverflow.com/questions/7167604/how-
> accurately-should-i-store-latitude-and-longitude
> >
> >> [4]
> >> https://gis.stackexchange.com/questions/8650/measuring-
> accuracy-of-latitude-and-longitude
> >
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Encoders/feature extractors for neural nets

2017-10-02 Thread John Erling Blad
You might view (my) problem as an embedding for words (and their fragments)
driven by valued statements (those you discard), and then inverting this
(learned encoder) into a language model. Thus, when describing an object, it
would be possible to choose better words (lexical choice in natural language
generation).

On Mon, Oct 2, 2017 at 5:00 PM,  wrote:

> I have done some work on converting Wikidata items and properties to a
> low-dimensional representation (graph embedding).
>
> A webservice with a "most-similar" functionality based on computation in
> the low-dimensional space is running from https://tools.wmflabs.org/wemb
> edder/most-similar/
>
> A query may look like:
>
> https://tools.wmflabs.org/wembedder/most-similar/Q20#language=en
>
> It is based on a simple Gensim model https://github.com/fnielsen/wembedder
> and could probably be improved.
>
> It is described in http://www2.imm.dtu.dk/pubdb/v
> iews/edoc_download.php/7011/pdf/imm7011.pdf
>
> It is not embedding statements but rather individual items.
>
>
> There is general research on graph embedding. I have added some of the
> scientific articles to Wikidata. You can see them with Scholia:
>
> https://tools.wmflabs.org/scholia/topic/Q32081746
>
>
> best regards
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
>
> On 09/27/2017 02:14 PM, John Erling Blad wrote:
>
>> The most important thing for my problem would be to encode quantity and
>> geopos. The test case is lake sizes to encode proper localized descriptions.
>>
>> Unless someone already have a working solution I would encode this as
>> sparse logarithmic vectors, probably also with log of pairwise differences.
>>
>> Encoding of qualifiers is interesting, but would require encoding of a
>> topic map, and that adds an additional layer of complexity.
>>
>> How to encode the values are not so much the problem, but avoiding
>> reimplementing this yet another time… ;)
>>
>> On Wed, Sep 27, 2017 at 1:23 PM, Thomas Pellissier Tanon <
>> tho...@pellissier-tanon.fr <mailto:tho...@pellissier-tanon.fr>> wrote:
>>
>> Just an idea of a very sparse but hopefully not so bad encoding (I
>> have not actually tested it).
>>
>> NB: I am going to use a lot the terms defined in the glossary [1].
>>
>> A value could be encoded by a vector:
>> - for entity ids it is a vector V that have the dimension of the
>> number of existing entities such that V[q] = 1 if, and only if, it
>> is the entity q and V[q] = 0 if not.
>> - for time : a vector with year, month, day, hours, minutes,
>> seconds, is_precision_year, is_precision_month, ..., is_gregorian,
>> is_julian (or something similar)
>> - for geo coordinates latitude, longitude, is_earth, is_moon...
>> - string/language strings: an encoding depending on your use case
>> ...
>> Example : To encode "Q2" you would have the vector {0,1,0}
>> To encode the year 2000 you would have {2000,0...,
>> is_precision_decade =
>> 0,is_precision_year=1,is_precision_month=0,...,is_gregorian=true,...}
>>
>> To encode a snak you build a big vector by concatenating the vector
>> of the value if it is P1, if it is P2... (you use the property
>> datatype to pick a good vector shape) + you add two cells per
>> property to encode is_novalue, is_somevalue. To encode "P31: Q5" you
>> would have a vector V = {0,,0,0,0,0,1,0,} with 1 only for
>>  V[P31_offset + Q5_offset]
>>
>> To encode a claim you could concatenate the main snak vector + the
>> qualifiers vectors that is the merge of the snak vector for all
>> qualifiers (i.e. you build the vector for all snak and you sum them)
>> such that the qualifier vectors encode all qualifiers at the same
>> time. it allows to check that a qualifiers is set just by picking
>> the right cell in the vector. But it will do bad things if there are
>> two qualifiers with the same property and having a datatype like
>> time or geocoordinates. But I don't think it really a problem.
>> Example: to encode the claim with "P31: Q5" main snak and qualifiers
>> "P42: Q42, P42: Q44" we would have a vector V such that V[P31_offset
>> + Q5_offset] = 1, V[qualifiers_offset + P42_offset + Q42_offset] = 1
>> and V[qualifiers_offset + P42_offset + Q44_offset] = 1 and 0
>> elsewhere.
>>
>> I am not sure how to encode statements references (merge all of them
>> and encode it just like the qualifiers vecto

Re: [Wikidata] Encoders/feature extractors for neural nets

2017-10-02 Thread John Erling Blad
A "watercourse" is more related to the countryside than the ocean, even if
we tend to associate it with its destination.

Anyway, from the paper "One should not expect Wembedder to perform at the
state of the art level, and a comparison with the Wordsim-353 dataset for
semantic relatedness evaluation shows poor performance with Pearson and
Spearman correlations on just 0.13."

It is interesting anyhow.
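
For context, the evaluation quoted above boils down to a rank correlation
between the model's similarity scores and human judgements; a minimal sketch of
such a check, with made-up pairs and scores standing in for Wordsim-353 and for
the model, could look like this:

from scipy.stats import pearsonr, spearmanr

# Made-up gold data: (term A, term B, human similarity judgement)
gold = [("river", "lake", 7.9), ("river", "car", 1.3), ("lake", "ocean", 6.5)]

def model_similarity(a, b):
    # Stand-in for a real model, e.g. scores from a most-similar service
    lookup = {("river", "lake"): 0.62, ("river", "car"): 0.18, ("lake", "ocean"): 0.35}
    return lookup[(a, b)]

human = [score for _, _, score in gold]
model = [model_similarity(a, b) for a, b, _ in gold]

r, _ = pearsonr(human, model)
rho, _ = spearmanr(human, model)
print("Pearson:", r, "Spearman:", rho)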

On Tue, Oct 3, 2017 at 3:03 AM, Thad Guidry  wrote:

> Similar how ?
> The training seems to have made
> ​a ​
> few wrong assumptions
> ​, but I might be wrong since I don't know your assumptions while
> training.​
>
> ​​
> https://tools.wmflabs.org/wembedder/most-similar/Q355304#language=en
>
> You miss "ocean" on this, and pickup "farmhouse", for instance
> ​ ?
>
> Do Wikipedia Categories​ or 'subclass of' affect anything here ?
> ​
>
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Encoders/feature extractors for neural nets

2017-09-27 Thread John Erling Blad
The most important thing for my problem would be to encode quantity and
geopos. The test case is lake sizes, to encode proper localized descriptions.

Unless someone already has a working solution, I would encode this as
sparse logarithmic vectors, probably also with the log of pairwise differences.

Encoding of qualifiers is interesting, but would require encoding a
topic map, and that adds an additional layer of complexity.

How to encode the values is not so much the problem; the problem is avoiding
reimplementing this yet another time… ;)
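
To make that concrete, here is a minimal sketch of what I mean by a sparse
logarithmic encoding of a single quantity; the bucket count and range are
arbitrary assumptions, so treat it as an illustration only:

import math

NUM_BUCKETS = 32              # assumption: enough resolution for lake areas and the like
LOG_MIN, LOG_MAX = -8.0, 8.0  # assumption: magnitudes from 1e-8 up to 1e8

def encode_quantity(value):
    """Sparse log-scale encoding returned as {index: weight} instead of a dense vector."""
    if value <= 0:
        return {0: 1.0}                          # bucket 0 reserved for zero/negative
    log_v = min(max(math.log10(value), LOG_MIN), LOG_MAX)
    pos = (log_v - LOG_MIN) / (LOG_MAX - LOG_MIN) * (NUM_BUCKETS - 1)
    lower = int(pos)
    frac = pos - lower
    enc = {lower + 1: 1.0 - frac}                # interpolate between neighbouring buckets
    if frac > 0:
        enc[lower + 2] = frac
    return enc

# e.g. lake areas in square kilometres
print(encode_quantity(0.4))      # a small lake
print(encode_quantity(368.0))    # a Mjøsa-sized lake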

On Wed, Sep 27, 2017 at 1:23 PM, Thomas Pellissier Tanon <
tho...@pellissier-tanon.fr> wrote:

> Just an idea of a very sparse but hopefully not so bad encoding (I have
> not actually tested it).
>
> NB: I am going to use a lot the terms defined in the glossary [1].
>
> A value could be encoded by a vector:
> - for entity ids it is a vector V that have the dimension of the number of
> existing entities such that V[q] = 1 if, and only if, it is the entity q
> and V[q] = 0 if not.
> - for time : a vector with year, month, day, hours, minutes, seconds,
> is_precision_year, is_precision_month, ..., is_gregorian, is_julian (or
> something similar)
> - for geo coordinates latitude, longitude, is_earth, is_moon...
> - string/language strings: an encoding depending on your use case
> ...
> Example : To encode "Q2" you would have the vector {0,1,0}
> To encode the year 2000 you would have {2000,0..., is_precision_decade =
> 0,is_precision_year=1,is_precision_month=0,...,is_gregorian=true,...}
>
> To encode a snak you build a big vector by concatenating the vector of the
> value if it is P1, if it is P2... (you use the property datatype to pick a
> good vector shape) + you add two cells per property to encode is_novalue,
> is_somevalue. To encode "P31: Q5" you would have a vector V =
> {0,,0,0,0,0,1,0,} with 1 only for  V[P31_offset + Q5_offset]
>
> To encode a claim you could concatenate the main snak vector + the
> qualifiers vectors that is the merge of the snak vector for all qualifiers
> (i.e. you build the vector for all snak and you sum them) such that the
> qualifier vectors encode all qualifiers at the same time. it allows to
> check that a qualifiers is set just by picking the right cell in the
> vector. But it will do bad things if there are two qualifiers with the same
> property and having a datatype like time or geocoordinates. But I don't
> think it really a problem.
> Example: to encode the claim with "P31: Q5" main snak and qualifiers "P42:
> Q42, P42: Q44" we would have a vector V such that V[P31_offset + Q5_offset]
> = 1, V[qualifiers_offset + P42_offset + Q42_offset] = 1 and
> V[qualifiers_offset + P42_offset + Q44_offset] = 1 and 0 elsewhere.
>
> I am not sure how to encode statements references (merge all of them and
> encode it just like the qualifiers vector is maybe a first step but is bad
> if we have multiple references).  For the rank you just need 3 booleans
> is_preferred, is_normal and is_deprecated.
>
> Cheers,
>
> Thomas
>
> [1] https://www.wikidata.org/wiki/Wikidata:Glossary
>
>
> > Le 27 sept. 2017 à 12:41, John Erling Blad  a écrit :
> >
> > Is there anyone that has done any work on how to encode statements as
> features for neural nets? I'm mostly interested in sparse encoders for
> online training of live networks.
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Encoders/feature extractors for neural nets

2017-09-27 Thread John Erling Blad
Is there anyone who has done any work on how to encode statements as
features for neural nets? I'm mostly interested in sparse encoders for
online training of live networks.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Stakeholders

2017-08-11 Thread John Erling Blad
Has any stakeholder group been created for this project, and who are the
members? It is now several years since I pointed out that WMDE can't both
deliver the software and be a stakeholder at the same time. In a stakeholder
group there must be people from both the data providers (repo, Wikidata)
and the data consumers (clients, Wikipedia, etc.).

John
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Significant change: new data type for tabular data files

2017-05-02 Thread John Erling Blad
The API console for extracting statistics is separate from the use of
JSON-stat, but it can export JSON-stat. Statistics Norway has a user manual
for how to use the console.[1][2] This is a joint collaboration between
several Scandinavian census bureaus.

[1]
http://ssb.no/en/omssb/tjenester-og-verktoy/api/_attachment/248250?_ts=15b48207778
[2] http://ssb.no/en/omssb/tjenester-og-verktoy/api

On Tue, May 2, 2017 at 4:00 PM, John Erling Blad  wrote:

> One of the most important use of this is probably JSONstat, but note that
> it is _not_ obvious which categories maps to which items, or that a
> category maps to the same item at all. For example "Oslo" might be used as
> the name of the city in Norway in some contexts, while it might be the name
> of the county in other contexts, or it might be the unincorporated
> community in Southern Florida.
>
> If a tag function is made to make and reapply queries to the API-console
> now in use at several census bureaus, then the JSONstat for specific data
> can be automatically updated. This will create some additional problems, as
> those stats can't be manually updated.
>
> Yes I have done some experiments on this, no it has not been possible to
> get this up and running for various reasons. (There must be a working cache
> with high availability, the json-lib in Scribunto is flaky, etc.)
>
> On Tue, May 2, 2017 at 3:47 PM, John Erling Blad  wrote:
>
>> You know that this has pretty huge implications for the data model, and
>> that data stored in a tabular file might invalidate the statement where it
>> is referenced? And both the statement and the data file might be valid in
>> isolation? (It is two valid propositions but from different worlds.)
>>
>> On Tue, May 2, 2017 at 12:33 PM, Jane Darnell  wrote:
>>
>>> Interesting, thanks! I have been waiting for more developments on this
>>> since it was shown by User:TheDJ at the developer's showcase in january
>>> (link here at 5 minutes in https://www.youtube.com/watch?v=j2pR21imm9A)
>>> I was wondering if this could be used in the case of painting items
>>> being linked to old art sale catalogs. So instead of bothering with
>>> wikisource, no matter what language the catalog is in I could link to a
>>> catalog entry on commons by line and column (theoretically two columns: one
>>> column for catalog identifier, and second columns for full catalog entry,
>>> generally less than 300 characters of text).
>>>
>>> On Tue, May 2, 2017 at 10:37 AM, Léa Lacroix 
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> We’ve been working on a new data type that allows you to link to the 
>>>> *tabular
>>>> data files <https://www.mediawiki.org/wiki/Help:Tabular_Data>* that
>>>> are now stored on Commons. This data type will be deployed on Wikidata on 
>>>> *May
>>>> 15th*.
>>>>
>>>> The property creators will be able to create properties with this
>>>> tabular data type by selecting “tabular data” in the data type list.
>>>>
>>>> When the property is created, you can use it in statements, and when
>>>> filling the value, if you start typing a string, you can choose the name of
>>>> a file in the list of what exists on Commons.
>>>>
>>>> Before the deployment, you can test it on http://test.wikidata.org (
>>>> example <https://test.wikidata.org/wiki/Q59992>).
>>>>
>>>> One thing to note: We currently do not export statements that use this
>>>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>>>> Service. The reason is that we are still waiting for tabular data files to
>>>> get stable URIs. This is handled in this ticket
>>>> <https://phabricator.wikimedia.org/T161527>.
>>>> If you have any question, feel free to ask!
>>>>
>>>> --
>>>> Léa Lacroix
>>>> Project Manager Community Communication for Wikidata
>>>>
>>>> Wikimedia Deutschland e.V.
>>>> Tempelhofer Ufer 23-24
>>>> 10963 Berlin
>>>> www.wikimedia.de
>>>>
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>>
>>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>>>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>>
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Significant change: new data type for tabular data files

2017-05-02 Thread John Erling Blad
A statement in a world might be valid and true in that world, but false in
another world. Two statements, each in its own world, might use a name
(parameter or instantiated) that is similar by accident, and chained
together they will form an invalid statement.

If values from the data files are available in the same RDF graph without
preventive measures (i.e. the same world), it will create problems. This is well
known in logic, so it should not be a surprise. There are several proposed
solutions for RDF graphs, but it gets slightly (a lot) more complex.

On Tue, May 2, 2017 at 3:56 PM, Jane Darnell  wrote:

> Not sure what you mean - if the datafile stored on commons links in turn
> to the source (e.g. the sale catalog hosted somewhere) then the datafile
> only acts as a transformation engine enabling blobbish text to be accessed
> as citable material for machine reading bots. Seems nifty to me.
>
> On Tue, May 2, 2017 at 3:47 PM, John Erling Blad  wrote:
>
>> You know that this has pretty huge implications for the data model, and
>> that data stored in a tabular file might invalidate the statement where it
>> is referenced? And both the statement and the data file might be valid in
>> isolation? (It is two valid propositions but from different worlds.)
>>
>> On Tue, May 2, 2017 at 12:33 PM, Jane Darnell  wrote:
>>
>>> Interesting, thanks! I have been waiting for more developments on this
>>> since it was shown by User:TheDJ at the developer's showcase in january
>>> (link here at 5 minutes in https://www.youtube.com/watch?v=j2pR21imm9A)
>>> I was wondering if this could be used in the case of painting items
>>> being linked to old art sale catalogs. So instead of bothering with
>>> wikisource, no matter what language the catalog is in I could link to a
>>> catalog entry on commons by line and column (theoretically two columns: one
>>> column for catalog identifier, and second columns for full catalog entry,
>>> generally less than 300 characters of text).
>>>
>>> On Tue, May 2, 2017 at 10:37 AM, Léa Lacroix 
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> We’ve been working on a new data type that allows you to link to the 
>>>> *tabular
>>>> data files <https://www.mediawiki.org/wiki/Help:Tabular_Data>* that
>>>> are now stored on Commons. This data type will be deployed on Wikidata on 
>>>> *May
>>>> 15th*.
>>>>
>>>> The property creators will be able to create properties with this
>>>> tabular data type by selecting “tabular data” in the data type list.
>>>>
>>>> When the property is created, you can use it in statements, and when
>>>> filling the value, if you start typing a string, you can choose the name of
>>>> a file in the list of what exists on Commons.
>>>>
>>>> Before the deployment, you can test it on http://test.wikidata.org (
>>>> example <https://test.wikidata.org/wiki/Q59992>).
>>>>
>>>> One thing to note: We currently do not export statements that use this
>>>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>>>> Service. The reason is that we are still waiting for tabular data files to
>>>> get stable URIs. This is handled in this ticket
>>>> <https://phabricator.wikimedia.org/T161527>.
>>>> If you have any question, feel free to ask!
>>>>
>>>> --
>>>> Léa Lacroix
>>>> Project Manager Community Communication for Wikidata
>>>>
>>>> Wikimedia Deutschland e.V.
>>>> Tempelhofer Ufer 23-24
>>>> 10963 Berlin
>>>> www.wikimedia.de
>>>>
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>>
>>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>>>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>>
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Significant change: new data type for tabular data files

2017-05-02 Thread John Erling Blad
One of the most important uses of this is probably JSON-stat, but note that
it is _not_ obvious which categories map to which items, or that a
category maps to the same item at all. For example, "Oslo" might be used as
the name of the city in Norway in some contexts, while it might be the name
of the county in other contexts, or it might be the unincorporated
community in Southern Florida.

If a tag function is made to build and reapply queries against the API console
now in use at several census bureaus, then the JSON-stat for specific data
can be automatically updated. This will create some additional problems, as
those stats can't be manually updated.

Yes, I have done some experiments on this; no, it has not been possible to
get this up and running, for various reasons. (There must be a working cache
with high availability, the JSON library in Scribunto is flaky, etc.)

On Tue, May 2, 2017 at 3:47 PM, John Erling Blad  wrote:

> You know that this has pretty huge implications for the data model, and
> that data stored in a tabular file might invalidate the statement where it
> is referenced? And both the statement and the data file might be valid in
> isolation? (It is two valid propositions but from different worlds.)
>
> On Tue, May 2, 2017 at 12:33 PM, Jane Darnell  wrote:
>
>> Interesting, thanks! I have been waiting for more developments on this
>> since it was shown by User:TheDJ at the developer's showcase in january
>> (link here at 5 minutes in https://www.youtube.com/watch?v=j2pR21imm9A)
>> I was wondering if this could be used in the case of painting items being
>> linked to old art sale catalogs. So instead of bothering with wikisource,
>> no matter what language the catalog is in I could link to a catalog entry
>> on commons by line and column (theoretically two columns: one column for
>> catalog identifier, and second columns for full catalog entry, generally
>> less than 300 characters of text).
>>
>> On Tue, May 2, 2017 at 10:37 AM, Léa Lacroix 
>> wrote:
>>
>>> Hello all,
>>>
>>> We’ve been working on a new data type that allows you to link to the 
>>> *tabular
>>> data files <https://www.mediawiki.org/wiki/Help:Tabular_Data>* that are
>>> now stored on Commons. This data type will be deployed on Wikidata on *May
>>> 15th*.
>>>
>>> The property creators will be able to create properties with this
>>> tabular data type by selecting “tabular data” in the data type list.
>>>
>>> When the property is created, you can use it in statements, and when
>>> filling the value, if you start typing a string, you can choose the name of
>>> a file in the list of what exists on Commons.
>>>
>>> Before the deployment, you can test it on http://test.wikidata.org (
>>> example <https://test.wikidata.org/wiki/Q59992>).
>>>
>>> One thing to note: We currently do not export statements that use this
>>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>>> Service. The reason is that we are still waiting for tabular data files to
>>> get stable URIs. This is handled in this ticket
>>> <https://phabricator.wikimedia.org/T161527>.
>>> If you have any question, feel free to ask!
>>>
>>> --
>>> Léa Lacroix
>>> Project Manager Community Communication for Wikidata
>>>
>>> Wikimedia Deutschland e.V.
>>> Tempelhofer Ufer 23-24
>>> 10963 Berlin
>>> www.wikimedia.de
>>>
>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>
>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Significant change: new data type for tabular data files

2017-05-02 Thread John Erling Blad
You know that this has pretty huge implications for the data model, and
that data stored in a tabular file might invalidate the statement where it
is referenced? And both the statement and the data file might be valid in
isolation? (They are two valid propositions, but from different worlds.)

On Tue, May 2, 2017 at 12:33 PM, Jane Darnell  wrote:

> Interesting, thanks! I have been waiting for more developments on this
> since it was shown by User:TheDJ at the developer's showcase in january
> (link here at 5 minutes in https://www.youtube.com/watch?v=j2pR21imm9A)
> I was wondering if this could be used in the case of painting items being
> linked to old art sale catalogs. So instead of bothering with wikisource,
> no matter what language the catalog is in I could link to a catalog entry
> on commons by line and column (theoretically two columns: one column for
> catalog identifier, and second columns for full catalog entry, generally
> less than 300 characters of text).
>
> On Tue, May 2, 2017 at 10:37 AM, Léa Lacroix 
> wrote:
>
>> Hello all,
>>
>> We’ve been working on a new data type that allows you to link to the *tabular
>> data files * that are
>> now stored on Commons. This data type will be deployed on Wikidata on *May
>> 15th*.
>>
>> The property creators will be able to create properties with this tabular
>> data type by selecting “tabular data” in the data type list.
>>
>> When the property is created, you can use it in statements, and when
>> filling the value, if you start typing a string, you can choose the name of
>> a file in the list of what exists on Commons.
>>
>> Before the deployment, you can test it on http://test.wikidata.org (
>> example ).
>>
>> One thing to note: We currently do not export statements that use this
>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>> Service. The reason is that we are still waiting for tabular data files to
>> get stable URIs. This is handled in this ticket
>> .
>> If you have any question, feel free to ask!
>>
>> --
>> Léa Lacroix
>> Project Manager Community Communication for Wikidata
>>
>> Wikimedia Deutschland e.V.
>> Tempelhofer Ufer 23-24
>> 10963 Berlin
>> www.wikimedia.de
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>
>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikidata-tech] Script and API module for constraint checks

2017-04-27 Thread John Erling Blad
Good idea, but I wonder if this exposes users to the very alien language
in the constraint reports.
It is often very difficult to understand what an error is about, not to
mention how to fix it.

On Fri, Apr 28, 2017 at 12:28 AM, Stas Malyshev 
wrote:

> Hi!
>
> > That’s a bug in the constraints on Wikidata – “date of birth” has a
> > constraint stating that its value must be at least 30 years away from
> > the “date of birth” value. We’ll work on resolving this (I contacted
> > Ivan Krestinin, who added this “experimental constraint”, to ask if he
> > still needs it – and if it can’t be removed from the P569 talk page for
> > some reason, we’ll probably filter it out in the user script).
>
> The checks seem to be against P184/P185, of which neither is on
> Q5066005. So I suspect there's some bug still somewhere.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Only using sitelinks as a weak indication of quality seems correct to me.
The same goes for the idea that some languages are more important than others,
and some large languages are more important than others. I would really like it
if the reasoning behind the classes and the features could be spelled out.

I have serious issues with the ORES training sets, but that is another
discussion. ;/ (There are a lot of similar bot edits in the sets, and that
will train a bot detector, which is not what we need! Grumpf…)

On Wed, Mar 22, 2017 at 3:33 PM, Aaron Halfaker 
wrote:

> Hey wiki-research-l folks,
>
> Gerard didn't actually link you to the quality criteria he takes issue
> with.  See https://www.wikidata.org/wiki/Wikidata:Item_quality  I think
> Gerard's argument basically boils down to Wikidata != Wikipedia, but it's
> unclear how that is relevant to the goal of measuring the quality of
> items.  This is something I've been talking to Lydia about for a long
> time.  It's been great for the few Wikis where we have models deployed in
> ORES[1] (English, French, and Russian Wikipedia).  So we'd like to have the
> same for Wikidata.   As Lydia said, we do all sorts of fascinating things
> with a model like this.  Honestly, I think the criteria is coming together
> quite nicely and we're just starting a pilot labeling campaign to work
> through a set of issues before starting the primary labeling drive.
>
> 1. https://ores.wikimedia.org
>
> -Aaron
>
>
>
> On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com> wrote:
>
>> Hoi,
>> What I have read is that it will be individual items that are graded. That
>> is not what helps you determine what items are lacking in something. When
>> you want to determine if something is lacking you need a relational
>> approach. When you approach a award like this one [1], it was added to
>> make
>> the award for a person [2] more complete. No real importance is given to
>> this award, just a few more people were added because they are part of a
>> group that gets more attention from me [3]. For yet another award [4], I
>> added all the people who received the award because I was told by
>> someone's
>> expert opinion that they were all notable (in the Wikipedia sense of the
>> word). I added several of these people in Wikidata. Arguably, the Wikidata
>> the quality for the item for the award is great but it has no article
>> associated to it in Wikipedia but that has nothing to do with the quality
>> of the information it provides. It is easy and obvious to recognise in one
>> level deeper that quality issues arise; the info for several people is
>> meagre at best.You cannot deny their relevance though; removing them
>> destroys the quality for the award.
>>
>> The point is that in relations you can describe quality, in the grading
>> that is proposed there is nothing really that is actionable.
>>
>> When you add links to the mix, these same links have no bearing on the
>> quality of the Wikidata item. Why would it? Links only become interesting
>> when you compare the statements in Wikidata with the links to other
>> articles in the same Wikipedia. This is not what this approach brings.
>>
>> Really, how will the grades to items make a difference. How will it help
>> us
>> understand that "items relating to railroads are lacking"? It does not.
>>
>> When you want to have indicators for quality; here is one.. an author (and
>> its subclasses) should have a VIAF identifier. An artist with objects in
>> the Getty Museum should have an ULAN number. The lack of such information
>> is actionable. The number of interwiki links is not, the number of
>> statements are not and even references are not that convincing.
>> Thanks,
>>   GerardM
>>
>> [1] https://tools.wmflabs.org/reasonator/?&q=29000734
>> [2] https://tools.wmflabs.org/reasonator/?&q=7315382
>> [3] https://tools.wmflabs.org/reasonator/?&q=3308284
>> [4] https://tools.wmflabs.org/reasonator/?&q=28934266
>>
>> On 22 March 2017 at 11:56, Lydia Pintscher 
>> wrote:
>>
>> > On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
>> >  wrote:
>> > > In your reply I find little argument why this approach is useful. I do
>> > not
>> > > find a result that is actionable. There is little point to this
>> approach
>> > > and it does not fit with well with much of the Wikidata practice.
>> >
>> > Gerard, the outcome will be very actionable. We will have the
>> > groundwork needed to identify individual items and sets of items that
>> > need improvement. If it for example turns out that our items related
>> > to railroads are particularly lacking then that is something we can
>> > concentrate on if we so chose. We can do editathons, data
>> > partnerships, quality drives and and and.
>> >
>> >
>> > Cheers
>> > Lydia
>> >
>> > --
>> > Lydia Pintscher - http://about.me/lydia.pintscher
>> > Product Manager for Wikidata
>> >
>> > Wikimedia Deutschland e.V.
>> > Tempelhofer Ufer 23-24
>> > 10963 Berlin
>> > www.wikimedia.de
>> >
>> > 

Re: [Wikidata] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Forgot to mention: this is not really about _quality_ (Gerard says "model
of quality"); it is about _trust_ and _reputation_. Something can have low
quality and high trust, cf. cheap cell phones, and the reputation might not
reflect the actual quality.

You (usually) measure reputation and calculate trust, but I have seen it done
the other way around. The end result is the same anyhow.

On Wed, Mar 22, 2017 at 3:31 PM, John Erling Blad  wrote:

> Sitelinks to an item are an approximation of the number of views of the
> data from an item, and as such gives an approximation to the likelihood of
> detecting an error. Few views imply a larger time span before an error is
> detected. It is really about estimating quality as a function of the age of
> the item as number of page views, but approximated through sitelinks.
>
> Problem is, the number of sitelinks is not a good approximation. Yes it is
> a simple approximation, but it is still pretty bad.
>
> References are an other way to verify the data, but that is not a valid
> argument against measuring the age of the data.
>
> I've been toying with an idea for some time that use statistical inference
> to try to identify questionable facts, but it will probably not be done -
> it is way to much work to do in spare time.
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Sitelinks to an item are an approximation of the number of views of the
data from an item, and as such give an approximation of the likelihood of
detecting an error. Few views imply a larger time span before an error is
detected. It is really about estimating quality as a function of the age of
the item and the number of page views, but approximated through sitelinks.

The problem is, the number of sitelinks is not a good approximation. Yes, it is
a simple approximation, but it is still pretty bad.

References are another way to verify the data, but that is not a valid
argument against measuring the age of the data.

I've been toying with an idea for some time that uses statistical inference
to try to identify questionable facts, but it will probably not be done;
it is way too much work to do in spare time.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Upload mathematical formulas in Wikidata

2016-09-21 Thread John Erling Blad
I was looking at the math formulas yesterday, and it seems to me that if
this is to be useful then it must include the proof. That is a problem, as
a proof is a sequence of steps. How should we do that? And how do we
describe the transition from one step to the next? The descriptions of the
transitions are in between the steps, and they are multilingual.

On Wed, Sep 21, 2016 at 5:53 PM, kaushal dudhat 
wrote:

> Hello,
>
> My name is Kaushal and currently I and a friend are working on AskPlatypus
> as part of our master thesis. We want to add a module to AskPlatypus which
> answers mathematical questions with the use of Wikidata. As a first step we
> want to add more mathematical formulas to Wikidata. We extracted a lot of
> them from Wikipedia. There are 17838 formulas now. It would be great to get
> them uploaded into primary source tool.
> The list of formulas in primary source tool syntax is attached here.
>
> Please have look. It would be great if someone could upload them into the
> primary sources tool.
>
>
> Greetings
> Kaushal
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Dynamic Lists, Was: Re: List generation input

2016-09-19 Thread John Erling Blad
Either the list should just be an entry point for a list structure or
table that is created completely outside the editors' realm, or it should be
possible to merge any user edit with content from the bot. There should be
no in-between where the user needs additional knowledge about how to edit
the bot-produced content, or even needs to know that (s)he can't edit the
bot-produced content. From the user's (editor's) point of view there should be
no special precautions for how some pages are edited.

At the moment there are two pages in the main namespace at nowiki using
Listeria bot: Bluepoint Games [1] and Thatgamecompany [2]. There is a (weak?)
consensus on not using the bot, so if a discussion is started they will
probably be removed. The main argument against using it is that it
overwrites edits made by other users.

[1] https://no.wikipedia.org/wiki/Bluepoint_Games
[2] https://no.wikipedia.org/wiki/Thatgamecompany
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] (Ab)use of "deprecated"

2016-08-12 Thread John Erling Blad
A last note: listen to Markus; he is usually right.
Darn! 😤

On Fri, Aug 12, 2016 at 12:02 PM, John Erling Blad  wrote:

> Latest date for population isn't necessarily the preferred one, it can be
> a predicted one for a short timespan. For example Statistics Norway provide
> a 3 month expectation in addition to the one year stats. The one year stats
> should be the preferred ones, the 3 month stats are kind of expected change
> on last years stats.
>
> Main problem with the 3 month stats are that they usually can't be used
> together with one-year stats, ie. they can't be normalized against the same
> base. Absolute value would seem the same, but growt rate against a one-year
> base would be wrong. It is a quite usual to do that error.
>
> A lot of stats "sounds similar" but isn't similar. It is a bit awkward.
> Sometimes stats refer to international standards for how they should be
> made, in those cases they can be compared. It is often described on a page
> for metadata about the stats. An example is population in rural areas,
> which many assume is the same in all countries. It is not.
>
> And while I'm on it; stats often describe a (possibly temporal) connection
> or relation between two or more (types of) subjects, and it is not
> something you should assign to one of the subject. If one part is a
> concrete instance then it makes sense to add stats about the other types to
> that item, like population for a municipality, but otherwise it could be
> wrong.
>
> In general, setting the last added or most recent value to preferred is in
> general wrong.
>
> And also, that something is not-preferred does not imply that it is
> deprecated. And also note the difference between deprecated and deferred.
>
> On Thu, Aug 11, 2016 at 10:56 PM, Stas Malyshev 
> wrote:
>
>> Hi!
>>
>> > I would argue that this is better done by using qualifiers (e.g. start
>> > data, end data).  If a statement on the population size would be set to
>> > preferred, but isn't monitored for quite some time, it can be difficult
>> > to see if the "preferred" statement is still accurate, whereas a
>> > qualifier would give a better indication that that stament might need an
>> > update.
>>
>> Right now this bot:
>> https://www.wikidata.org/wiki/User:PreferentialBot
>> watches statements like "population" that have multiple values with
>> different time qualifiers but no current preference.
>>
>> What it doesn't currently do is to verify that the preferred one refers
>> to the latest date. It probably shouldn't fix these cases (because there
>> may be valid cause why the latest is not the best, e.g. some population
>> estimates are more precise than others) but it can alert about it. This
>> can be added if needed.
>>
>> --
>> Stas Malyshev
>> smalys...@wikimedia.org
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] (Ab)use of "deprecated"

2016-08-12 Thread John Erling Blad
The latest date for population isn't necessarily the preferred one; it can be a
predicted one for a short timespan. For example, Statistics Norway provides a
3-month expectation in addition to the one-year stats. The one-year stats
should be the preferred ones; the 3-month stats are more of an expected change
on top of last year's stats.

The main problem with the 3-month stats is that they usually can't be used
together with one-year stats, i.e. they can't be normalized against the same
base. The absolute value would seem the same, but the growth rate against a
one-year base would be wrong. It is quite a common error to make.

A lot of stats "sound similar" but aren't similar. It is a bit awkward.
Sometimes stats refer to international standards for how they should be
compiled; in those cases they can be compared. This is often described on a
page of metadata about the stats. An example is population in rural areas,
which many assume is defined the same way in all countries. It is not.

And while I'm on it: stats often describe a (possibly temporal) connection
or relation between two or more (types of) subjects, and that is not
something you should assign to just one of the subjects. If one part is a
concrete instance, then it makes sense to add stats about the other types to
that item, like population for a municipality, but otherwise it could be
wrong.

In general, setting the last added or most recent value to preferred is
wrong.

Also, that something is not preferred does not imply that it is
deprecated. And note the difference between deprecated and deferred.
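
As an aside, the kind of check Stas describes below (alerting when the
statement carrying the latest date is not the preferred one, rather than
auto-fixing it) can be sketched against the public API. The property ids P1082
(population) and P585 (point in time) and the response layout are written from
memory, so treat this purely as an illustration:

import requests

API = "https://www.wikidata.org/w/api.php"

def newest_not_preferred(qid):
    """Report (don't fix) items where the population statement with the latest
    point-in-time qualifier is not the one marked preferred."""
    data = requests.get(API, params={
        "action": "wbgetentities", "ids": qid, "props": "claims", "format": "json",
    }, timeout=30).json()
    claims = data["entities"][qid]["claims"].get("P1082", [])      # P1082 = population
    dated = []
    for claim in claims:
        for qual in claim.get("qualifiers", {}).get("P585", []):   # P585 = point in time
            if "datavalue" in qual:                                # skip somevalue/novalue
                dated.append((qual["datavalue"]["value"]["time"], claim["rank"]))
    if not dated:
        return False
    latest_time, latest_rank = max(dated)   # "+2016-01-01T00:00:00Z" strings sort by date
    return latest_rank != "preferred"

print(newest_not_preferred("Q585"))   # Q585 = Oslo, just as an example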

On Thu, Aug 11, 2016 at 10:56 PM, Stas Malyshev 
wrote:

> Hi!
>
> > I would argue that this is better done by using qualifiers (e.g. start
> > data, end data).  If a statement on the population size would be set to
> > preferred, but isn't monitored for quite some time, it can be difficult
> > to see if the "preferred" statement is still accurate, whereas a
> > qualifier would give a better indication that that stament might need an
> > update.
>
> Right now this bot:
> https://www.wikidata.org/wiki/User:PreferentialBot
> watches statements like "population" that have multiple values with
> different time qualifiers but no current preference.
>
> What it doesn't currently do is to verify that the preferred one refers
> to the latest date. It probably shouldn't fix these cases (because there
> may be valid cause why the latest is not the best, e.g. some population
> estimates are more precise than others) but it can alert about it. This
> can be added if needed.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Grammatical display of units

2016-07-30 Thread John Erling Blad
Norwegian has a lot of colloquialisms that must be handled if you want the
language to sound natural. The example with "kilo" exists in a lot of
languages in one form or another.
Then you have congruence with external factors (direction, length,
emptiness), missing plurals for some units (Norwegian "mil" is one example), …
On Sat, Jul 30, 2016 at 5:58 AM, Jan Macura  wrote:

> Hi John, all
>
> 2016-07-29 15:54 GMT+02:00 John Erling Blad :
>
>> In general this has more implications than simple singular/plural forms
>> of units. Agreement/concord/congruence is the proper term. In some
>> language you will even change the form given the distance to the thing you
>> are measuring or counting, even depending on the type of thing you are
>> measuring or counting, or change on the gender of the thing, and then even
>> only for some numbers.
>>
>
> Linguistic agreement is common in a lot of inflected languages [1].
>
> Now assume "kilogram" is changed to the short form "kilo", then it is "én
>> kilo" which is masculinum. The prefix "kilo" is only used for "kilogram",
>> so it isn't valid Norwegian til say "én kilo" when referring to "1 km", or
>> "én milli" when refering to "1 milligram".
>
>
> On the other hand, we don't have to deal with colloquialisms like "kilo"
> in your example. Modelling the formal language would be still hard enough.
>
> Best,
>  Jan
>
> [1] https://en.wikipedia.org/wiki/Fusional_language
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Grammatical display of units

2016-07-29 Thread John Erling Blad
In general this has more implications than simple singular/plural forms of
units. Agreement/concord/congruence is the proper term.[1] In some
languages you will even change the form depending on the distance to the thing
you are measuring or counting, depending on the type of thing you are
measuring or counting, or depending on the gender of the thing, and then
sometimes only for some numbers.

Assume you have "1 meter"; then you could write it out as "én meter" in
Norwegian, as "meter" is masculine. Now assume you have "1 kilogram"; then
you would write it out as "ett kilogram", as "gram" is neuter. Now assume
"kilogram" is shortened to "kilo"; then it is "én kilo", which is masculine.
The prefix "kilo" is only used for "kilogram" here, so it isn't valid
Norwegian to say "én kilo" when referring to "1 km", or "én milli"
when referring to "1 milligram".

[1] https://en.wikipedia.org/wiki/Agreement_(linguistics)
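
To illustrate why a single plural form per unit label is not enough, here is a
toy sketch with a deliberately tiny and incomplete Norwegian lexicon of my own;
a formatter needs both the grammatical gender of the unit and gender-dependent
numeral forms:

# Toy lexicon: unit id -> (gender, singular label, plural label); not real Wikidata data
UNITS_NB = {
    "metre":    ("m", "meter", "meter"),        # masculine; plural happens to equal singular
    "kilogram": ("n", "kilogram", "kilogram"),  # neuter
}

# In Norwegian Bokmål the numeral "one" agrees with the gender of the noun
ONE_NB = {"m": "én", "f": "ei", "n": "ett"}

def format_quantity_nb(amount, unit):
    gender, singular, plural = UNITS_NB[unit]
    if amount == 1:
        return f"{ONE_NB[gender]} {singular}"
    return f"{amount} {plural}"

print(format_quantity_nb(1, "metre"))      # én meter
print(format_quantity_nb(1, "kilogram"))   # ett kilogram
print(format_quantity_nb(2, "kilogram"))   # 2 kilogram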

On Fri, Jul 29, 2016 at 7:26 AM, Markus Kroetzsch <
markus.kroetz...@tu-dresden.de> wrote:

> On 28.07.2016 20:41, Stas Malyshev wrote:
>
>> Hi!
>>
>> Good point. Could we not just have a monolingual text string property
>>> that gives the preferred writing of the unit when used after a number? I
>>> don't think the plural/singular issue is very problematic, since you
>>> would have plural almost everywhere, even for "1.0 metres". So maybe we
>>>
>>
>> We have code to deal with that - note that "1 reference" and "2
>> references" are displayed properly. It's a matter of applying that code
>> and having it provided with proper configs.
>>
>
> You mean the MediaWiki message processing code? This would probably be
> powerful enough for units as well, but it works based on message strings
> that look a bit like MW template calls. Someone has to enter such strings
> for all units (and languages). This would be doable but the added power
> comes at the price of more difficult editing of such message strings
> instead of plain labels.
>
> As far as I know, the message parsing is available through the MW API, so
> external consumers could take advantage of the same system if the message
> strings were part of the data (we would like to have grammatical units in
> SQID as well).
>
>
>> just need one alternative label for most languages? Or are there
>>> languages with more complex grammar rules for units?
>>>
>>
>> Oh yes :) Russian is one, but I'm sure there are others.
>>
>>
> Forgive my ignorance; I was not able to read the example you gave there.
>
> Markus
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] ArticlePlaceholder rolled out to next wikis

2016-06-10 Thread John Erling Blad
The page is missing a link back to the item; it is now a dead end unless
you want to create an article.
I guess that isn't quite obvious…

On Fri, Jun 10, 2016 at 2:23 PM, Magnus Manske 
wrote:

> -- Forwarded message -
> From: Lydia Pintscher 
> * Gujarati (
> https://gu.wikipedia.org/wiki/%E0%AA%B5%E0%AA%BF%E0%AA%B6%E0%AB%87%E0%AA%B7:AboutTopic/Q13520818
> )
>
> Honoured to be a test item, even if I have never heard about that language
> before... :-)
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] editing from Wikipedia and co

2016-06-04 Thread John Erling Blad
An additional note.

The problem is that a community can handle a rather specific workload. Some
of that goes into producing new articles, some goes into patrolling, and some
goes into maintenance of existing articles. When a project has too much
dynamic content (and it will always have some dynamic content), it starts
to move into a maintenance mode, because the community is swamped by the
dynamic content.

A typical indication that something is going on is that the patrol log
starts to overflow. Another is that the production of new articles starts
to drop, but that will drop anyhow because of the addition of new content to
old articles.[1] To get good numbers we need the ratio "new
content"/"edited old content". When that number starts to drop, we know
that the community is starting to run into problems.

If we had unlimited resources, then we could add more workload, but we don't
have unlimited resources (i.e. man-hours). The community is limited. Adding new
work on top of the existing will thus not scale very well, if at all. We need
ways to cope with the existing workload, not additional work.

In short: a nice thesis, but even if it can be _implemented_ it will not
scale on Wikipedia.

And of course, someone will surely claim that we could just get some
more members into the community. Yes, sure, some of us have been working on
that for several years.[2]

[1]
https://commons.wikimedia.org/wiki/File:Stats-nowiki-2016-05-07-new-articles.png
[2]
https://commons.wikimedia.org/wiki/File:Stats-nowiki-2016-05-07-new-users.png

On Sat, Jun 4, 2016 at 11:55 AM, John Erling Blad  wrote:

> Given Lydias post I wonder if it is to be expected that editors on
> Wikipedia shall manually import statements from Wikidata, as this is what
> can be read out of this thesis. This will create a huge backlog of work on
> all Wikipedias, and I can't see how we possibly can do this. For the moment
> we have a huge backlog on sources on nowiki, and adding a lot of additional
> manual work will not go very well with the community.
>
> What is the plan, can Lydia or some else please clarify?
>
> On Mon, May 30, 2016 at 11:43 PM, John Erling Blad 
> wrote:
>
>> Page 21, moving to manual import of statements. I would really like to
>> see the analysis written out that ends in this conclusion. It is very
>> tempting, but the idea don't scale.
>>
>> We have now about 5-10 000 articles per active user. Those users have a
>> huge backlog of missing references. If they shall manage statements in
>> addition to their current backlog, then they will simply be overwhelmed.
>>
>>
>> On Mon, May 30, 2016 at 6:05 PM, Lydia Pintscher <
>> lydia.pintsc...@wikimedia.de> wrote:
>>
>>> Hey folks :)
>>>
>>> Charlie has been working on concepts for making it possible to edit
>>> Wikidata from Wikipedia and other wikis. This was her bachelor thesis. She
>>> has now published it:
>>> https://commons.wikimedia.org/wiki/File:Facilitating_the_use_of_Wikidata_in_Wikimedia_projects_with_a_user-centered_design_approach.pdf
>>> I am very happy she put a lot of thought and work into figuring out all
>>> the complexities of the topic and how to make this understandable for
>>> editors. We still have more work to do on the concepts and then actually
>>> have to implement it. Comments welcome.
>>>
>>>
>>> Cheers
>>> Lydia
>>> --
>>> Lydia Pintscher - http://about.me/lydia.pintscher
>>> Product Manager for Wikidata
>>>
>>> Wikimedia Deutschland e.V.
>>> Tempelhofer Ufer 23-24
>>> 10963 Berlin
>>> www.wikimedia.de
>>>
>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>
>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] editing from Wikipedia and co

2016-06-04 Thread John Erling Blad
Given Lydia's post, I wonder whether editors on Wikipedia are expected to
manually import statements from Wikidata, as this is what can be read out of
this thesis. This would create a huge backlog of work on all Wikipedias, and I
can't see how we could possibly do this. At the moment we have a huge backlog
of sources on nowiki, and adding a lot of additional manual work will not go
down well with the community.

What is the plan? Can Lydia or someone else please clarify?

On Mon, May 30, 2016 at 11:43 PM, John Erling Blad  wrote:

> Page 21, moving to manual import of statements. I would really like to see
> the analysis written out that ends in this conclusion. It is very tempting,
> but the idea don't scale.
>
> We have now about 5-10 000 articles per active user. Those users have a
> huge backlog of missing references. If they shall manage statements in
> addition to their current backlog, then they will simply be overwhelmed.
>
>
> On Mon, May 30, 2016 at 6:05 PM, Lydia Pintscher <
> lydia.pintsc...@wikimedia.de> wrote:
>
>> Hey folks :)
>>
>> Charlie has been working on concepts for making it possible to edit
>> Wikidata from Wikipedia and other wikis. This was her bachelor thesis. She
>> has now published it:
>> https://commons.wikimedia.org/wiki/File:Facilitating_the_use_of_Wikidata_in_Wikimedia_projects_with_a_user-centered_design_approach.pdf
>> I am very happy she put a lot of thought and work into figuring out all
>> the complexities of the topic and how to make this understandable for
>> editors. We still have more work to do on the concepts and then actually
>> have to implement it. Comments welcome.
>>
>>
>> Cheers
>> Lydia
>> --
>> Lydia Pintscher - http://about.me/lydia.pintscher
>> Product Manager for Wikidata
>>
>> Wikimedia Deutschland e.V.
>> Tempelhofer Ufer 23-24
>> 10963 Berlin
>> www.wikimedia.de
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>
>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] editing from Wikipedia and co

2016-05-30 Thread John Erling Blad
Page 21, moving to manual import of statements. I would really like to see
the analysis, written out, that ends in this conclusion. It is very tempting,
but the idea doesn't scale.

We now have about 5,000-10,000 articles per active user. Those users have a
huge backlog of missing references. If they are to manage statements in
addition to their current backlog, they will simply be overwhelmed.


On Mon, May 30, 2016 at 6:05 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> Hey folks :)
>
> Charlie has been working on concepts for making it possible to edit
> Wikidata from Wikipedia and other wikis. This was her bachelor thesis. She
> has now published it:
> https://commons.wikimedia.org/wiki/File:Facilitating_the_use_of_Wikidata_in_Wikimedia_projects_with_a_user-centered_design_approach.pdf
> I am very happy she put a lot of thought and work into figuring out all
> the complexities of the topic and how to make this understandable for
> editors. We still have more work to do on the concepts and then actually
> have to implement it. Comments welcome.
>
>
> Cheers
> Lydia
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Ontology

2016-05-16 Thread John Erling Blad
There was a previous statement about an entity which is now deprecated. You
may as well add a source stating why it is deprecated.

On Sat, May 14, 2016 at 7:55 PM, Smolenski Nikola  wrote:

> Citiranje Gerard Meijssen :
> > I have stopped expecting necessary changes from Wikidata. It has been
> made
> > clear that dates will be associated with labels. By the way we can and do
> > already indicate the validity of facts on time.
>
> I can't see why would dates be associated with labels. Can someone explain?
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] StatBank at Statistics Norway will be accessible through new open API

2016-04-23 Thread John Erling Blad
Statistics Norway (SSB) runs a service called "StatBank Norway"
("Statistikkbanken").[1][2] For some time it has been possible to access
parts of it through an open API serving JSON-stat.[4] Now they are opening
up the remaining access, and all 5,000 tables will be made available.[3]
(A minimal sketch of reading one table is included after the links below.)

SSB uses NLOD,[5][6] an open license, for their published data. (I asked
them, and all they really want is for the source to be clearly credited, so
as to avoid falsified data.)

[1] https://www.ssb.no/en/statistikkbanken
[2]
https://www.ssb.no/en/informasjon/om-statistikkbanken/how-to-use-statbank-norway
[3]
http://www.ssb.no/omssb/om-oss/nyheter-om-ssb/ssb-gjor-hele-statistikkbanken-tilgjengelig-som-apne-data
(Norwegian)
[4] https://json-stat.org/
[5] http://www.ssb.no/en/informasjon/copyright
[6] http://data.norge.no/nlod/en/1.0
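
For the record, a minimal sketch of reading one table through the API. The
endpoint path, the table number (03013) and the "json-stat2" response
format are my assumptions from the documentation,[2] so check them before
relying on this:

import requests

# Assumed endpoint and table number (03013); see the API docs for the real
# query format. An empty query list asks for the whole (small) table.
URL = "https://data.ssb.no/api/v0/en/table/03013"
payload = {"query": [], "response": {"format": "json-stat2"}}

resp = requests.post(URL, json=payload, timeout=30)
resp.raise_for_status()
dataset = resp.json()  # a JSON-stat 2.0 dataset

print("Table:", dataset.get("label"))
for dim_id in dataset.get("id", []):
    dim = dataset["dimension"][dim_id]
    n = len(dim.get("category", {}).get("index", {}))
    print(" ", dim_id, dim.get("label"), "-", n, "categories")
print("Data points:", len(dataset.get("value", [])))
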
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Bachelor's thesis on ArticlePlaceholder

2016-04-05 Thread John Erling Blad
First you say that the heuristic isn't perfect, then you say "As long as we
don't have notability criteria in a machine readable format we can only
work with heuristics", and then "And I really don't believe machine
readable notability criteria is something we should strive for". If the
heuristic isn't perfect, then alternatives should be investigated. There
are already machine-readable notability criteria in there; the only thing
missing is exposing them, probably by using the existing relations.

On Tue, Apr 5, 2016 at 11:32 AM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Sun, Apr 3, 2016 at 4:28 PM John Erling Blad  wrote:
>
>> Just read through the doc, and found some important points. I post each
>> one in a separate mail.
>>
>> > Since it is hard to decide which content is actually notable, the items
>> appear-
>> > ing in the search should be limited to the ones having at least one
>> statements
>> > and two sitelinks to the same project (like Wikipedia or Wikivoyage).
>>
>> This is a good baseline, but figuring out what is notable locally is a
>> bit more involved. A language is used in a local area, and within that area
>> some items are more important just because they reside within the area.
>> This is quite noticeable in the differences between nnwiki and nowiki which
>> both basically covers "Norway". Also items that somehow relates to the
>> local area or language is more noticeable than those outside those areas.
>> By traversing upwords in the claims using the "part of" property it is
>> possible to build a priority on the area involved. It is possible to
>> traverse "nationality" and a few other properties.
>>
>> Things directly noticeable like an area enclosed in an area using the
>> language is somewhat easy to identify, but things that are noticeable by
>> association with another noticeable thing is not. Like a Danish slave ship
>> operated by a Norwegian firm, the ship is thus noticeable in nowiki. I
>> would say that all things linked as an item from other noticeable things
>> should be included. Some would perhaps say that "items with second order
>> relevance should be included".
>>
>
> Yes the heuristic we're using isn't perfect. However I believe it is good
> enough for 99% of the cases while being really simple. This is what we need
> at the beginning. As we go along we can learn and see if other things make
> more sense.
> We have taken the exact same approach to ranking for item suggestions on
> Wikidata. At first all we took into account was the number of sitelinks on
> the items. This definitely wasn't a perfect measure for how relevant an
> item is but it was absolutely good enough while introducing very little
> complexity. As we've learned more and as Wikidata grows it was no longer
> good enough so we switched the algorithm to also take into account the
> number of labels. This is still relatively low complexity while producing
> good results.
> For the particular case of notability: As long as we don't have notability
> criteria in a machine readable format we can only work with heuristics. And
> I really don't believe machine readable notability criteria is something we
> should strive for.
>
> Cheers
> Lydia
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Bachelor's thesis on ArticlePlaceholder

2016-04-03 Thread John Erling Blad
> Ordering of statement groups

The solution described (?) seems to me like a "dev-ish" way to do this, and
I think it is wrong. The grouping is something that should be done
dynamically, as it depends on the item itself (i.e. the knowledge base),
its class hierarchy (i.e. an interpretation of the knowledge base, often
part of the knowledge base), our communicative goal (the overall context of
the communication), the discourse (usually we drop this, as we don't
maintain state), and the user model (which changes through a wp-article).
This 4-tuple is pretty well known in Natural Language Generation, but its
implications for reuse of Wikidata statements in Wikipedia are mostly
neglected. (That is not something Lucie should discuss in a bachelor
thesis, but it is extremely important if the goal for Wikidata is actual
reuse on Wikipedia.)

That said, I tried to figure out what the idea is, and also read the RfC
(Statement group ordering [1]), but I actually don't know what is planned
here. I think I know, but most probably I don't. The statement group
ordering is an on-wiki list of ordered groups? How do you create those
groups? What are the implications of those groups? Does it have
implications for other visualizations? What if groups should follow the
type of the item? It seems like this describes a system where "one size
fits all - or make it yourself". (A rough sketch of the alternative I have
in mind follows below the reference.)

And not to forget, where is the discussion? An RfC with no discussion?

[1]
https://www.mediawiki.org/wiki/Requests_for_comment/Statement_group_ordering
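
To make that concrete, here is a rough sketch of the data-driven
alternative I have in mind: the order of statement groups is picked from
the item's own class (P31) rather than from a single wiki-wide list. The
groupings below are made up for the example; only the property and class
ids are real.

# Rough sketch only; the per-class orderings are invented for illustration.
ORDER_BY_CLASS = {
    "Q5":   ["P569", "P570", "P19", "P20", "P106"],  # humans: dates, places, occupation
    "Q515": ["P17", "P131", "P625", "P1082"],        # cities: country, admin unit, coords, population
}
DEFAULT_ORDER = ["P31", "P17", "P625"]

def ordered_property_ids(item):
    """Return the item's statement-group ids, the most relevant groups first."""
    classes = [s["mainsnak"]["datavalue"]["value"]["id"]
               for s in item.get("claims", {}).get("P31", [])
               if s["mainsnak"].get("datavalue")]
    preferred = next((ORDER_BY_CLASS[c] for c in classes if c in ORDER_BY_CLASS),
                     DEFAULT_ORDER)
    rank = {pid: i for i, pid in enumerate(preferred)}
    # Groups not mentioned keep a stable order after the preferred ones.
    return sorted(item.get("claims", {}),
                  key=lambda pid: (rank.get(pid, len(rank)), pid))
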

On Sun, Apr 3, 2016 at 5:06 PM, John Erling Blad  wrote:

>
> > Red links are used frequently in Wikipedia to indicate an article which
> is does
> > not yet exist, but should. Today it leads the user to an empty create
> article page.
> > In the future it should instead bring them to an ArticlePlaceholder,
> offering the
> > option of creating an article. This is part of the topic of smart red
> links, which is
> > discussed in the section 8.1: Smart red links
>
> It should be interesting to hear if someone have an idea how this might
> work. There are some attempts on this at nowiki, none of them seems to work
> in all cases.
>
> Note that "Extension:ArticlePlaceholder/Smart red links" doesn't really
> solve the problem for existing redlinks, it solves the association problem
> when the user tries to resolve the redlink. That is one step further down
> the line, or more like solving the redlinks for a disambiguation page. ("I
> know there is a page like this, named like so, on that specific project.")
>
> Note also that an item is not necessarily described on any project, and
> that creating an item on Wikidata can be outside the editors scope or even
> very difficult. Often we have a name of some "thing", but we only have a
> sketchy idea about the thing itself. Check out
> https://www.wikidata.org/wiki/Q12011301 for an example.
>
> It seems like a lot of what are done so far on redlinks is an attempt to
> make pink-ish links with _some_information_, while the problem is that
> redlinks have _no_information_. The core reason why we have redlinks is
> that we lacks manpower to avoid them, and because of that we can't just add
> "some information". It is not a problem of what we need first, hens or
> eggs, as we have none of them.
>
> On Sun, Apr 3, 2016 at 4:27 PM, John Erling Blad  wrote:
>
>> Just read through the doc, and found some important points. I post each
>> one in a separate mail.
>>
>> > Since it is hard to decide which content is actually notable, the items
>> appear-
>> > ing in the search should be limited to the ones having at least one
>> statements
>> > and two sitelinks to the same project (like Wikipedia or Wikivoyage).
>>
>> This is a good baseline, but figuring out what is notable locally is a
>> bit more involved. A language is used in a local area, and within that area
>> some items are more important just because they reside within the area.
>> This is quite noticeable in the differences between nnwiki and nowiki which
>> both basically covers "Norway". Also items that somehow relates to the
>> local area or language is more noticeable than those outside those areas.
>> By traversing upwords in the claims using the "part of" property it is
>> possible to build a priority on the area involved. It is possible to
>> traverse "nationality" and a few other properties.
>>
>> Things directly noticeable like an area enclosed in an area using the
>> language is somewhat easy to identify, but things that are noticeable by
>> association with another noticeable thing is not. Like a Danish slave ship
>> operated b

Re: [Wikidata] Bachelor's thesis on ArticlePlaceholder

2016-04-03 Thread John Erling Blad
> Red links are used frequently in Wikipedia to indicate an article which
is does
> not yet exist, but should. Today it leads the user to an empty create
article page.
> In the future it should instead bring them to an ArticlePlaceholder,
offering the
> option of creating an article. This is part of the topic of smart red
links, which is
> discussed in the section 8.1: Smart red links

It would be interesting to hear if someone has an idea of how this might
work. There are some attempts at this on nowiki; none of them seems to work
in all cases.

Note that "Extension:ArticlePlaceholder/Smart red links" doesn't really
solve the problem for existing redlinks; it solves the association problem
when the user tries to resolve the redlink. That is one step further down
the line, or more like solving the redlinks for a disambiguation page. ("I
know there is a page like this, named like so, on that specific project.")

Note also that an item is not necessarily described on any project, and
that creating an item on Wikidata can be outside the editor's scope or even
very difficult. Often we have a name for some "thing", but only a sketchy
idea about the thing itself. Check out
https://www.wikidata.org/wiki/Q12011301 for an example.

It seems like a lot of what has been done so far on redlinks is an attempt
to make pink-ish links with _some_information_, while the problem is that
redlinks have _no_information_. The core reason why we have redlinks is
that we lack the manpower to avoid them, and because of that we can't just
add "some information". It is not a question of which we need first, the
hen or the egg, as we have neither.

On Sun, Apr 3, 2016 at 4:27 PM, John Erling Blad  wrote:

> Just read through the doc, and found some important points. I post each
> one in a separate mail.
>
> > Since it is hard to decide which content is actually notable, the items
> appear-
> > ing in the search should be limited to the ones having at least one
> statements
> > and two sitelinks to the same project (like Wikipedia or Wikivoyage).
>
> This is a good baseline, but figuring out what is notable locally is a bit
> more involved. A language is used in a local area, and within that area
> some items are more important just because they reside within the area.
> This is quite noticeable in the differences between nnwiki and nowiki which
> both basically covers "Norway". Also items that somehow relates to the
> local area or language is more noticeable than those outside those areas.
> By traversing upwords in the claims using the "part of" property it is
> possible to build a priority on the area involved. It is possible to
> traverse "nationality" and a few other properties.
>
> Things directly noticeable like an area enclosed in an area using the
> language is somewhat easy to identify, but things that are noticeable by
> association with another noticeable thing is not. Like a Danish slave ship
> operated by a Norwegian firm, the ship is thus noticeable in nowiki. I
> would say that all things linked as an item from other noticeable things
> should be included. Some would perhaps say that "items with second order
> relevance should be included".
>
>
> On Sat, Apr 2, 2016 at 11:09 PM, Luis Villa  wrote:
>
>> On Sat, Apr 2, 2016, 4:34 AM Lucie Kaffee 
>> wrote:
>>
>>> I wrote my Bachelor's thesis on "Generating Article Placeholders from
>>> Wikidata for Wikipedia: Increasing Access to Free and Open Knowledge". The
>>> thesis summarizes a lot of the work done on the ArticlePlaceholder
>>> extension ( https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder
>>> )
>>>
>>> I uploaded the thesis to commons under a CC-BY-SA license- you can find
>>> it at
>>> https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf
>>>
>>> I continue working on the extension and aim to deploy it to the first
>>> Wikipedias, that are interested, in the next months.
>>>
>>> I am happy to answer questions related to the extension!
>>>
>>
>> Great work on something that I *believe *has a lot of promise - thanks!
>> I really think this approach has a lot of promise to help take back some
>> readership from Google, and potentially in the long-run drive more new
>> editors as well. (I know that was part of the theory of LSJbot, though I
>> don't know if anyone has actually a/b tested that.)
>>
>> I was somewhat surprised to not see data collection discussed in Section
>> 8.10 - are there plans to do that? I would have expect

Re: [Wikidata] Bachelor's thesis on ArticlePlaceholder

2016-04-03 Thread John Erling Blad
Just read through the doc, and found some important points. I post each one
in a separate mail.

> Since it is hard to decide which content is actually notable, the items
appear-
> ing in the search should be limited to the ones having at least one
statements
> and two sitelinks to the same project (like Wikipedia or Wikivoyage).

This is a good baseline, but figuring out what is notable locally is a bit
more involved. A language is used in a local area, and within that area
some items are more important simply because they reside within the area.
This is quite noticeable in the differences between nnwiki and nowiki,
which both basically cover "Norway". Also, items that somehow relate to the
local area or language are more noticeable than those outside those areas.
By traversing upwards in the claims using the "part of" property, it is
possible to build a priority for the area involved. It is also possible to
traverse "nationality" and a few other properties. (A rough sketch of such
a traversal follows below.)

Things that are directly noticeable, like an area enclosed in an area using
the language, are somewhat easy to identify, but things that are noticeable
by association with another noticeable thing are not. Take a Danish slave
ship operated by a Norwegian firm: the ship is thus noticeable on nowiki. I
would say that all things linked as an item from other noticeable things
should be included. Some would perhaps say that "items with second order
relevance should be included".
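
A rough sketch of the kind of traversal I mean, against the ordinary
wbgetentities API. The choice of properties (P361 "part of", P131 "located
in the administrative territorial entity", P17 "country") and the target
Q20 (Norway) are just one possible configuration for one local wiki:

import requests

API = "https://www.wikidata.org/w/api.php"
UP_PROPERTIES = ["P361", "P131", "P17"]  # part of, located in admin. entity, country
TARGET = "Q20"                           # Norway; pick per local wiki

def get_entity(qid):
    params = {"action": "wbgetentities", "ids": qid,
              "props": "claims", "format": "json"}
    return requests.get(API, params=params, timeout=30).json()["entities"][qid]

def claim_targets(entity, prop):
    """Item ids pointed to by the given property, if any."""
    targets = []
    for statement in entity.get("claims", {}).get(prop, []):
        value = statement["mainsnak"].get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            targets.append(value["id"])
    return targets

def relates_to_target(qid, max_depth=4, seen=None):
    """Walk upwards along part-of/located-in/country links, looking for TARGET."""
    seen = seen if seen is not None else set()
    if qid == TARGET:
        return True
    if max_depth == 0 or qid in seen:
        return False
    seen.add(qid)
    parents = [t for prop in UP_PROPERTIES
               for t in claim_targets(get_entity(qid), prop)]
    return any(relates_to_target(p, max_depth - 1, seen) for p in parents)

# e.g. relates_to_target("Q585")  # Oslo should reach Norway via P131/P17
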


On Sat, Apr 2, 2016 at 11:09 PM, Luis Villa  wrote:

> On Sat, Apr 2, 2016, 4:34 AM Lucie Kaffee 
> wrote:
>
>> I wrote my Bachelor's thesis on "Generating Article Placeholders from
>> Wikidata for Wikipedia: Increasing Access to Free and Open Knowledge". The
>> thesis summarizes a lot of the work done on the ArticlePlaceholder
>> extension ( https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder )
>>
>> I uploaded the thesis to commons under a CC-BY-SA license- you can find
>> it at
>> https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf
>>
>> I continue working on the extension and aim to deploy it to the first
>> Wikipedias, that are interested, in the next months.
>>
>> I am happy to answer questions related to the extension!
>>
>
> Great work on something that I *believe *has a lot of promise - thanks! I
> really think this approach has a lot of promise to help take back some
> readership from Google, and potentially in the long-run drive more new
> editors as well. (I know that was part of the theory of LSJbot, though I
> don't know if anyone has actually a/b tested that.)
>
> I was somewhat surprised to not see data collection discussed in Section
> 8.10 - are there plans to do that? I would have expected to see a/b testing
> discussed as part of the deployment methodology, so that it could be
> compared both to the current baseline and also to similar approaches (like
> the ones you survey in Section 3).
>
> Thanks again for the hard work here-
>
> Luis
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Other sites

2016-02-03 Thread John Erling Blad
Is there any documentation on how Commons is handled? Especially how
additional links to the gallery/category are handled if sitelinks are added
to the opposite page? That is, if there is a sitelink to the gallery, what
happens then with the category link? Do we reimplement the additional link
with JavaScript?

On Wed, Feb 3, 2016 at 2:47 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Wed, Feb 3, 2016 at 1:48 PM Emilio J. Rodríguez-Posada <
> emi...@gmail.com> wrote:
>
>> Hello;
>>
>> What sites are allowed in the item "other sites" section? I haven't found
>> documentation about it.
>>
>
> Currently MediaWiki, Wikispecies, Meta, Wikidata and Commons are in this
> section.
>
>
>> I would suggest to allow links to other wikis in the Internet, that way
>> we could create a network of wikis, although I know that some wikis are
>> controversial and don't follow the Wikipedia guidelines.
>>
>
> Links to sites outside Wikimedia are handled via statements.
>
>
> Cheers
> Lydia
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] upcoming deployments/features

2016-02-03 Thread John Erling Blad
It is a bit strange to define a data type in terms of a library of
functions in another language.
Or is it just me who thinks this is a bit odd?

What about MathML?

On Wed, Feb 3, 2016 at 12:06 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> For a consumer, the main practical questions would be:
>
> (1) What subset of LaTeX exactly do you need to support to display the
> math expressions in Wikidata?
> (2) As a follow up: does MathJAX work to display this? If not, what does?
>
> Cheers,
>
> Markus
>
> On 02.02.2016 10:01, Moritz Schubotz wrote:
>
>> The string is interpreted by the math extension in the same way as the
>> Math extension interprets the text between the  tags.
>> There is an API to extract identifiers and the packages required to
>> render the input with regular latex from here:
>> http://api.formulasearchengine.com/v1/?doc
>> or also
>>
>> https://en.wikipedia.org/api/rest_v1/?doc#!/Math/post_media_math_check_type
>> (The wikipedia endpoint has been opened to the public just moments ago)
>> In the future, we are planning to provide additional semantics from there.
>> If you have additional questions, please contact me directly, since I'm
>> not a member on the list.
>> Moritz
>>
>> On Tue, Feb 2, 2016 at 8:53 AM, Lydia Pintscher
>> mailto:lydia.pintsc...@wikimedia.de>>
>> wrote:
>>
>> On Mon, Feb 1, 2016 at 8:44 PM Markus Krötzsch
>> > > wrote:
>>
>> On 01.02.2016 17:14, Lydia Pintscher wrote:
>>  > Hey folks :)
>>  >
>>  > I just sat down with Katie to plan the next important feature
>>  > deployments that are coming up this month. Here is the plan:
>>  > * new datatype for mathematical expressions: We'll get it live
>> on
>>  > test.wikidata.org 
>>  tomorrow and then bring it
>>  > to wikidata.org  
>> on the 9th
>>
>> Documentation? What will downstream users like us need to do to
>> support
>> this? How is this mapped to JSON? How is this mapped to RDF?
>>
>>
>> It is a string representing markup for the Math extension. You can
>> already test it here: http://wikidata.beta.wmflabs.org/wiki/Q117940.
>> See also https://en.wikipedia.org/wiki/Help:Displaying_a_formula.
>> Maybe Moritz wants to say  bit more as his students created the
>> datatype.
>>
>> Cheers
>> Lydia
>> --
>> Lydia Pintscher - http://about.me/lydia.pintscher
>> Product Manager for Wikidata
>>
>> Wikimedia Deutschland e.V.
>> Tempelhofer Ufer 23-24
>> 10963 Berlin
>> www.wikimedia.de 
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.
>> V.
>>
>> Eingetragen im Vereinsregister des Amtsgerichts
>> Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig
>> anerkannt durch das Finanzamt für Körperschaften I Berlin,
>> Steuernummer 27/029/42207.
>>
>>
>>
>>
>> --
>> Moritz Schubotz
>> TU Berlin, Fakultät IV
>> DIMA - Sekr. EN7
>> Raum EN742
>> Einsteinufer 17
>> D-10587 Berlin
>> Germany
>>
>> Tel.: +49 30 314 22784
>> Mobil.: +49 1578 047 1397
>> Fax:  +49 30 314 21601
>> E-Mail: schub...@tu-berlin.de 
>> Skype: Schubi87
>> ICQ: 200302764
>> Msn: mor...@schubotz.de
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property value max lenght? (aka: why doesn't this InChI fit??)

2016-01-16 Thread John Erling Blad
Same problem with bignums.


On Fri, Jan 15, 2016 at 2:24 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> Hi Egon,
>
> On Tue, Dec 29, 2015 at 8:53 AM, Egon Willighagen
>  wrote:
> > Hi all,
> >
> > I just discovered that there seems to be a limit on the length of
> > property values... some properties for compounds, however, are longer,
> > the InChI being a good example... 400 chars is not enough for some
> > compounds in Wikipedia, like teixobactin (Q18720369)
> >
> > This length is not defined by the property definition itself (InChI
> > (P234)), so I am wondering if this max length is system wide, or if
> > there are options to vary it? A max length of 1024 is better, though
> > still would not allow InChIs values for all compounds...
> >
> > Looking forward to hearing from you, and a happy new year,
>
> There is currently a character limit in place for everything (labels,
> description, values, etc). The reason is that we need to make sure
> people don't start entering long text that in the end again isn't
> machine-readable. The property you mention is currently scheduled to
> be converted to the new identifier datatype. What we could consider is
> increasing the length for values allowed in this particular datatype.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Duplicates in Wikidata

2015-12-27 Thread John Erling Blad
There are also a lot of errors/duplicates in WorldCat.

On Sun, Dec 27, 2015 at 12:43 PM, Gerard Meijssen  wrote:

> Hoi,
> Probably :)
> Thanks,
>  Gerard
>
> On 27 December 2015 at 12:31, Federico Leva (Nemo) 
> wrote:
>
>> Is this something for a Wikidata game? :)
>>
>> Nemo
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Languages in monolingual text

2015-12-20 Thread John Erling Blad
Kven is derived from Finnish, and Finnish acts as a macrolanguage.
Where is the procedure documented?

On Sun, Dec 20, 2015 at 7:50 PM, Gerard Meijssen 
wrote:

> Hoi,
> There is a procedure for new languages in the Wikimedia Foundation. In
> principle languages like Kven that are recognised by the ISO-639-3 are
> valid for inclusion in Wikidata. The procedure is that the language
> committee is informed and has the ability to prevent it to happen when need
> be.
>
> One of the things that are essential is the autonym for the language.
>
> When Kven is added as Finnish, it is 100% incorrect.
> Thanks,
>   GerardM
>
> On 20 December 2015 at 19:43, John Erling Blad  wrote:
>
>> Just checked, and Kven [2] (fkv), Romanes (rom) [3] and Romani [4]
>> (rotipa, romani rakripa, scandoromani, rmu) is still not valid languages in
>> the monolingual text data type. Those are listed as endangered languages
>> where Norway has a special responsibility. That is "de nasjonale
>> minoritetsspråkene kvensk, romanes og romani"[1]
>>
>> The Kven language is official language in Porsanger municipality in
>> Norway,[5] but we can't add it properly in the item. It is now added as
>> "Finnish" (Finnic language), but this is not correct. The Kven language is
>> a Finnic language, but it is not the same.
>>
>> It is sort of embarrassing to explain time and again that we don't
>> support these languages. I have reported this bug before.[7] Protection of
>> minority languages is a human rights obligation, but we simply dismiss
>> that. I think we should take this serious and fix this now.
>>
>> [1] http://www.sprakradet.no/Spraka-vare/Minoritetssprak/
>> [2] https://en.wikipedia.org/wiki/Kven_language
>> [3] https://en.wikipedia.org/wiki/Romani_language
>> [4] https://en.wikipedia.org/wiki/Scandoromani_language
>> [5] https://en.wikipedia.org/wiki/Porsanger
>> [6] https://www.wikidata.org/wiki/Q483885#P1448
>> [7] https://phabricator.wikimedia.org/T74590
>> [8]http://www.un.org/apps/news/story.asp?NewsID=44352
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Languages in monolingual text

2015-12-20 Thread John Erling Blad
Just checked, and Kven [2] (fkv), Romanes (rom) [3] and Romani [4] (rotipa,
romani rakripa, scandoromani, rmu) are still not valid languages in the
monolingual text data type. Those are listed as endangered languages for
which Norway has a special responsibility, that is "de nasjonale
minoritetsspråkene kvensk, romanes og romani" ("the national minority
languages Kven, Romanes and Romani").[1]

The Kven language is an official language in Porsanger municipality in
Norway,[5] but we can't add it properly to the item. It is now added as
"Finnish" (a Finnic language), but this is not correct. The Kven language
is a Finnic language, but it is not the same.

It is sort of embarrassing to explain time and again that we don't support
these languages. I have reported this bug before.[7] Protection of minority
languages is a human rights obligation, but we simply dismiss that. I think
we should take this seriously and fix it now.

[1] http://www.sprakradet.no/Spraka-vare/Minoritetssprak/
[2] https://en.wikipedia.org/wiki/Kven_language
[3] https://en.wikipedia.org/wiki/Romani_language
[4] https://en.wikipedia.org/wiki/Scandoromani_language
[5] https://en.wikipedia.org/wiki/Porsanger
[6] https://www.wikidata.org/wiki/Q483885#P1448
[7] https://phabricator.wikimedia.org/T74590
[8]http://www.un.org/apps/news/story.asp?NewsID=44352
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Units

2015-12-20 Thread John Erling Blad
Ok, still not working properly.

On Sun, Dec 20, 2015 at 5:23 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Sun, Dec 20, 2015 at 5:08 PM, John Erling Blad 
> wrote:
> > Sorry for long rant!
> >
> > 1. How do I get/set the list of base units (try "foot" and then ask
> yourself
> > "which foot is this?" [4])
> > 2. How do I get/set derived units (Siemens is the inverse of Ohm, that is
> > S=Ω⁻¹ [3])
> > 3. How do I add prefixed units (1kΩ, and 1mΩ, and note there is a bunch
> of
> > non-standard prefixes - not to forget localized ones! [5] I hate the
> mess on
> > Wikipedia... And note the mess with kilogram[7])
> > 4. How to normalize a unit (it is (nearly) always µF, even when you write
> > 4700µF [6] - this text is so messy and it does not really address the
> > problem)
> > 5. Is there any plan to handle deprecated units (the weight prototype
> > gaining weight[1], and the new proposed standard [2] is one known problem
> > 6. How to disambiguate units (the feet-problem in another version)
> > 7. Is there any plan to add warnings about units that needs
> disambiguation
> > (the feet-problem is well-known, but how about kilogram? And note that is
> > the kilogram that is the standard unit, not the gram.)
> > 8. How to handle incompatibilities between unit systems (you can't
> convert
> > some old units to newer ones.)
>
> That's why I said a minimal version is live. In due time we'll get to
> these but they're not more important than the other things I
> mentioned.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] enabling in other projects sidebar on all wikis in January

2015-12-20 Thread John Erling Blad
+1, and let all the templates that mimic such linkage die in flames!

2015-12-20 12:41 GMT+01:00 Lydia Pintscher :

> Oh and one thing I forgot: A big thank you to Tpt who did most of the
> development for this feature.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Units

2015-12-20 Thread John Erling Blad
Sorry for the long rant!

1. How do I get/set the list of base units? (Try "foot" and then ask
yourself "which foot is this?" [4])
2. How do I get/set derived units? (Siemens is the inverse of Ohm, that is
S=Ω⁻¹ [3])
3. How do I add prefixed units? (1kΩ and 1mΩ, and note there is a bunch of
non-standard prefixes - not to forget localized ones! [5] I hate the mess
on Wikipedia... And note the mess with kilogram[7])
4. How do I normalize a unit? (It is (nearly) always µF, even when you
write 4700µF [6] - this text is so messy and it does not really address the
problem)
5. Is there any plan to handle deprecated units? (The weight prototype
gaining weight[1], and the new proposed standard,[2] is one known problem.)
6. How do I disambiguate units? (The feet problem in another version.)
7. Is there any plan to add warnings about units that need disambiguation?
(The feet problem is well known, but how about kilogram? And note that it
is the kilogram that is the standard unit, not the gram.)
8. How do I handle incompatibilities between unit systems? (You can't
convert some old units to newer ones.)

On 1, perhaps we could make Wikidata entries for the different feet, but
then the lookup list will be very long. Old classical units for length,
area, volume, and weight are the biggest problems. Some of them also
coincide with 8, as accurate conversions aren't possible.

On 2, some derived units can be transformed from one form into another.
Siemens is one of them. Others can be expressed in different ways, but all
the variants are really just one and the same. We could use aliases for
this, as for example Farad (F) is s⁴A²⋅m⁻²kg⁻¹ and a bunch of others.
Another solution is to cluster descriptions, which also gives a better
solution for 1. (A toy sketch of this idea follows after the links below.)

On 3, the simple solution is to add the SI prefixes to everything. It
almost works, except that we have units like kilogram (kg) which should
retain the "k". It will also create problems with kph and a bunch of other
such localized units.

On 4, don't confuse normalization of the unit with normalization of the
value. Normalization of the unit is highly domain-specific.

On 5, note that there is a subtle difference between a unit that goes out
of common use and a unit that is deprecated through law. Not sure if we
need to differentiate those; I hope not!

On 6, I think foot is a good example of how long this list can get. Note
that in some countries different trade unions used different lengths of a
foot, and even some cities defined their own foot. I would like to define
my foot as the new standard unit.

On 7, note that the accuracy (error bounds) of the number should trigger a
need for disambiguation. Also note that precision implies a set level of
accuracy. Accuracy and precision are not the same, but precision can be
used as a proxy for accuracy.

On 8, there are several posts about this problem. Some claim you can avoid
the problem by setting the accuracy in the conversion sufficiently high. I
don't think that would be a valid solution. Perhaps we should have a
property for valid conversions, with constants for each one of them and
with proper error bounds. If a conversion isn't listed, then it isn't valid.

[1] http://www.livescience.com/26017-kilogram-gained-weight.html
[2]
http://www.dailymail.co.uk/sciencetech/article-3161130/Reinventing-kilogram-Official-unit-weight-measurement-new-accurate-definition-following-breakthrough.html
[3] https://en.wikipedia.org/wiki/Siemens_%28unit%29
[4] https://en.wikipedia.org/wiki/Foot_%28unit%29
[5] https://en.wikipedia.org/wiki/Decametre
[6] https://www.westfloridacomponents.com/blog/is-mf-mfd-the-same-as-uf/
[7] http://www.bipm.org/en/bipm/mass/ipk/
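
And to illustrate 2 and 3: if every unit is reduced to a scale factor plus
exponents over the SI base units, derived units like S = Ω⁻¹ and prefixed
units like kΩ or µF fall out of the same representation. A toy sketch only,
not a proposal for the actual data model:

from fractions import Fraction

# Exponents over (kg, m, s, A); scale is the factor to the coherent SI unit.
UNITS = {
    "ohm":     {"scale": Fraction(1), "dim": (1, 2, -3, -2)},   # kg·m²·s⁻³·A⁻²
    "siemens": {"scale": Fraction(1), "dim": (-1, -2, 3, 2)},   # the inverse of ohm
    "farad":   {"scale": Fraction(1), "dim": (-1, -2, 4, 2)},   # kg⁻¹·m⁻²·s⁴·A²
}
PREFIXES = {"k": Fraction(1000), "m": Fraction(1, 1000), "µ": Fraction(1, 10**6)}

def parse(symbol):
    """'kohm' -> (scale, dim); a prefix is nothing but an extra scale factor."""
    for prefix, factor in PREFIXES.items():
        if symbol.startswith(prefix) and symbol[len(prefix):] in UNITS:
            base = UNITS[symbol[len(prefix):]]
            return base["scale"] * factor, base["dim"]
    unit = UNITS[symbol]
    return unit["scale"], unit["dim"]

def inverse(dim):
    return tuple(-e for e in dim)

assert UNITS["siemens"]["dim"] == inverse(UNITS["ohm"]["dim"])  # S really is Ω⁻¹
scale, dim = parse("µfarad")
print(float(4700 * scale), "farad, dimensions", dim)            # 4700 µF -> 0.0047 F
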

On Sun, Dec 20, 2015 at 11:57 AM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Sun, Dec 20, 2015 at 10:08 AM, John Erling Blad 
> wrote:
> > Can someone give an explanation why development of units are so
> difficult,
> > or what seems to be the problem? Is there anything other people can do?
> >
> > It seems to me like this has a serious feature creep...
> >
> > https://phabricator.wikimedia.org/T77977
>
> We have done the minimum version and deployed it. You're able to enter
> and retrieve information with quantities and units. Now that the
> minimum is in place other things got higher priority. That was/is
> mainly data quality, properly linking to other sources in out export
> formats and a UI cleanup including separating out identifiers. Those
> are still in progress. Once we've brought those further along we'll
> pick up the remaining work for units as well.
> The main thing that is left now is unit conversion for the query service.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wik

[Wikidata] Units

2015-12-20 Thread John Erling Blad
Can someone explain why the development of units is so difficult, or what
the problem seems to be? Is there anything other people can do?

It seems to me like this has serious feature creep...

https://phabricator.wikimedia.org/T77977
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Photographers' Identities Catalog (& WikiData)

2015-12-15 Thread John Erling Blad
There are some pretty good methods for optimizing the match process, but I
have not seen any implementation of them against Wikidata items. The only
things I've seen are some opportunistic methods: duck tests gone wrong, or
"Darn, it was a platypus!" (A sketch of what I mean follows below.)

On Mon, Dec 14, 2015 at 11:19 PM, André Costa 
wrote:

> I'm planning to bring a few of the datasets into mix'n'match (@Magnus this
> is the one I asked sbout on Twitter) in January but not all of them are
> suitable and I believe separating KulturNav into multiple datasets on
> mix'n'match maxes more sense and makes it more likely that they get matched.
>
> Some of the early adopters of KulturNav have been working with WMSE to
> facilitate bi-directional matching. This is done on a dataset-by-dataset
> level since different institutions are responsible for different datasets.
> My hope is that mix'n'match will help in this area as well, even as a tool
> for the institutions own staff who are often interested in matching entries
> to Wikipedia (which most of the time means wikidata).
>
> @John: There are processes for matching kulturnav identifiers to wikidata
> entities. Only afterwards are details imported. Mainly to source statements
> [1] and [2]. There is some (not so user friendly) stats at [3].
>
> Cheers,
> André
>
> [1]
> https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/L_PBot_2
> [2]
> https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/L_PBot_3
> [3] https://tools.wmflabs.org/lp-tools/misc/data/
> --
> André Costa
> GLAM developer
> Wikimedia Sverige
>
> Magnus Manske, 13/12/2015 11:24:
>
> >
> > Since no one mentioned it, there is a tool to do the matching to WD much
> > more efficiently:
> > https://tools.wmflabs.org/mix-n-match/
> 
>
> +1
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikimedia-l] Quality issues

2015-12-09 Thread John Erling Blad
Andreas Kolbe has one point: a reference to a Wikipedia article should
point to the correct article, and should preferably point to the revision
that introduced the value. It should be pretty easy to do this for most of
the statements... (a rough sketch below).
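
Something along these lines, against the ordinary MediaWiki API; the page
title and the value are placeholders, and a real version would have to
handle continuation, reverts and rewordings:

import requests

API = "https://en.wikipedia.org/w/api.php"

def first_revision_containing(title, value, limit=50):
    """Walk revisions oldest to newest, return the first whose text contains value."""
    params = {
        "action": "query", "format": "json", "prop": "revisions",
        "titles": title, "rvdir": "newer", "rvlimit": limit,
        "rvprop": "ids|timestamp|content", "rvslots": "main",
    }
    pages = requests.get(API, params=params, timeout=60).json()["query"]["pages"]
    for page in pages.values():
        for rev in page.get("revisions", []):
            text = rev.get("slots", {}).get("main", {}).get("*", "")
            if value in text:
                return rev["revid"], rev["timestamp"]
    return None

# Placeholder example: which revision introduced this population figure?
print(first_revision_containing("Oslo", "634,293"))
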

On Wed, Dec 9, 2015 at 11:35 AM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> P.S. Meanwhile, your efforts in other channels are already leading some
> people to vandalise Wikidata just to make a point [1].
>
> Markus
>
> [1]
> http://forums.theregister.co.uk/forum/1/2015/12/08/wikidata_special_report/
>
>
>
> On 09.12.2015 11:32, Markus Krötzsch wrote:
>
>> On 08.12.2015 00:02, Andreas Kolbe wrote:
>>
>>> Hi Markus,
>>>
>> ...
>>
>>>
>>>
>>> Apologies for the late reply.
>>>
>>> While you indicated that you had crossposted this reply to Wikimedia-l,
>>> it didn't turn up in my inbox. I only saw it today, after Atlasowa
>>> pointed it out on the Signpost op-ed's talk page.[1]
>>>
>>
>> Yes, we have too many communication channels. Let me only reply briefly
>> now, to the first point:
>>
>>  > This prompted me to reply. I wanted to write an email that merely
>>> says: > "Really? Where did you get this from?" (Google using Wikidata
>>> content)
>>>
>>> Multiple sources, including what appears to be your own research group's
>>> writing:[2]
>>>
>>
>> What this page suggested was that that Freebase being shutdown means
>> that Google will use Wikidata as a source. Note that the short intro
>> text on the page did not say anything else about the subject, so I am
>> surprised that this sufficed to convince you about the truth of that
>> claim (it seems that other things I write with more support don't have
>> this effect). Anyway, I am really sorry to hear that this
>> quickly-written intro on the web has misled you. When I wrote this after
>> Google had made their Freebase announcement last year, I really believed
>> that this was the obvious implication. However, I was jumping to
>> conclusions there without having first-hand evidence. I guess many
>> people did the same. I fixed the statement now.
>>
>> To be clear: I am not saying that Google is not using Wikidata. I just
>> don't know. However, if you make a little effort, there is a lot of
>> evidence that Google is not using Wikidata as a source, even when it
>> could. For example, population numbers are off, even in cases where they
>> refer to the same source and time, and Google also shows many statements
>> and sources that are not in Wikidata at all (and not even in Primary
>> Sources).
>>
>> I still don't see any problem if Google would be using Wikidata, but
>> that's another discussion.
>>
>> You mention "multiple sources".
>> {{Which}}?
>>
>> Markus
>>
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Photographers' Identities Catalog (& WikiData)

2015-12-09 Thread John Erling Blad
Forgot to mention: Anders Beer Wilse in Kulturnav:
http://kulturnav.org/2b94216b-f2fc-46a3-b2ce-eeb93aa19185

On Wed, Dec 9, 2015 at 11:19 PM, John Erling Blad  wrote:

> I think the Norwegian lists are a subset of Preus Photo Museums list. It
> is now maintained partly by Nasjonalbiblioteket (the Norwegian one, not the
> Swedish one) and Norsk Lokalhistorisk Institutt. For examle; Anders Beer
> Wilse in nowiki,[1] at Lokalhistoriewiki,[2] and at Nasjonalbiblioteket.[3]
>
> Kulturnav is a kind of maintained ontology, where most of the work is done
> by local museums. The software for the site itself is made (in part) by a
> grant from Norsk Kulturråd.
>
> We should connect as much as possible of our resources to resources at
> Kulturnav, and not just copy data. That said, we don't have a very good
> model for hov to materialize data from external sites and make it available
> for our client sites, so our option is more or less just to copy. It is
> better to maintain data at one location.
>
> [1] https://no.wikipedia.org/wiki/Anders_Beer_Wilse
> [2] https://lokalhistoriewiki.no/index.php/Anders_Beer_Wilse
> [3] http://www.nb.no/nmff/fotograf.php?fotograf_id=3050
>
> On Wed, Dec 9, 2015 at 9:51 PM, André Costa 
> wrote:
>
>> Happy to be of use. There is also one for:
>> * Swedish photo studios [1]
>> * Norwegian photographers[2]
>> * Norwegian photo studios [3]
>> I'm less familiar with these though and don't have a timeline for
>> wikidata integration.
>>
>> Cheers,
>> André
>>
>> [1] http://kulturnav.org/deb494a0-5457-4e5f-ae9b-e1826e0de681
>> [2] http://kulturnav.org/508197af-6e36-4e4f-927c-79f8f63654b2
>> [3] http://kulturnav.org/7d2a01d1-724c-4ad2-a18c-e799880a0241
>> --
>> André Costa
>> GLAM developer
>> Wikimedia Sverige
>> On 9 Dec 2015 15:07, "David Lowe"  wrote:
>>
>>> Thanks, André! I don't know that I've found that before. Great to get
>>> country (or region) specific lists like this.
>>> D
>>>
>>> On Wednesday, December 9, 2015, André Costa 
>>> wrote:
>>>
>>>> In case you haven't come across it before
>>>> http://kulturnav.org/1f368832-7649-4386-97b6-ae40cce8752b is the entry
>>>> point to the Swedish database of (primarily early) photographers curated by
>>>> the Nordic Museum in Stockholm.
>>>>
>>>> It's not that well integrated into Wikidata yet but the plan is to fix
>>>> that during early 2016. That would also allow a variety of photographs on
>>>> Wikimedia Commons to be linked to these entries.
>>>>
>>>> Cheers,
>>>> André
>>>>
>>>> André Costa | GLAM developer, Wikimedia Sverige |
>>>> andre.co...@wikimedia.se | +46 (0)733-964574
>>>>
>>>> Stöd fri kunskap, bli medlem i Wikimedia Sverige.
>>>> Läs mer på blimedlem.wikimedia.se
>>>>
>>>> On 9 December 2015 at 02:44, David Lowe  wrote:
>>>>
>>>>> Thanks, Tom.
>>>>> I'll have to look at this specific case when I'm back at work
>>>>> tomorrow, as it does seem you found something in error.
>>>>> As for my process: with WD, I queried out the label, description &
>>>>> country of citizenship, dob & dod of of everyone with occupation:
>>>>> photographer. After some cleaning, I can get the WD data formatted like my
>>>>> own (Name, Nationality, Dates). I can then do a simple match, where
>>>>> everything matches exactly. For the remainder, I then match names and
>>>>> dates- without Nationality, which is often very "soft" information. For
>>>>> those that pass a smell test (one is "English" the other is "British") I
>>>>> pass those along, too. For those with greater discrepancies, I look still
>>>>> closer. For those with still greater discrepancies, I manually,
>>>>> individually query my database for anyone with the same last name & same
>>>>> first initial to catch misspellings or different transliterations. I also
>>>>> occasionally put my entire database into open refine to catch instances
>>>>> where, for instance, a Chinese name has been given as FamilyName, 
>>>>> GivenName
>>>>> in one source, and GivenName, FamilyName in another.
>>>>> In short, this is scrupulously- and manually- checked data. I'm not
>>>>> savvy enough to let an algorithm make my mistakes

Re: [Wikidata] Photographers' Identities Catalog (& WikiData)

2015-12-09 Thread John Erling Blad
I think the Norwegian lists are a subset of Preus Photo Museum's list. It
is now maintained partly by Nasjonalbiblioteket (the Norwegian one, not the
Swedish one) and Norsk Lokalhistorisk Institutt. For example: Anders Beer
Wilse on nowiki,[1] at Lokalhistoriewiki,[2] and at Nasjonalbiblioteket.[3]

Kulturnav is a kind of maintained ontology, where most of the work is done
by local museums. The software for the site itself was made (in part) with
a grant from Norsk Kulturråd.

We should connect as much as possible of our resources to resources at
Kulturnav, and not just copy data. That said, we don't have a very good
model for how to materialize data from external sites and make it available
to our client sites, so our option is more or less just to copy. It is
better to maintain data in one location.

[1] https://no.wikipedia.org/wiki/Anders_Beer_Wilse
[2] https://lokalhistoriewiki.no/index.php/Anders_Beer_Wilse
[3] http://www.nb.no/nmff/fotograf.php?fotograf_id=3050

On Wed, Dec 9, 2015 at 9:51 PM, André Costa 
wrote:

> Happy to be of use. There is also one for:
> * Swedish photo studios [1]
> * Norwegian photographers[2]
> * Norwegian photo studios [3]
> I'm less familiar with these though and don't have a timeline for wikidata
> integration.
>
> Cheers,
> André
>
> [1] http://kulturnav.org/deb494a0-5457-4e5f-ae9b-e1826e0de681
> [2] http://kulturnav.org/508197af-6e36-4e4f-927c-79f8f63654b2
> [3] http://kulturnav.org/7d2a01d1-724c-4ad2-a18c-e799880a0241
> --
> André Costa
> GLAM developer
> Wikimedia Sverige
> On 9 Dec 2015 15:07, "David Lowe"  wrote:
>
>> Thanks, André! I don't know that I've found that before. Great to get
>> country (or region) specific lists like this.
>> D
>>
>> On Wednesday, December 9, 2015, André Costa 
>> wrote:
>>
>>> In case you haven't come across it before
>>> http://kulturnav.org/1f368832-7649-4386-97b6-ae40cce8752b is the entry
>>> point to the Swedish database of (primarily early) photographers curated by
>>> the Nordic Museum in Stockholm.
>>>
>>> It's not that well integrated into Wikidata yet but the plan is to fix
>>> that during early 2016. That would also allow a variety of photographs on
>>> Wikimedia Commons to be linked to these entries.
>>>
>>> Cheers,
>>> André
>>>
>>> André Costa | GLAM developer, Wikimedia Sverige |
>>> andre.co...@wikimedia.se | +46 (0)733-964574
>>>
>>> Stöd fri kunskap, bli medlem i Wikimedia Sverige.
>>> Läs mer på blimedlem.wikimedia.se
>>>
>>> On 9 December 2015 at 02:44, David Lowe  wrote:
>>>
 Thanks, Tom.
 I'll have to look at this specific case when I'm back at work tomorrow,
 as it does seem you found something in error.
 As for my process: with WD, I queried out the label, description &
 country of citizenship, dob & dod of of everyone with occupation:
 photographer. After some cleaning, I can get the WD data formatted like my
 own (Name, Nationality, Dates). I can then do a simple match, where
 everything matches exactly. For the remainder, I then match names and
 dates- without Nationality, which is often very "soft" information. For
 those that pass a smell test (one is "English" the other is "British") I
 pass those along, too. For those with greater discrepancies, I look still
 closer. For those with still greater discrepancies, I manually,
 individually query my database for anyone with the same last name & same
 first initial to catch misspellings or different transliterations. I also
 occasionally put my entire database into open refine to catch instances
 where, for instance, a Chinese name has been given as FamilyName, GivenName
 in one source, and GivenName, FamilyName in another.
 In short, this is scrupulously- and manually- checked data. I'm not
 savvy enough to let an algorithm make my mistakes for me! But let me know
 if this seems to be more than bad luck of the draw- finding the conflicting
 data you found.
 I have also to say, I may suppress the Niepce Museum collection, as
 it's from a really crappy list of photographers in their collection which I
 found many years ago, and can no longer find. I don't want to blame them
 for the discrepancy, but that might be the source. I don't know.
 As I start to query out places of birth & death from WD in the next
 days, I expect to find more discrepancies. (Just today, I found dozens of
 folks whom ULAN gendered one way, and WD another- but were undeniably the
 same photographer. )
 Thanks,
 David


 On Tuesday, December 8, 2015, Tom Morris  wrote:

> Can you explain what "indexing" means in this context?  Is there some
> type of matching process?  How are duplicates resolved, if at all? Was the
> Wikidata info extracted from a dump or one of the APIs?
>
> When I looked at the first person I picked at random, Pierre Berdoy
> (ID:269710), I see that both Wikidata and Wikipedia claim that he was born
> in B

Re: [Wikidata] [Wikimedia-l] Quality issues

2015-12-01 Thread John Erling Blad
Quality covers a lot of possible metrics, and coherence between content and
sources is just one of them. If you expose your sources, it is possible to
check whether they are coherent with your own claim, and that is a very
effective measure against the propagation of false claims. If you don't
give any sources, you may propagate an error, possibly introduced by some
evil regime.

If you are afraid of false claims, start giving references for your content
on Wikipedia, on Wikidata, and on whatever site you're on.

On Wed, Dec 2, 2015 at 1:03 AM, John Erling Blad  wrote:

> I for one had some discussions with Denny about licensing, and even if it
> hurt my feelings to say this (at least two of them) he was right. Facts
> can't be copyrighted and because of that CC0 is the natural choice for data
> in the database.
>
> Still in Europe databases can be given a protection, and that can limit
> the access to the site. By using the CC0 license on the whole thing reuse
> are much easier.
>
> Database protection and copyright is different issues and should not be
> mixed.
>
> John
>
> On Wed, Dec 2, 2015 at 12:43 AM, Markus Krötzsch <
> mar...@semantic-mediawiki.org> wrote:
>
>> [I continue cross-posting for this reply, but it would make sense to
>> return the thread to the Wikidata list where it started, so as to avoid
>> partial discussions happening in many places.]
>>
>>
>> Andreas,
>>
>> On 27.11.2015 12:08, Andreas Kolbe wrote:
>>
>>> Gerard,
>>>
>>
>> (I should note that my reply has nothing to do with what Gerard said, or
>> to the high-level "quality" debate in this thread.)
>>
>> [...]
>>
>> Wikipedia content is considered a reliable source in Wikidata, and
>>> Wikidata content is used as a reliable source by Google, where it
>>> appears without any indication of its provenance.
>>>
>>
>> This prompted me to reply. I wanted to write an email that merely says:
>>
>> "Really? Where did you get this from?" (Google using Wikidata content)
>>
>> But then I read the rest ... so here you go ...
>>
>>
>> Your email mixes up many things and effects, some of which are important
>> issues (e.g., the fact that VIAF is not a primary data source that should
>> be used in citations). Many other of your remarks I find very hard to take
>> serious, including but not limited to the following:
>>
>> * A rather bizarre connection between licensing models and accountability
>> (as if it would make content more credible if you are legally required to
>> say that you found it on Wikipedia, or even give a list of user names and
>> IPs who contributed)
>> * Some stories that I think you really just made up for the sake of
>> argument (Denny alone has picked the Wikidata license? Google displays
>> Wikidata content? Bing is fuelled by Wikimedia?)
>> * Some disjointed remarks about the history of capitalism
>> * The assertion that content is worse just because the author who created
>> it used a bot for editing
>> * The idea that engineers want to build systems with bad data because
>> they like the challenge of cleaning it up -- I mean: really! There is
>> nothing one can even say to this.
>> * The complaint that Wikimedia employs too much engineering expertise and
>> too little content expertise (when, in reality, it is a key principle of
>> Wikimedia to keep out of content, and communities regularly complain WMF
>> would still meddle too much).
>> * All those convincing arguments you make against open, anonymous editing
>> because of it being easy to manipulate (I've heard this from Wikipedia
>> critics ten years ago; wonder what became of them)
>> * And, finally, the culminating conspiracy theory of total control over
>> political opinion, destroying all plurality by allowing only one viewpoint
>> (not exactly what I observe on the Web ...) -- and topping this by blaming
>> it all on the choice of a particular Creative Commons license for Wikidata!
>> Really, you can't make this up.
>>
>> Summing up: either this is an elaborate satire that tries to test how
>> serious an answer you will get on a Wikimedia list, or you should
>> *seriously* rethink what you wrote here, take back the things that are
>> obviously bogus, and have a down-to-earth discussion about the topics you
>> really care about (licenses and cyclic sourcing on Wikimedia projects, I
>> guess; "capitalist companies controlling public media" should be discussed
>> in another forum).
>>
>> Kind regards,
>>
>> Markus
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikimedia-l] Quality issues

2015-12-01 Thread John Erling Blad
I for one had some discussions with Denny about licensing, and even if it
hurt my feelings to say this (at least two of them), he was right. Facts
can't be copyrighted, and because of that CC0 is the natural choice for data
in the database.

Still, in Europe databases can be given protection, and that can limit
access to the site. By using the CC0 license on the whole thing, reuse is
much easier.

Database protection and copyright are different issues and should not be
mixed.

John

On Wed, Dec 2, 2015 at 12:43 AM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> [I continue cross-posting for this reply, but it would make sense to
> return the thread to the Wikidata list where it started, so as to avoid
> partial discussions happening in many places.]
>
>
> Andreas,
>
> On 27.11.2015 12:08, Andreas Kolbe wrote:
>
>> Gerard,
>>
>
> (I should note that my reply has nothing to do with what Gerard said, or
> to the high-level "quality" debate in this thread.)
>
> [...]
>
> Wikipedia content is considered a reliable source in Wikidata, and
>> Wikidata content is used as a reliable source by Google, where it
>> appears without any indication of its provenance.
>>
>
> This prompted me to reply. I wanted to write an email that merely says:
>
> "Really? Where did you get this from?" (Google using Wikidata content)
>
> But then I read the rest ... so here you go ...
>
>
> Your email mixes up many things and effects, some of which are important
> issues (e.g., the fact that VIAF is not a primary data source that should
> be used in citations). Many other of your remarks I find very hard to take
> serious, including but not limited to the following:
>
> * A rather bizarre connection between licensing models and accountability
> (as if it would make content more credible if you are legally required to
> say that you found it on Wikipedia, or even give a list of user names and
> IPs who contributed)
> * Some stories that I think you really just made up for the sake of
> argument (Denny alone has picked the Wikidata license? Google displays
> Wikidata content? Bing is fuelled by Wikimedia?)
> * Some disjointed remarks about the history of capitalism
> * The assertion that content is worse just because the author who created
> it used a bot for editing
> * The idea that engineers want to build systems with bad data because they
> like the challenge of cleaning it up -- I mean: really! There is nothing
> one can even say to this.
> * The complaint that Wikimedia employs too much engineering expertise and
> too little content expertise (when, in reality, it is a key principle of
> Wikimedia to keep out of content, and communities regularly complain WMF
> would still meddle too much).
> * All those convincing arguments you make against open, anonymous editing
> because of it being easy to manipulate (I've heard this from Wikipedia
> critics ten years ago; wonder what became of them)
> * And, finally, the culminating conspiracy theory of total control over
> political opinion, destroying all plurality by allowing only one viewpoint
> (not exactly what I observe on the Web ...) -- and topping this by blaming
> it all on the choice of a particular Creative Commons license for Wikidata!
> Really, you can't make this up.
>
> Summing up: either this is an elaborate satire that tries to test how
> serious an answer you will get on a Wikimedia list, or you should
> *seriously* rethink what you wrote here, take back the things that are
> obviously bogus, and have a down-to-earth discussion about the topics you
> really care about (licenses and cyclic sourcing on Wikimedia projects, I
> guess; "capitalist companies controlling public media" should be discussed
> in another forum).
>
> Kind regards,
>
> Markus
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] REST API for Wikidata

2015-11-30 Thread John Erling Blad
Seems like you filter siteids on language; I don't think this is correct
behaviour.

On Mon, Nov 30, 2015 at 5:38 PM, John Erling Blad  wrote:

> If you are using the P/Q/whatever markers in the id, then you should not
> differentiate on items and properties in the root.
>
> The path /items/{item_id}/data/{property_label} should use the property id
> and not the property label. The later is not stable.
>
> On Mon, Nov 30, 2015 at 2:55 PM, Jeroen De Dauw 
> wrote:
>
>> Hey all,
>>
>> I've created a very rough REST API for Wikidata and am looking for your
>> feedback.
>>
>> * About this API: http://queryr.wmflabs.org
>> * Documentation: http://queryr.wmflabs.org/about/docs
>> * API root: http://queryr.wmflabs.org/api
>>
>> At present this is purely a demo. The data it serves is stale and
>> potentially incomplete, the endpoints and formats they use are very much
>> liable to change, the server setup is not reliable and I'm not 100% sure
>> I'll continue with this little project.
>>
>> The main thing I'm going for with this API compared to the existing one
>> is greater ease of use for common use cases. Several factors make this a
>> lot easier to do in a new API than in the existing one: no need the serve
>> all use cases, no need to retain compatibility with existing users and no
>> framework imposed restrictions. You can read more about the difference on
>> the website.
>>
>> You are invited to comment on the concept and on the open questions
>> mentioned on the website.
>>
>> Cheers
>>
>> --
>> Jeroen De Dauw - http://www.bn2vs.com
>> Software craftsmanship advocate
>> ~=[,,_,,]:3
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] REST API for Wikidata

2015-11-30 Thread John Erling Blad
If you are using the P/Q/whatever markers in the id, then you should not
differentiate between items and properties in the root.

The path /items/{item_id}/data/{property_label} should use the property id
and not the property label. The latter is not stable.
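
To make the point concrete, here is a tiny sketch (my own, not part of the
demo) of the two path styles. The path template is the one documented for
the demo, and the item/property pair is just an example, so treat the URLs
as purely illustrative:

    # Purely illustrative: the path template is copied from the demo docs above,
    # and the item/property pair is just an example (Q64 = Berlin, P1082 = population).
    API_ROOT = "http://queryr.wmflabs.org/api"

    item_id = "Q64"
    prop_id = "P1082"          # the ID is stable forever
    prop_label = "population"  # the label can be edited at any time

    # Label-based path: breaks as soon as someone renames the property label.
    label_url = f"{API_ROOT}/items/{item_id}/data/{prop_label}"

    # ID-based path: keeps working no matter how the label changes.
    id_url = f"{API_ROOT}/items/{item_id}/data/{prop_id}"

    print(label_url)
    print(id_url)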

On Mon, Nov 30, 2015 at 2:55 PM, Jeroen De Dauw 
wrote:

> Hey all,
>
> I've created a very rough REST API for Wikidata and am looking for your
> feedback.
>
> * About this API: http://queryr.wmflabs.org
> * Documentation: http://queryr.wmflabs.org/about/docs
> * API root: http://queryr.wmflabs.org/api
>
> At present this is purely a demo. The data it serves is stale and
> potentially incomplete, the endpoints and formats they use are very much
> liable to change, the server setup is not reliable and I'm not 100% sure
> I'll continue with this little project.
>
> The main thing I'm going for with this API compared to the existing one is
> greater ease of use for common use cases. Several factors make this a lot
> easier to do in a new API than in the existing one: no need the serve all
> use cases, no need to retain compatibility with existing users and no
> framework imposed restrictions. You can read more about the difference on
> the website.
>
> You are invited to comment on the concept and on the open questions
> mentioned on the website.
>
> Cheers
>
> --
> Jeroen De Dauw - http://www.bn2vs.com
> Software craftsmanship advocate
> ~=[,,_,,]:3
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Use-notes in item descriptions

2015-11-05 Thread John Erling Blad
Descriptions are a clarification like the parenthetical form on Wikipedia, but
extended and formalized. Use notes should not be put into this field.

John

On Thu, Nov 5, 2015 at 6:19 PM, James Heald  wrote:

> The place where these hints are vital is in the tool-tips that come up
> when somebody is inputting the value of a property.
>
> It's a quick message to say "don't use that item, use this other item".
>
> A section on the talk page simply doesn't cover it.
>
> I suppose one could create a community property, as you suggest, but as
> you say the challenge would be then making sure the system software
> presented it when it was needed.  I suspect that things intended to be
> presented by the system software are better created as system properties.
>
>-- James,
>
>
>
>
> On 05/11/2015 16:21, Benjamin Good wrote:
>
>> A section in the talk page associated with the article in question would
>> seem to solve this (definitely real) problem? - assuming that a would-be
>> editor was aware of the talk page.
>> Alternatively, you could propose a generic property with a text field that
>> could be added to items on an as-needed basis without any change to the
>> current software.  Again though, the challenge would be getting the
>> information in front of the user/editor at the right point in time.
>>
>>
>> On Thu, Nov 5, 2015 at 2:16 AM, Jane Darnell  wrote:
>>
>> Yes I have noticed this need for use notes, but it is specific to
>>> properties, isn't it? I see it in things such as choosing what to put in
>>> the "genre" property of an artwork. It would be nice to have some sort of
>>> pop-up that you can fill with more than what you put in. For example I
>>> get
>>> easily confused when I address the relative (as in kinship) properties;
>>> "father of the subject" is clear, but what about cousin/nephew etc.? You
>>> need more explanation room than can be stuffed in the label field to fit
>>> in
>>> the drop down. I have thought about this, but don't see any easy solution
>>> besides what you have done.
>>>
>>> On Thu, Nov 5, 2015 at 10:51 AM, James Heald  wrote:
>>>
>>> I have been wondering about the practice of putting use-notes in item
 descriptions.

 For example, on Q6581097 (male)
https://www.wikidata.org/wiki/Q6581097
 the (English) description reads:
"human who is male (use with Property:P21 sex or gender). For
 groups of males use with subclass of (P279)."

 I have added some myself recently, working on items in the
 administrative
 structure of the UK -- for example on Q23112 (Cambridgeshire)
 https://www.wikidata.org/wiki/Q23112
 I have changed the description to now read
 "ceremonial county of England (use Q21272276 for administrative
 non-metropolitan county)"

 These "use-notes" are similar to the disambiguating hat-notes often
 found
 at the top of articles on en-wiki and others; and just as those
 hat-notes
 can be useful on wikis, so such use-notes can be very useful on
 Wikidata,
 for example in the context of a search, or a drop-down menu.

 But...

 Given that the label field is also there to be presentable to end-users
 in contexts outside Wikidata, (eg to augment searches on main wikis, or
 to
 feed into the semantic web, to end up being used in who-knows-what
 different ways), yet away from Wikidata a string like "Q21272276" will
 typically have no meaning. Indeed there may not even be any distinct
 thing
 corresponding to it.  (Q21272276 has no separate en-wiki article, for
 example).

 So I'm wondering whether these rather Wikidata-specific use notes do
 really belong in the general description field ?

 Is there a case for moving them to a new separate use-note field created
 for them?

 The software could be adjusted to include such a field in search results
 and drop-downs and the item summary, but they would be a separate
 data-entry field on the item page, and a separate triple for the SPARQL
 service, leaving the description field clean of Wikidata-specific
 meaning,
 better for third-party and downstream applications.

 Am I right to feel that the present situation of just chucking
 everything
 into the description field doesn't seem quite right, and we ought to
 take a
 step forward from it?

-- James.

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata


>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/lis

Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

2015-09-28 Thread John Erling Blad
Yes! +1

On Mon, Sep 28, 2015 at 11:27 PM, Denny Vrandečić 
wrote:

> Actually, my suggestion would be to switch on Primary Sources as a default
> tool for everyone. That should increase exposure and turnover, without
> compromising quality of data.
>
>
>
> On Mon, Sep 28, 2015 at 2:23 PM Denny Vrandečić 
> wrote:
>
>> Hi Gerard,
>>
>> given the statistics you cite from
>>
>> https://tools.wmflabs.org/wikidata-primary-sources/status.html
>>
>> I see that 19.6k statements have been approved through the tool, and 5.1k
>> statements have been rejected - which means that about 1 in 5 statements is
>> deemed unsuitable by the users of primary sources.
>>
>> Given that there are 12.4M statements in the tool, this means that about
>> 2.5M statements will turn out to be unsuitable for inclusion in Wikidata
>> (if the current ratio holds). Are you suggesting to upload all of these
>> statements to Wikidata?
>>
>> Tpt already did upload pieces of the data which have sufficient quality
>> outside the primary sources tool, and more is planned. But for the data
>> where the suitability for Wikidata seems questionable, I would not know
>> what other approach to use. Do you have a suggestion?
>>
>> Once you have a suggestion and there is community consensus in doing it,
>> no one will stand in the way of implementing that suggestion.
>>
>> Cheers,
>> Denny
>>
>>
>> On Mon, Sep 28, 2015 at 1:19 PM John Erling Blad 
>> wrote:
>>
>>> Another; make a kind of worklist on Wikidata that reflect the watchlist
>>> on the clients (Wikipedias) but then, we often have items on our watchlist
>>> that we don't know much about. (Digression: Somehow we should be able to
>>> sort out those things we know (the place we live, the persons we have meet)
>>> from those things we have done (edited, copy-pasted).)
>>>
>>> I been trying to get some interest in the past for worklists on
>>> Wikipedia, it isn't much interest to make them. It would speed up tedious
>>> tasks of finding the next page to edit after a given edit is completed. It
>>> is the same problem with imports from Freebase on Wikidata, locate the next
>>> item on Wikidata with the same queued statement from Freebase, but within
>>> some worklist that the user has some knowledge about.
>>>
>>> Imagine "municipalities within a county" or "municipalities that is also
>>> on the users watchlist", and combine that with available unhandled
>>> Freebase-statements.
>>>
>>> On Mon, Sep 28, 2015 at 10:09 PM, John Erling Blad 
>>> wrote:
>>>
>>>> Could it be possible to create some kind of info (notification?) in a
>>>> wikipedia article that additional data is available in a queue ("freebase")
>>>> somewhere?
>>>>
>>>> If you have the article on your watch-list, then you will get a warning
>>>> that says "You lazy boy, get your ass over here and help us out!" Or
>>>> perhaps slightly rephrased.
>>>>
>>>> On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <
>>>> mar...@semantic-mediawiki.org> wrote:
>>>>
>>>>> Hi Gerard, hi all,
>>>>>
>>>>> The key misunderstanding here is that the main issue with the Freebase
>>>>> import would be data quality. It is actually community support. The goal 
>>>>> of
>>>>> the current slow import process is for the Wikidata community to "adopt"
>>>>> the Freebase data. It's not about "storing" the data somewhere, but about
>>>>> finding a way to maintain it in the future.
>>>>>
>>>>> The import statistics show that Wikidata does not currently have
>>>>> enough community power for a quick import. This is regrettable, but not
>>>>> something that we can fix by dumping in more data that will then be
>>>>> orphaned.
>>>>>
>>>>> Freebase people: this is not a small amount of data for our young
>>>>> community. We really need your help to digest this huge amount of data! I
>>>>> am absolutely convinced from the emails I saw here that none of the former
>>>>> Freebase editors on this list would support low quality standards. They
>>>>> have fought hard to fix errors and avoid issues coming into their data for
>>>>> a long time.
>>>>>
>>>>> Nobody 

Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

2015-09-28 Thread John Erling Blad
I would like to add old URLs that seem to be a source but do not support
anything in the claim. For example, in an item about a person, the
name or the birth date of the person does not appear on the page, yet the
page is used as a source for the person's birth date.


On Mon, Sep 28, 2015 at 11:44 PM, Stas Malyshev 
wrote:

> Hi!
>
> > I see that 19.6k statements have been approved through the tool, and
> > 5.1k statements have been rejected - which means that about 1 in 5
> > statements is deemed unsuitable by the users of primary sources.
>
> From my (limited) experience with Primary Sources, there are several
> kinds of things there that I had rejected:
>
> - Unsourced statements that contradict what is written in Wikidata
> - Duplicate claims already existing in Wikidata
> - Duplicate claims with worse data (i.e. less accurate location, less
> specific categorization, etc) or unnecessary qualifiers (such as adding
> information which is already contained in the item to item's qualifiers
> - e.g. zip code for a building)
> - Source references that do not exist (404, etc.)
> - Source references that do exist but either duplicate existing one (a
> number of sources just refer to different URL of the same data) or do
> not contain the information they should (e.g. link to newspaper's
> homepage instead of specific article)
> - Claims that are almost obviously invalid (e.g. "United Kingdom" as a
> genre of a play)
>
> I think at least some of these - esp. references that do not exist and
> duplicates with no refs - could be removed automatically, thus raising
> the relative quality of the remaining items.
>
> OTOH, some of the entries can be made self-evident - i.e. if we talk
> about movie and Freebase has IMDB ID or Netflix ID, it may be quite easy
> to check if that ID is valid and refers to a movie by the same name,
> which should be enough to merge it.
>
> Not sure if those one-off things worth bothering with, just putting it
> out there to consider.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] I know this is AbsolutlyWrong™

2015-09-28 Thread John Erling Blad
Just so there are no misunderstandings, I'm right when you are wrong!
That closes the 1% gap.

CaseClosed©

On Mon, Sep 28, 2015 at 11:49 PM, Daniel Kinzler <
daniel.kinz...@wikimedia.de> wrote:

> Awww, thanks John, I think I needed that :)
>
> Am 28.09.2015 um 22:29 schrieb John Erling Blad:
> > ...but don't focus to much on the 1% #¤%& wrong thing, focus on the 99%
> right thing.
> >
> > And I do think Wikibase is done 99% right!
> > (And the 1% WrongThing™ is just there so I can nag Danny and
> Duesentrieb...)
> >
> > John
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] I know this is AbsolutlyWrong™

2015-09-28 Thread John Erling Blad
...but don't focus too much on the 1% #¤%& wrong thing, focus on the 99%
right thing.

And I do think Wikibase is done 99% right!
(And the 1% WrongThing™ is just there so I can nag Danny and Duesentrieb...)

John
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

2015-09-28 Thread John Erling Blad
Another idea: make a kind of worklist on Wikidata that reflects the watchlists on
the clients (the Wikipedias), but then, we often have items on our watchlist
that we don't know much about. (Digression: somehow we should be able to
sort out those things we know (the place we live, the people we have met)
from those things we have merely done something to (edited, copy-pasted).)

I have been trying to get some interest in worklists on Wikipedia in the past,
but there isn't much interest in making them. They would speed up the tedious task of
finding the next page to edit after a given edit is completed. It is the
same problem with imports from Freebase on Wikidata: locate the next item
on Wikidata with the same queued statement from Freebase, but within some
worklist that the user has some knowledge about.

Imagine "municipalities within a county" or "municipalities that are also on
the user's watchlist", and combine that with the available unhandled
Freebase statements.

On Mon, Sep 28, 2015 at 10:09 PM, John Erling Blad  wrote:

> Could it be possible to create some kind of info (notification?) in a
> wikipedia article that additional data is available in a queue ("freebase")
> somewhere?
>
> If you have the article on your watch-list, then you will get a warning
> that says "You lazy boy, get your ass over here and help us out!" Or
> perhaps slightly rephrased.
>
> On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <
> mar...@semantic-mediawiki.org> wrote:
>
>> Hi Gerard, hi all,
>>
>> The key misunderstanding here is that the main issue with the Freebase
>> import would be data quality. It is actually community support. The goal of
>> the current slow import process is for the Wikidata community to "adopt"
>> the Freebase data. It's not about "storing" the data somewhere, but about
>> finding a way to maintain it in the future.
>>
>> The import statistics show that Wikidata does not currently have enough
>> community power for a quick import. This is regrettable, but not something
>> that we can fix by dumping in more data that will then be orphaned.
>>
>> Freebase people: this is not a small amount of data for our young
>> community. We really need your help to digest this huge amount of data! I
>> am absolutely convinced from the emails I saw here that none of the former
>> Freebase editors on this list would support low quality standards. They
>> have fought hard to fix errors and avoid issues coming into their data for
>> a long time.
>>
>> Nobody believes that either Freebase or Wikidata can ever be free of
>> errors, and this is really not the point of this discussion at all [1]. The
>> experienced community managers among us know that it is not about the
>> amount of data you have. Data is cheap and easy to get, even free data with
>> very high quality. But the value proposition of Wikidata is not that it can
>> provide storage space for lot of data -- it is that we have a functioning
>> community that can maintain it. For the Freebase data donation, we do not
>> seem to have this community yet. We need to find a way to engage people to
>> do this. Ideas are welcome.
>>
>> What I can see from the statistics, however, is that some users (and I
>> cannot say if they are "Freebase users" or "Wikidata users" ;-) are putting
>> a lot of effort into integrating the data already. This is great, and we
>> should thank these people because they are the ones who are now working on
>> what we are just talking about here. In addition, we should think about
>> ways of engaging more community in this. Some ideas:
>>
>> (1) Find a way to clean and import some statements using bots. Maybe
>> there are cases where Freebase already had a working import infrastructure
>> that could be migrated to Wikidata? This would also solve the community
>> support problem in one way. We just need to import the maintenance
>> infrastructure together with the data.
>>
>> (2) Find a way to expose specific suggestions to more people. The
>> Wikidata Games have attracted so many contributions. Could some of the
>> Freebase data be solved in this way, with a dedicated UI?
>>
>> (3) Organise Freebase edit-a-thons where people come together to work
>> through a bunch of suggested statements.
>>
>> (4) Form wiki projects that discuss a particular topic domain in Freebase
>> and how it could be imported faster using (1)-(3) or any other idea.
>>
>> (5) Connect to existing Wiki projects to make them aware of valuable data
>> they might take from Freebase.
>>
>> Freebase is a much better resource than many other data resources 

Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

2015-09-28 Thread John Erling Blad
Would it be possible to create some kind of info (a notification?) in a
Wikipedia article that additional data is available in a queue ("freebase")
somewhere?

If you have the article on your watchlist, then you will get a warning
that says "You lazy boy, get your ass over here and help us out!" Or
perhaps slightly rephrased.

On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> Hi Gerard, hi all,
>
> The key misunderstanding here is that the main issue with the Freebase
> import would be data quality. It is actually community support. The goal of
> the current slow import process is for the Wikidata community to "adopt"
> the Freebase data. It's not about "storing" the data somewhere, but about
> finding a way to maintain it in the future.
>
> The import statistics show that Wikidata does not currently have enough
> community power for a quick import. This is regrettable, but not something
> that we can fix by dumping in more data that will then be orphaned.
>
> Freebase people: this is not a small amount of data for our young
> community. We really need your help to digest this huge amount of data! I
> am absolutely convinced from the emails I saw here that none of the former
> Freebase editors on this list would support low quality standards. They
> have fought hard to fix errors and avoid issues coming into their data for
> a long time.
>
> Nobody believes that either Freebase or Wikidata can ever be free of
> errors, and this is really not the point of this discussion at all [1]. The
> experienced community managers among us know that it is not about the
> amount of data you have. Data is cheap and easy to get, even free data with
> very high quality. But the value proposition of Wikidata is not that it can
> provide storage space for lot of data -- it is that we have a functioning
> community that can maintain it. For the Freebase data donation, we do not
> seem to have this community yet. We need to find a way to engage people to
> do this. Ideas are welcome.
>
> What I can see from the statistics, however, is that some users (and I
> cannot say if they are "Freebase users" or "Wikidata users" ;-) are putting
> a lot of effort into integrating the data already. This is great, and we
> should thank these people because they are the ones who are now working on
> what we are just talking about here. In addition, we should think about
> ways of engaging more community in this. Some ideas:
>
> (1) Find a way to clean and import some statements using bots. Maybe there
> are cases where Freebase already had a working import infrastructure that
> could be migrated to Wikidata? This would also solve the community support
> problem in one way. We just need to import the maintenance infrastructure
> together with the data.
>
> (2) Find a way to expose specific suggestions to more people. The Wikidata
> Games have attracted so many contributions. Could some of the Freebase data
> be solved in this way, with a dedicated UI?
>
> (3) Organise Freebase edit-a-thons where people come together to work
> through a bunch of suggested statements.
>
> (4) Form wiki projects that discuss a particular topic domain in Freebase
> and how it could be imported faster using (1)-(3) or any other idea.
>
> (5) Connect to existing Wiki projects to make them aware of valuable data
> they might take from Freebase.
>
> Freebase is a much better resource than many other data resources we are
> already using with similar approaches as (1)-(5) above, and yet it seems
> many people are waiting for Google alone to come up with a solution.
>
> Cheers,
>
> Markus
>
> [1] Gerard, if you think otherwise, please let us know which error rates
> you think are typical or acceptable for Freebase and Wikidata,
> respectively. Without giving actual numbers you just produce empty strawman
> arguments (for example: claiming that anyone would think that Wikidata is
> better quality than Freebase and then refuting this point, which nobody is
> trying to make). See https://en.wikipedia.org/wiki/Straw_man
>
>
> On 26.09.2015 18:31, Gerard Meijssen wrote:
>
>> Hoi,
>> When you analyse the statistics, it shows how bad the current state of
>> affairs is. Slightly over one in a thousanths of the content of the
>> primary sources tool has been included.
>>
>> Markus, Lydia and myself agree that the content of Freebase may be
>> improved. Where we differ is that the same can be said for Wikidata. It
>> is not much better and by including the data from Freebase we have a
>> much improved coverage of facts. The same can be said for the content of
>> DBpedia probably other sources as well.
>>
>> I seriously hate this procrastination and the denial of the efforts of
>> others. It is one type of discrimination that is utterly deplorable.
>>
>> We should concentrate on comparing Wikidata with other sources that are
>> maintained. We should do this repeatedly and concentrate on workflows
>> that seek the differences and provide wo

Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread John Erling Blad
Probability of detection (PoD) is central to fighting vandalism, and that
does not imply making the vandalism less visible.

Symmetric statements make vandalism appear in more places, making it more
visible, and thereby increasing the chance for detection.

If you isolate the vandalism it will be less visible, but then it will be
more likely that no one will ever spot it.

And yes, PoD is a military thingy and as such is disliked by the
wiki communities. Still, sometimes it is wise to check out what is actually
working and why it is working.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread John Erling Blad
Depending on bots to set up symmetric relations is one of the things I find
very weird in Wikidata. That creates a situation where a user makes an edit,
and a bot later on overrides the user's previous edit. It is exactly the
same race condition that we fought earlier with the iw-bots, but now
replicated in Wikidata - a system that was supposed to remove the problem.

/me dumb, me confused.. o_O



On Mon, Sep 28, 2015 at 5:12 PM, Daniel Kinzler  wrote:

> Am 28.09.2015 um 16:43 schrieb Thomas Douillard:
> > Daniel Wrote:
> >> (*) This follows the principle of "magic is bad, let people edit".
> Allowing
> >> inconsistencies means we can detect errors by finding such
> inconsistencies.
> >> Automatically enforcing consistency may lead to errors propagating out
> of view
> >> of the curation process. The QA process on wikis is centered around
> edits, so
> >> every change should be an edit. Using a bot to fill in missing
> "reverse" links
> >> follows this idea. The fact that you found an issue with the data
> because you
> >> saw a bot do an edit is an example of this principle working nicely.
> >
> > That might prove to become a worser nightmare than the magic one ...
> It's seems
> > like refusing any kind of automation because it might surprise people
> for the
> > sake of exhausting them to let them do a lot of manual work.
>
> I'm not arguing against "any" kind of automation. I'm arguing against
> "invisible" automation baked into the backend software. We(*) very much
> encourage "visible" automation under community control like bots and other
> (semi-)automatic import tools like WiDaR.
>
> -- daniel
>
>
> (*) I'm part of the wikidata developer team, not an active member of the
> community. I'm primarily speaking for myself here, from my personal
> experience
> as a wikipedia and common admin. I know from past discussions that "bots
> over
> magic" is considered Best Practice among the dev team, and I believe it's
> also
> the approach preferred by the Wikidata community, but I cannot speak for
> them.
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread John Erling Blad
I would like to add "minister", as there are some fine distinctions on who
is and who's not in a government. Still we call them all "ministers". Very
confusing, and very obvious at the same time.

There are also the differences in organisation of American municipalities,
oh what a glorious mess!

Then you have the differences between a state and its different
main/bi-lands, not to say the not inhabited parts.

It is a lot that don't have an obvious description.

And btw, twin cities, I found a lot of errors and pretended I didn't see
them. Don't tell anyone.

On Mon, Sep 28, 2015 at 4:04 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 28.09.2015 13:31, Luca Martinelli wrote:
>
>> 2015-09-28 11:16 GMT+02:00 Markus Krötzsch > >:
>>
>>> If this is the case, then maybe it
>>> should just be kept as an intentionally broad property that captures
>>> what we
>>> now find in the Wikipedias.
>>>
>>
>> +1, the more broad the application of certain property is, the better.
>> We really don't need to be 100% specific with a property, if we can
>> exploit qualifiers.
>>
>
> I would not completely agree to this: otherwise we could just have a
> property "related to" and use qualifiers for the rest ;-) It's always about
> finding the right balance for each case. Many properties (probably most)
> have a predominant natural definition that is quite clear. Take "parent" as
> a simple example of a property that can have a very strict definition
> (biological parent) and still be practically useful and easy to understand.
> The trouble is often with properties that have a legal/political meaning
> since they are different in each legislation (which in itself changes over
> space and time). "Twin city" is such a case; "mayor" is another; also
> classes like "company" are like this. I think we do well to stick to the
> "folk terminology" in such cases, which lacks precision but caters to our
> users.
>
> This can then be refined in the mid and long term (maybe using qualifiers,
> more properties, or new editing conventions). Each domain could have a
> dedicated Wikiproject to work this out (the Wikiproject Names is a great
> example of such an effort [1]).
>
> Markus
>
> [1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Names
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Translation of ValueView

2015-09-14 Thread John Erling Blad
Okay, I'll tell the community to hang on for a while!
Do you have any date for the transfer? I think they are eager to start
translating.

John

On Mon, Sep 14, 2015 at 9:44 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Mon, Sep 14, 2015 at 9:37 PM, John Erling Blad 
> wrote:
> > Can someone point me to where ValueView is translated, thanks!
> > https://github.com/wmde/ValueView
>
> We are in the process of moving that repository. Once that is done it
> will be translated on translatewiki.
> https://phabricator.wikimedia.org/T112120
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Translation of ValueView

2015-09-14 Thread John Erling Blad
Can someone point me to where ValueView is translated, thanks!
https://github.com/wmde/ValueView
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Naming projects

2015-09-14 Thread John Erling Blad
There is no consensus at nowiki on the use of this bot; it has not been raised,
and the name as such has not been discussed either. My comment on the
name is solely my opinion. I'm not interested in whether it is a pun or
not; it has a clear and direct connection to a disease that has killed
people, and I don't like it.

The bot has other issues, but that is another discussion.

On Mon, Sep 14, 2015 at 10:48 AM, Mathias Schindler <
mathias.schind...@gmail.com> wrote:

> On Mon, Sep 14, 2015 at 12:23 AM, John Erling Blad 
> wrote:
> > Please do not name projects "listeria", or use any other names of
> diseases
> > that has killed people. Thank you for taking this into consideration next
> > time.
>
> Hi John,
>
> given that the premise of your email is factually incorrect (as
> listeria is a pun referring to lists, sharing the name with a family
> of bacteria named after Joseph Lister (according to Wikipedia)), is
> there now consensus that the is no issue with this bot name and that
> the recommendation only applies to picking names in general without
> any assertion that this wasn't duly taken into consideration this
> time?
>
> Mathias
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Naming projects

2015-09-13 Thread John Erling Blad
Please do not name projects "listeria", or use any other names of diseases
that have killed people. Thank you for taking this into consideration next
time.

John
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] next 2 rounds of arbitrary access rollouts

2015-09-04 Thread John Erling Blad
Please tell us if there will be further delays! :)
John

On Wed, Sep 2, 2015 at 1:43 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Wed, Aug 19, 2015 at 3:15 PM, Lydia Pintscher
>  wrote:
> > Hi everyone,
> >
> > Update: we wanted to do the second batch last night but ran into some
> > issues we need to investigate more first before we can add another
> > huge wiki like enwp. Sorry for the delay. I'll keep you posted.
>
> We had to delay the next rollout. We now solved the issues and have
> set a new date. It'll be on September 16th. ENWP we're coming :)
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data from Netherlands Statistics / CBS on Wikidata?

2015-08-08 Thread John Erling Blad
There have been some discussions about reuse of statistics with people
from Statistics Norway.[1] They use a format called JSON-stat.[2] A
bunch of census bureaus are starting to use JSON-stat, for example
Statistics Norway, UK’s Office for National Statistics, Statistics
Sweden, Statistics Denmark, Instituto Galego de Estatística, and
Central Statistics Office of Ireland. I've heard about others too.

I have started on a rant at Meta about it, but I didn't finish it.[3]
Perhaps more people will join in? ;)

A central problem is that statistics are often produced as a
multidimensional dataset, where our topics are only single indices on
one of the dimensions. We can extract the relevant data, but it is
probably better to make a kind of composite key into the dataset to
identify the relevant cells for our topic. That key can be stored as
a table-specific statement in Wikidata, and with a little bit of
planning it can be statistics-specific or even bureau-specific. (A rough
sketch of such a lookup follows after the links below.)

[1] https://ssb.no/en/
[2] http://json-stat.org/
[3] https://meta.wikimedia.org/wiki/Grants:IdeaLab/Import_and_visualize_census_data
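
To make the composite-key idea a bit more concrete, here is a minimal sketch
of how such a key could pick one cell out of a JSON-stat-style cube. The tiny
dataset is invented, and the layout follows my reading of JSON-stat 2.0 (flat
value array, last dimension varying fastest), so treat the details as
assumptions rather than as a spec:

    # Invented example shaped like a JSON-stat 2.0 dataset (an assumption, not a spec).
    dataset = {
        "id": ["region", "year"],
        "size": [2, 3],
        "dimension": {
            "region": {"category": {"index": {"NO-0301": 0, "NO-5001": 1}}},
            "year": {"category": {"index": {"2012": 0, "2013": 1, "2014": 2}}},
        },
        # Flat, row-major value array: the last dimension ("year") varies fastest.
        "value": [623966, 634463, 647676, 179692, 182035, 184960],
    }

    def cell(ds, key):
        """Return the single value addressed by a composite key such as
        {"region": "NO-0301", "year": "2014"}."""
        offset, stride = 0, 1
        # Walk the dimensions from last to first, accumulating the flat index.
        for dim, size in zip(reversed(ds["id"]), reversed(ds["size"])):
            pos = ds["dimension"][dim]["category"]["index"][key[dim]]
            offset += pos * stride
            stride *= size
        return ds["value"][offset]

    # The key dict is the kind of composite key that could be stored as a
    # table-specific statement on the Wikidata item:
    print(cell(dataset, {"region": "NO-0301", "year": "2014"}))  # -> 647676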

On Fri, Aug 7, 2015 at 3:06 PM, Joe Filceolaire  wrote:
> Hypercubes and csv flat files belong in commons in my opinion  (commons may
> have a different opinion ). That's if we even want to store a copy.
>
> This source data should then be translated into wikidata triples and
> statements and imported into wikidata items.
>
> The statements in wikidata are then used to generate lists and tables and
> graphs and info graphics in wikipedia.
>
> At least that's how I see it
>
> Joe
>
>
> On Thu, 6 Aug 2015 17:00 Jane Darnell  wrote:
>>
>> I have used the CBS website to compile my own statistics for research.
>> Their data is completely available online as far as I know and you can
>> download the queries you run on the fly in .csv file format, or text or
>> excel. They have various data tables depending on what you find interesting
>> and complete tables of historical data is also available. That said, I think
>> any pilot project would need to start with their publications, which are
>> also available online. These can be freely used as sources for statements.
>> Interesting data for Wikidata could be population statistics of major cities
>> per century or employment statistics per city per century and so forth. See
>> CBS.nl
>>
>> On Thu, Aug 6, 2015 at 5:30 PM, Gerard Meijssen
>>  wrote:
>>>
>>> Hoi,
>>> As far as I am concerned, data that is available on the web is fine if
>>> you use data on the web. It makes no difference when the data is to be used
>>> in the context of the WMF.
>>>
>>> When the CBS shares data with us in Wikidata, it makes the data available
>>> in Wikipedia.
>>>
>>> It is why I would like for something small, a pilot project something
>>> where we can build on.
>>> Thanks,
>>>  GerardM
>>>
>>> On 6 August 2015 at 17:23, Thad Guidry  wrote:

 Netherlands Statistics should just post the data on the web...so that
 anyone can use its "Linked Data".

 And actually, CSV on the Web is now a reality (no longer a need for
 XBRL)

 https://lists.w3.org/Archives/Public/public-vocabs/2015Jul/0016.html

 As DanBri notes in his P.S. at the bottom of the above link..."... the
 ability (in the csv2rdf doc) to map from

 rows in a table via templates into RDF triplesis very powerful."

 Thad
 +ThadGuidry

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Maintenance scripts for clients

2015-08-06 Thread John Erling Blad
A couple of guys at nowiki have started using this tool, and if they
continue at the present speed the list will be emptied in two weeks' time.

Can you please add nnwiki too, and I will inform the community there
that there is a tool available.

On Thu, Aug 6, 2015 at 10:55 AM, Magnus Manske
 wrote:
> John: List mode!
>
> https://tools.wmflabs.org/wikidata-todo/duplicity.php?wiki=nowiki&mode=list
>
> On Thu, Aug 6, 2015 at 8:16 AM Zolo ...  wrote:
>>
>> About missing labels: in frwiki, most Wikidata data are added using
>> Module:Wikidata. The module adds a generic "to be translated" category when
>> there is no French label. With Wikidata usage picking up speed, the
>> community is finally coming to grip with it, as can be seen from that stats
>> at Catégorie:Page utilisant des données de Wikidata à traduire.
>>
>> On Tue, Aug 4, 2015 at 7:06 PM, John Erling Blad  wrote:
>>>
>>> Nice solution, I'll post a link at Wikipedia:Torget.
>>> It is a bit like making a traffic statistic by using a road cam, so it
>>> wasn't really what I was looking for..
>>>
>>> On Tue, Aug 4, 2015 at 5:18 PM, Magnus Manske
>>>  wrote:
>>> > I set up one of my tools for you (nowiki) for [1] :
>>> > https://tools.wmflabs.org/wikidata-todo/duplicity.php
>>> >
>>> > It doesn't give you a list (though I could add that), rather presents
>>> > you
>>> > with a random one and tries to find a matching item. Basically, what
>>> > you
>>> > need to do anyway for due diligence.
>>> >
>>> >
>>> > Not quite sure what else you need, too much "somehow" in your
>>> > description...
>>> >
>>> >
>>> > On Tue, Aug 4, 2015 at 4:01 PM John Erling Blad 
>>> > wrote:
>>> >>
>>> >> We lack several maintenance scripts for the clients, that is human
>>> >> readable special pages with reports on which pages lacks special
>>> >> treatment. In no particular order we need some way to identify
>>> >> unconnected pages in general (the present one does not work [1]), we
>>> >> need some way to identify pages that are unconnected but has some
>>> >> language links, we need to identify items that are used in some
>>> >> language and lacks labels (almost like [2],but on the client and for
>>> >> items that are somehow connected to pages on the client), and we need
>>> >> to identify items that lacks specific claims and the client pages use
>>> >> a specific template.
>>> >>
>>> >> There are probably more such maintenance pages, these are those that
>>> >> are most urgent. Now users start to create categories to hack around
>>> >> the missing maintenance pages, which create a bunch of categories.[3]
>>> >> At Norwegian Bokmål there are just a few scripts that utilize data
>>> >> from Wikidata, still the number of categories starts to grow large.
>>> >>
>>> >> For us at the "receiving end" this is a show stopper. We can't
>>> >> convince the users that this is a positive addition to the pages
>>> >> without the maintenance scripts, because them we more or less are in
>>> >> the blind when we try to fix errors. We can't use random pages to try
>>> >> to prod the pages to find something that is wrong, we must be able to
>>> >> search for the errors and fix them.
>>> >>
>>> >> This summer we (nowiki) have added about ten (10) properties to the
>>> >> infobokses, some with scripts and some with the property parser
>>> >> function. Most of my time I have not been coding, and I have not been
>>> >> fixing errors. I have been trying to explain to the community why
>>> >> Wikidata is a good idea. At one point the changes was even reverted
>>> >> because someone disagree with what we had done. The whole thing
>>> >> basically revolves around "my article got an Q-id in the infobox and I
>>> >> don't know how to fix it". We know how to fix it, and I have explained
>>> >> that to the editors at nowiki several times. They still don't get it,
>>> >> so we need some way to fix it, and we don't have maintenance scripts
>>> >> to do it.
>>> >>
>>> >> Right now we don't need more wild 

Re: [Wikidata] Maintenance scripts for clients

2015-08-04 Thread John Erling Blad
Nice solution; I'll post a link at Wikipedia:Torget.
It is a bit like making traffic statistics by using a road cam, so it
wasn't really what I was looking for...

On Tue, Aug 4, 2015 at 5:18 PM, Magnus Manske
 wrote:
> I set up one of my tools for you (nowiki) for [1] :
> https://tools.wmflabs.org/wikidata-todo/duplicity.php
>
> It doesn't give you a list (though I could add that), rather presents you
> with a random one and tries to find a matching item. Basically, what you
> need to do anyway for due diligence.
>
>
> Not quite sure what else you need, too much "somehow" in your description...
>
>
> On Tue, Aug 4, 2015 at 4:01 PM John Erling Blad  wrote:
>>
>> We lack several maintenance scripts for the clients, that is human
>> readable special pages with reports on which pages lacks special
>> treatment. In no particular order we need some way to identify
>> unconnected pages in general (the present one does not work [1]), we
>> need some way to identify pages that are unconnected but has some
>> language links, we need to identify items that are used in some
>> language and lacks labels (almost like [2],but on the client and for
>> items that are somehow connected to pages on the client), and we need
>> to identify items that lacks specific claims and the client pages use
>> a specific template.
>>
>> There are probably more such maintenance pages, these are those that
>> are most urgent. Now users start to create categories to hack around
>> the missing maintenance pages, which create a bunch of categories.[3]
>> At Norwegian Bokmål there are just a few scripts that utilize data
>> from Wikidata, still the number of categories starts to grow large.
>>
>> For us at the "receiving end" this is a show stopper. We can't
>> convince the users that this is a positive addition to the pages
>> without the maintenance scripts, because them we more or less are in
>> the blind when we try to fix errors. We can't use random pages to try
>> to prod the pages to find something that is wrong, we must be able to
>> search for the errors and fix them.
>>
>> This summer we (nowiki) have added about ten (10) properties to the
>> infobokses, some with scripts and some with the property parser
>> function. Most of my time I have not been coding, and I have not been
>> fixing errors. I have been trying to explain to the community why
>> Wikidata is a good idea. At one point the changes was even reverted
>> because someone disagree with what we had done. The whole thing
>> basically revolves around "my article got an Q-id in the infobox and I
>> don't know how to fix it". We know how to fix it, and I have explained
>> that to the editors at nowiki several times. They still don't get it,
>> so we need some way to fix it, and we don't have maintenance scripts
>> to do it.
>>
>> Right now we don't need more wild ideas that will swamp the
>> development for months and years to come, we need maintenance scripts,
>> and we need them now!
>>
>> [1] https://no.wikipedia.org/wiki/Spesial:UnconnectedPages
>> [2] https://www.wikidata.org/wiki/Special:EntitiesWithoutLabel
>> [3]
>> https://no.wikipedia.org/wiki/Spesial:Prefiksindeks/Kategori:Artikler_hvor
>>
>> John Erling Blad
>> /jeblad
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Maintenance scripts for clients

2015-08-04 Thread John Erling Blad
We lack several maintenance scripts for the clients, that is, human-readable
special pages with reports on which pages need special
treatment. In no particular order: we need some way to identify
unconnected pages in general (the present one does not work [1]), we
need some way to identify pages that are unconnected but have some
language links, we need to identify items that are used in some
language and lack labels (almost like [2], but on the client and for
items that are somehow connected to pages on the client), and we need
to identify items that lack specific claims while the client pages use
a specific template.
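
As a stop-gap, the first of these reports can be approximated from the
outside with the standard Action API, by walking the main namespace and
keeping the pages that have no wikibase_item page prop. A rough sketch
(mine; the parameter choices are assumptions, and a real special page would
of course do this server-side):

    import requests

    API = "https://no.wikipedia.org/w/api.php"  # any client wiki works the same way

    def unconnected_pages(max_pages=200):
        """Yield main-namespace, non-redirect pages that carry no
        wikibase_item page prop, i.e. pages not connected to any item."""
        params = {
            "action": "query",
            "format": "json",
            "generator": "allpages",
            "gapnamespace": 0,
            "gapfilterredir": "nonredirects",
            "gaplimit": 50,
            "prop": "pageprops",
            "ppprop": "wikibase_item",
        }
        found = 0
        while found < max_pages:
            data = requests.get(API, params=params).json()
            for page in data.get("query", {}).get("pages", {}).values():
                if "wikibase_item" not in page.get("pageprops", {}):
                    yield page["title"]
                    found += 1
                    if found >= max_pages:
                        return
            if "continue" not in data:
                return
            params.update(data["continue"])

    for title in unconnected_pages(20):
        print(title)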

There are probably more such maintenance pages; these are the ones that
are most urgent. Users have now started to create categories to hack around
the missing maintenance pages, which leaves us with a pile of categories.[3]
At Norwegian Bokmål there are just a few scripts that utilize data
from Wikidata, yet the number of categories is already starting to grow large.

For us at the "receiving end" this is a show stopper. We can't
convince the users that this is a positive addition to the pages
without the maintenance scripts, because them we more or less are in
the blind when we try to fix errors. We can't use random pages to try
to prod the pages to find something that is wrong, we must be able to
search for the errors and fix them.

This summer we (nowiki) have added about ten (10) properties to the
infoboxes, some with scripts and some with the property parser
function. Most of my time I have not been coding, and I have not been
fixing errors. I have been trying to explain to the community why
Wikidata is a good idea. At one point the changes were even reverted
because someone disagreed with what we had done. The whole thing
basically revolves around "my article got a Q-id in the infobox and I
don't know how to fix it". We know how to fix it, and I have explained
that to the editors at nowiki several times. They still don't get it,
so we need some way to fix it, and we don't have the maintenance scripts
to do it.

Right now we don't need more wild ideas that will swamp the
development for months and years to come; we need maintenance scripts,
and we need them now!

[1] https://no.wikipedia.org/wiki/Spesial:UnconnectedPages
[2] https://www.wikidata.org/wiki/Special:EntitiesWithoutLabel
[3] https://no.wikipedia.org/wiki/Spesial:Prefiksindeks/Kategori:Artikler_hvor

John Erling Blad
/jeblad

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-13 Thread John Erling Blad
Actually I think we have a problem with all three points in the CAP
theorem, so we should start coding for fault tolerance. Still, this is
going off-topic. I'm done with this thread.

On Mon, Jul 13, 2015 at 6:29 PM, John Erling Blad  wrote:
> A property function for versioned labels would look _exactly_ the same
> as today, but the request for the actual value would use the label and
> a timestamp. The timestamp would be the last revision of the template.
>
> Look up books on distributed databases, and check out how they
> maintain consistency. In our case it is a simplified setup with a
> single master server and multiple clients that have read access to the
> server. In addition they have their own state that interferes with the
> master state.
>
> We have a consistency problem (CAP theorem) because the clients don't
> update their state (the templates) according to changes on the server
> (the labels). One solution to this is to keep the last known version
> (a timestamp) to make it possible to continue using outdated
> information.
>
> Another way to say this is that a change in the label leaves an
> invalid state at the clients, because the transaction ends prematurely.
> That is, the templates are not updated, which they must be if the
> system lacks versioning.
>
> Yet another way to describe this is that the processes running on the
> server and the clients lack isolation, which again can be restored
> with versioning.
>
> There are several consistency models that can be used, but I don't
> know if any of them includes something like the proposed "alias model".
>
> The CAP theorem is described in these two, but I haven't read them, sorry for that!
>
> Brewer, Eric A.: Towards robust distributed systems (abstract), in:
> Proceedings of the nineteenth annual ACM symposium on Principles of
> distributed computing, PODC’00, ACM, New York, NY, USA, pp. 7–
>
> Gilbert, Seth and Lynch, Nancy: Brewer’s conjecture and the
> feasibility of consistent, available, partition-tolerant web services.
> SIGACT News (2002), vol. 33, pp. 51–59
>
> On Mon, Jul 13, 2015 at 4:21 PM, Markus Krötzsch
>  wrote:
>> On 13.07.2015 16:01, John Erling Blad wrote:
>>>
>>> No we should not make the aliases unique, the reason aliases are
>>> useful is because they are _not_ unique.
>>> Add versioning to labels, that is the only real solution.
>>
>>
>> Following this thread for a while, I still have no idea what this solution
>> is. Could you give an example of how the #property function in Wikitext will
>> look for this proposal?
>>
>>>
>>> There are books on the topic, and also some doctoral theses. I don't
>>> think we should create anything ad hoc for this. Go for a proven solution.
>>
>>
>> Citation needed ;-)
>>
>> Markus
>>
>>
>>>
>>> On Mon, Jul 13, 2015 at 3:24 PM, Daniel Kinzler
>>>  wrote:
>>>>
>>>> Am 13.07.2015 um 13:00 schrieb Ricordisamoa:
>>>>>
>>>>> I agree too.
>>>>> Also note that property IDs are language-neutral, unlike english names
>>>>> of
>>>>> templates, magic words, etc.
>>>>
>>>>
>>>> As I said: if there is broad consensus to only use P-numbers to refer to
>>>> properties, fine with me (note however that Lydia disagrees, and it's her
>>>> decision). I like the idea of having the option of accessing properties
>>>> via localized names, but if there is no demand for this possibility, and
>>>> it's a pain to implement, I won't complain about dropping support for that.
>>>>
>>>> But *if* we allow access to properties via localized unique labels (as we
>>>> currently do), then we really *should* allow the same via unique aliases,
>>>> so property labels can be changed without breaking stuff.
>>>>
>>>> --
>>>> Daniel Kinzler
>>>> Senior Software Developer
>>>>
>>>> Wikimedia Deutschland
>>>> Gesellschaft zur Förderung Freien Wissens e.V.
>>>>
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-13 Thread John Erling Blad
Another take on what happens with the labels.

You have a client that defines some reference to data it gets from the
server. The server then changes the valid reference to the data, and
tells the client that the reference has changed. The client then says
"fine" but does not update the reference to the data. To solve this,
(a) the master can provide an alternate lookup mechanism, or (b) the
client must update the references, or (c) the client must provide the
master with its old reference and a timestamp.

The Wikidata team wants to do (a) by making the aliases unique. That
breaks some other uses; basically they won't be aliases anymore, and
people here on the list say "no".

I think nobody dares to do (b) because that implies rewriting the
templates and modules. (The references to the data are in the messy
templates, not in some clean data structure.)

I want (c), but I think I'm perhaps the only one.

It is also possible to avoid the whole problem by using the P-ids, as
they won't change.

On Mon, Jul 13, 2015 at 6:29 PM, John Erling Blad  wrote:
> A property function for versioned labels would look _exactly_ the same
> as today, but the request for the actual value would use the label and
> a timestamp. The timestamp would be the last revision of the template.
>
> Look up books on distributed databases, and check out how they
> maintain consistency. In our case it is a simplified setup with a
> single master server and multiple clients that have read access to the
> server. In addition they have their own state that interferes with the
> master state.
>
> We have a consistency problem (CAP theorem) because the clients don't
> update their state (the templates) according to changes on the server
> (the labels). One solution to this is to keep the last known version
> (a timestamp) to make it possible to continue using outdated
> information.
>
> Another way to say this is that a change in the label leaves an
> invalid state at the clients, because the transaction ends prematurely.
> That is, the templates are not updated, which they must be if the
> system lacks versioning.
>
> Yet another way to describe this is that the processes running on the
> server and the clients lack isolation, which again can be restored
> with versioning.
>
> There are several consistency models that can be used, but I don't
> know if any of them includes something like the proposed "alias model".
>
> The CAP theorem is described in these two, but I haven't read them, sorry for that!
>
> Brewer, Eric A.: Towards robust distributed systems (abstract), in:
> Proceedings of the nineteenth annual ACM symposium on Principles of
> distributed computing, PODC’00, ACM, New York, NY, USA, pp. 7–
>
> Gilbert, Seth and Lynch, Nancy: Brewer’s conjecture and the
> feasibility of consistent, available, partition-tolerant web services.
> SIGACT News (2002), vol. 33, pp. 51–59
>
> On Mon, Jul 13, 2015 at 4:21 PM, Markus Krötzsch
>  wrote:
>> On 13.07.2015 16:01, John Erling Blad wrote:
>>>
>>> No we should not make the aliases unique, the reason aliases are
>>> useful is because they are _not_ unique.
>>> Add versioning to labels, that is the only real solution.
>>
>>
>> Following this thread for a while, I still have no idea what this solution
>> is. Could you give an example of how the #property function in Wikitext will
>> look for this proposal?
>>
>>>
>>> There are books on the topic, and also some doctoral theses. I don't
>>> think we should create anything ad hoc for this. Go for a proven solution.
>>
>>
>> Citation needed ;-)
>>
>> Markus
>>
>>
>>>
>>> On Mon, Jul 13, 2015 at 3:24 PM, Daniel Kinzler
>>>  wrote:
>>>>
>>>> Am 13.07.2015 um 13:00 schrieb Ricordisamoa:
>>>>>
>>>>> I agree too.
>>>>> Also note that property IDs are language-neutral, unlike english names
>>>>> of
>>>>> templates, magic words, etc.
>>>>
>>>>
>>>> As I said: if there is broad consensus to only use P-numbers to refer to
>>>> properties, fine with me (note however that Lydia disagrees, and it's her
>>>> decision). I like the idea of having the option of accessing properties
>>>> via localized names, but if there is no demand for this possibility, and
>>>> it's a pain to implement, I won't complain about dropping support for that.
>>>>
>>>> But *if* we allow access to properties via localized unique labels (as we
>>>> currently do), then we really *should* allow the same via unique aliases,
>>>> so property labels can be changed without breaking stuff.

Re: [Wikidata] property label/alias uniqueness

2015-07-13 Thread John Erling Blad
If you rename a template it still has the same history. If you delete
a template, then I don't see how you would have a problem with content
generated by that template. If someone oversights a revision, then the
template is effectively edited and a new timestamp is given to the
template. Whether two or more revisions have the same timestamp does
not matter; what's important is that the master does not have
conflicting labels at the same timestamp.

This is not about browsing the client on a past date, this is about
"browsing" the labels at a past timestamp - and hopefully that should
boil down to the time resolution on the master(s), possibly with some
time skew between the different database servers.



On Mon, Jul 13, 2015 at 6:51 PM, Daniel Kinzler
 wrote:
> Am 13.07.2015 um 18:34 schrieb John Erling Blad:
>> You have versioning for templates, it is the last timestamp your
>> labels should refer to. You don't have to regenerate a previous
>> template, you just have to figure out which labels were valid at the
>> time the template was last saved. That timestamp is one additional
>> column in your labels table. That is your time warp machine. You don't
>> need a time warp machine for everything, to use your example.
>
> Works fine until somebody renames or deletes a template, or oversights a
> revision, or there are multiple revisions with the same timestamp (yes, that
> is possible), etc. This has been tried, and it works ok-ish for the "normal"
> cases, and completely fails for edge cases, as far as I know:
> https://www.mediawiki.org/wiki/Extension:Memento
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-13 Thread John Erling Blad
You have versioning for templates, it is the last timestamp your
labels should refer to. You don't have to regenerate a previous
template, you just have to figure out which labels were valid at the
time the template was last saved. That timestamp is one additional
column in your labels table. That is your time warp machine. You don't
need a time warp machine for everything, to use your example.

On Mon, Jul 13, 2015 at 4:43 PM, Daniel Kinzler
 wrote:
> Am 13.07.2015 um 16:01 schrieb John Erling Blad:
>> No we should not make the aliases unique, the reason aliases are
>> useful is because they are _not_ unique.
>> Add versioning to labels, that is the only real solution.
>
> We can do this once we have a mechanism in MediaWiki that allows us to do
> this for templates, etc. It's an extremely difficult problem. So far nobody
> has been able to implement it, though it has been on the wishlist for a
> really long time.
>
> A "timewarp" feature for everything would be really cool, but it's FAR
> harder to implement. It would require a rewrite of quite a bit of MediaWiki.
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-13 Thread John Erling Blad
A property function for versioned labels would look _exactly_ the same
as today, but the request for the actual value would use the label and
a timestamp. The timestamp would be the last revision of the template.
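
To make it concrete, here is a minimal sketch in Python of such a
versioned lookup. The data and names are made up; in reality this would
be an extra timestamp column on the labels table, not an in-memory
structure:

import bisect

# label -> list of (valid_from, property id); None = no longer a label
label_history = {
    "population": [(0, "P1082")],
    "inhabitants": [(0, "P1082"), (150, None)],  # renamed to "population" at t=150
}

def resolve(label, as_of):
    """Property id that `label` referred to at time `as_of`."""
    history = label_history.get(label, [])
    idx = bisect.bisect_right([t for t, _ in history], as_of) - 1
    return history[idx][1] if idx >= 0 else None

print(resolve("inhabitants", 100))  # P1082 -- an old template still resolves
print(resolve("inhabitants", 200))  # None  -- the label was renamed meanwhile
print(resolve("population", 200))   # P1082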

Look up books on distributed databases, and check out how they
maintain consistency. In our case it is a simplified setup with a
single master server and multiple clients that have read access to the
server. In addition they have their own state that interferes with the
master state.

We have a consistency problem (CAP theorem) because the clients don't
update their state (the templates) according to changes on the server
(the labels). One solution to this is to keep the last known version
(a timestamp) to make it possible to continue using outdated
information.

Another way to say this is that a change in the label leaves an
invalid state at the clients, because the transaction ends prematurely.
That is, the templates are not updated, which they must be if the
system lacks versioning.

Yet another way to describe this is that the processes running on the
server and the clients lack isolation, which again can be restored
with versioning.

There are several consistency models that can be used, but I don't
know if any of them includes something like the proposed "alias model".

The CAP theorem is described in these two, but I haven't read them, sorry for that!

Brewer, Eric A.: Towards robust distributed systems (abstract), in:
Proceedings of the nineteenth annual ACM symposium on Principles of
distributed computing, PODC’00, ACM, New York, NY, USA, pp. 7–

Gilbert, Seth and Lynch, Nancy: Brewer’s conjecture and the
feasibility of consistent, available, partition-tolerant web services.
SIGACT News (2002), vol. 33, pp. 51–59

On Mon, Jul 13, 2015 at 4:21 PM, Markus Krötzsch
 wrote:
> On 13.07.2015 16:01, John Erling Blad wrote:
>>
>> No we should not make the aliases unique, the reason aliases are
>> useful is because they are _not_ unique.
>> Add versioning to labels, that is the only real solution.
>
>
> Following this thread for a while, I still have no idea what this solution
> is. Could you give an example of how the #property function in Wikitext will
> look for this proposal?
>
>>
>> There are books on the topic, and also some doctoral theses. I don't
>> think we should create anything ad hoc for this. Go for a proven solution.
>
>
> Citation needed ;-)
>
> Markus
>
>
>>
>> On Mon, Jul 13, 2015 at 3:24 PM, Daniel Kinzler
>>  wrote:
>>>
>>> Am 13.07.2015 um 13:00 schrieb Ricordisamoa:
>>>>
>>>> I agree too.
>>>> Also note that property IDs are language-neutral, unlike english names
>>>> of
>>>> templates, magic words, etc.
>>>
>>>
>>> As I said: if there is broad consensus to only use P-numbers to refer to
>>> properties, fine with me (note however that Lydia disagrees, and it's her
>>> decision). I like the idea of having the option of accessing properties
>>> via localized names, but if there is no demand for this possibility, and
>>> it's a pain to implement, I won't complain about dropping support for that.
>>>
>>> But *if* we allow access to properties via localized unique labels (as we
>>> currently do), then we really *should* allow the same via unique aliases,
>>> so property labels can be changed without breaking stuff.
>>>
>>> --
>>> Daniel Kinzler
>>> Senior Software Developer
>>>
>>> Wikimedia Deutschland
>>> Gesellschaft zur Förderung Freien Wissens e.V.
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-13 Thread John Erling Blad
No we should not make the aliases unique, the reason aliases are
useful is because they are _not_ unique.
Add versioning to labels, that is the only real solution.

There are books on the topic, and also some doctoral theses. I don't
think we should create anything ad hoc for this. Go for a proven solution.

On Mon, Jul 13, 2015 at 3:24 PM, Daniel Kinzler
 wrote:
> Am 13.07.2015 um 13:00 schrieb Ricordisamoa:
>> I agree too.
>> Also note that property IDs are language-neutral, unlike english names of
>> templates, magic words, etc.
>
> As I said: if there is broad consensus to only use P-numbers to refer to
> properties, fine with me (note however that Lydia disagrees, and it's her
> decision). I like the idea of having the option of accessing properties via
> localized names, but if there is no demand for this possibility, and it's a
> pain to implement, I won't complain about dropping support for that.
>
> But *if* we allow access to properties via localized unique labels (as we
> currently do), then we really *should* allow the same via unique aliases, so
> property labels can be changed without breaking stuff.
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-13 Thread John Erling Blad
I for one agree with Gerard that this is a problem.

John

søn. 12. jul. 2015, 18.08 skrev Daniel Kinzler :

> Am 12.07.2015 um 15:31 schrieb Gerard Meijssen:
> > Hoi,
> > You do not get it.
>
> Indeed. This is why I am asking questions.
>
> > There are many properties. Consequently the scale of things
> > is substantially different.
>
> There are far, far more templates than properties. And we use unique,
> localized
> names for templates. Why not for properties? And if we don't want this for
> properties, why do the same arguments not apply for template names?
>
> > It has been demonstrated that languages will have
> > homonyms and consequently it is NOT a good idea to use labels or
> whatever you
> > call them for properties. You can use them as long as internally you use
> the
> > P-number.
>
> Internally, we always use the P-number. Unless with "internally" you mean
> "in
> wikitext". This is the point under discussion: whether we want localized
> names
> for use in wikitext.
>
> > You can use a text as long as the combination of label and description
> > is unique. This combination may be useful.
>
> This is how we do it for items. This works quite well with a selector
> widget. It
> does not work inside wikitext - there, you either need a unique name, or
> rely on
> the plain ID.
>
> For items, sitelinks act as a per-language unique name. For properties, we
> decided to require a unique label, since we can't use sitelinks there, and
> the number is low enough (a few thousand, compared to tens of millions of
> items) that ambiguities should be rare.
>
> > At the same time be aware that property labels will be wrong and will
> need to be
> > changed at a later date.
>
> This is why we want to make aliases unique. If we have unique aliases,
> labels
> can change without breaking anything.
>
> > When this presents a problem for the comparison with
> > external sources, it is tough. It is best to indicate this from the
> start.
>
> Why would labels or aliases be used for comparison with external sources?
> Properties can be linked to external vocabularies via statements, just
> like we
> do it for items. Relying on labels for doing this would be asking for
> trouble.
>
> > The argument about what happens in MediaWiki is secondary. And sorry
> that not
> > everyone cares or knows about that in your way. The point is very much
> that at
> > the scale of thousands and thousands of properties it does not scale.
> This point
> > has been made plenty of times by now.
>
> Really? How and where? I only hear you asserting it, but I see no
> evidence. I
> see it scaling perfectly well on Wikidata. Property names already *are*
> unique,
> always have been. I know of no major problems with this. There are some
> issues
> with cultural differences and homonyms (e.g. the distinction between sex
> and
> gender, or the double meaning of "editor" in Portuguese), but these are
> relatively rare, and no worse than naming dicussions on Wikipedia.
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-11 Thread John Erling Blad
Forget aliases, it will only add to the overall mess.
Labels should be versioned.

John

On Sat, Jul 11, 2015 at 10:20 PM, Daniel Kinzler
 wrote:
> Am 11.07.2015 um 09:28 schrieb Gerard Meijssen:
>> Hoi,
>> I blogged about it. My argument is that Wikipedia should not be fenced in by
>> assumptions from Wikidata.
>
> Your blog post seems to assume that the Wikidata label will be displayed on
> Wikipedia. That's not the case. We are discussing the use of localized
> property names (let's just stop calling them labels, it's misleading) for
> properties in the {{#property}} parser function, in order to retrieve the
> value. Only the value. So, is an ID better than a localized unique name?
> Both will only be visible in wikitext, and, in practice, only in wikitext
> in templates.
>
> Your blog post states that labels cannot be changed once they are set. This
> is wrong. They can be changed. Currently, that will however break all
> wikitext (templates) that use that name to refer to the property. This is
> what we are trying to fix by allowing properties to be accessed by an
> alias. The downside is that this requires aliases to be unique.
>
> I'm not sure how the number of languages is relevant at all. The name(s) of
> a property have to be unique per language. How many languages there are
> doesn't matter at all for this, since there cannot be conflicts between
> languages.
>
> In any case, *none* of this is at all visible to readers.
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-09 Thread John Erling Blad
This can also be described (and fixed) as a light-weight versioning
problem. The client's templates and modules (probably also articles
later on) are using an old surface form of the property label that
existed at some previous time (the time when the template or module
was last saved). When the client then tries to fetch the property
value through use of the label, it fails because the property label
has a new version.

The proper solution would then be to allow lookup of property labels
with a time constraint. Use of the time constraint would be hidden
from the user; it would be added by the software. If a property label
is used in a template/module, and the label has been changed at
Wikidata between two revisions, then a warning can be given in the
template/module's history. The new template/module will always use the
last timestamp from the revision; it would be up to the user to fix
those cases. If a template/module is fixed on the client, then it will
work even if the label is changed on the repo.

The only real cost would be an additional column for timestamps in the
database table. A variation would be to make the lookup indirect
through the revision table. Not sure if that would add anything
useful.
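
As a rough illustration of the warning part, here is a sketch in Python
with made-up data and function names (the real thing would presumably
hook into the save process on the client):

# label history on the repo: property id -> list of (timestamp, label)
repo_labels = {
    "P1082": [(0, "inhabitants"), (150, "population")],
}

def label_at(prop, ts):
    """Label of `prop` as it was at time `ts`."""
    current = None
    for t, label in repo_labels.get(prop, []):
        if t <= ts:
            current = label
    return current

def label_change_warnings(used_props, prev_rev_ts, new_rev_ts):
    """Properties whose label changed between two revisions of a template/module."""
    return ["%s: label changed from %r to %r"
            % (p, label_at(p, prev_rev_ts), label_at(p, new_rev_ts))
            for p in used_props
            if label_at(p, prev_rev_ts) != label_at(p, new_rev_ts)]

print(label_change_warnings(["P1082"], prev_rev_ts=100, new_rev_ts=200))
# ["P1082: label changed from 'inhabitants' to 'population'"]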

On Thu, Jul 9, 2015 at 2:22 PM, Thomas Douillard
 wrote:
> Substitution is a standard mechanism in MediaWiki and would achieve what
> Gerard needs ... for example:
> "{{subst:property_prime|date of birth}}" could be expanded and substituted
> to "{{#property:Pxxx}} <!--date of birth-->" and everything would be stored
> in the wikitext. What's wrong with this kind of solution, exactly? You did
> not elaborate on that.
>
> Before I reply to what you wrote Gerard, let me summarize the question we
> actually need to answer to go forward:
>
> * currently, property values can be accessed from wikitext using the
> property's
> (unique) label. Do we want to keep this? (the alternative is access via
> P-ids only)
>
> * if yes, should it be possible to change a property's label at all?
>
> * if yes, should references to the old label break, or should they continue
> to work?
>
> * if they should continue to work, should this be achieved by making the old
> label an alias?
>
> * if no, how should it be achieved, exactly?
>
>
>
> Am 09.07.2015 um 11:54 schrieb Gerard Meijssen:
>> Hoi,
>> The parser would understand it because it stored that information. The
>> property is still the same property; the label it uses is now seen as a
>> local override.
>
> That would be a completely new system, and quite complicated. It would also
> introduce a host of new issues (such local overrides may conflict with new
> names, or other local overrides, for instance. Language fallback makes this
> even
> more fun. Not to mention that we currently don't have a good place to store
> this
> kind of information).
>
> The current proposal is to store those overrides in wikidata, as aliases.
>
>> Daniel, there are many ways to solve this. The problem you face is based
>> on a misconception. Languages are not meant for rigidity.
>
> Indeed. But names can be chosen to be unique. We do that all the time when
> naming pages. And when naming properties. Property names (labels) were
> always
> meant to be unique, this is nothing new. (For a while, there was a bug that
> allowed duplicate labels under *some* circumstances, sorry for that).
>
>> Expecting that you can has
>> already been shown to be problematic. Consequently, insisting that labels
>> always be unique is a problem of your own choosing. A problem that will
>> not go away and is easiest solved now.
>
> If we drop the requirement that properties should be accessible from
> wikitext
> via their name, then yes, that would be easy. If people can live with using
> P-Ids directly, that's fine with me.
>
>> It is abundantly clear that you WILL use the requirement of Wikidata as an
>> excuse when a language has no alternative.
>
> Excuse for what? From a programming perspective, making people use IDs is
> by far the simplest solution. It's easy enough to remove support for
> label-based access to properties, if that support is not needed.
>
> Allowing access from wikitext using non-unique names, THAT is not something
> I
> would want to support. I can't imagine how that would work at all.
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-08 Thread John Erling Blad
Sorry, but aliases are not deferred labels.
We use an alias to disambiguate between properties.

I don't think we have anything that is equivalent to DCterms coverage,...
At least I hope not!

An alias does not imply equivalence, not by a long shot.


On Wed, Jul 8, 2015 at 5:45 PM, Daniel Kinzler
 wrote:
> Am 08.07.2015 um 17:34 schrieb John Erling Blad:
>> You asked for an example, and those are valid examples. It is even an
>> example that uses one of the most used ontologies on the net. Another
>> example from DCterms is coverage, which can be both temporal and
>> spatial. We have a bunch of properties that could have the alias "DCterms
>> coverage", a country for example, or a year.
>
> For cross-linking properties with other vocabularies, we use P1628 "Equivalent
> Property", not aliases. I don't see how an alias would be useful for that. 
> P1628
> allows you to specify URIs, and it is itself marked as equivalent to
> owl:equivalentProperty, so it can be used directly by reasoners.
>
>> Use a separate list of "deferred labels", and put the existing label
>> on that list if someone tries to edit the defined (preferred) label. That
>> list should be unique, as it should not be possible to save a new
>> label that already exists on the list of deferred labels. At some
>> future point in time some clean-up routine can be implemented, but I
>> think it will take a long time before name clashes become a real
>> problem.
>
> That's the idea, yes, we just call the deferred labels "aliases".
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-08 Thread John Erling Blad
You asked for an example, and those are valid examples. It is even an
example that uses one of the most used ontologies on the net. Another
example from DCterms is coverage, which can be both temporal and
spatial. We have a bunch of properties that could have the alias "DCterms
coverage", a country for example, or a year.

Use a separate list of "deferred labels", and put the existing label
on that list if someone tries to edit the defined (preferred) label. That
list should be unique, as it should not be possible to save a new
label that already exists on the list of deferred labels. At some
future point in time some clean-up routine can be implemented, but I
think it will take a long time before name clashes become a real
problem.
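
A tiny sketch in Python of the idea, with made-up names and data (a
"deferred labels" list that still resolves, and that new labels may not
collide with):

deferred = {}                    # old label -> property id
labels = {"P2048": "height"}     # property id -> current (preferred) label

def rename(prop, new_label):
    if new_label in deferred or new_label in labels.values():
        raise ValueError("label already in use or deferred")
    deferred[labels[prop]] = prop    # keep the old label resolvable
    labels[prop] = new_label

def resolve(label):
    prop = next((p for p, l in labels.items() if l == label), None)
    return prop if prop is not None else deferred.get(label)

rename("P2048", "vertical extent")
print(resolve("height"), resolve("vertical extent"))   # P2048 P2048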

At some point I think we should seriously consider using "SKOS Simple
Knowledge Organization System Reference":
http://www.w3.org/TR/skos-reference/

On Wed, Jul 8, 2015 at 4:21 PM, Daniel Kinzler
 wrote:
> Am 08.07.2015 um 14:13 schrieb John Erling Blad:
>> What you want is closer to a redirect than an alias, while an alias is
>> closer to a disambiguation page.
>
> Yes. The semantics of labels on properties is indeed different from labels on
> items, and always has been. Property labels are defined to be unique names.
> Extending the uniqueness to aliases allows them to act as redirects, which
> allows us to "rename" or "move" properties (that is, change their label).
>
> Using (unique) aliases for this seems the simplest solution. Introducing
> another kind of alias would be confusing in the UI as well as in the data
> model, and would require a lot more code, which is nearly exactly the same
> as for aliases.
>
> I don't follow your example with the DC vocabulary. For the height, width and
> length properties, why would one want an alias that is the same for all of 
> them?
> What would that be useful for?
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

