Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Amir E. Aharoni
2009/1/6 Lars Aronsson :
> (Lars' interesting insights about Interwiki conflicts...)
> ...

Have you seen:

* http://meta.wikimedia.org/wiki/A_newer_look_at_the_interlanguage_link
* http://meta.wikimedia.org/wiki/Interwiki_synchronization

The last one is my own creation, which has surprisingly caught on.
Unfortunately, I haven't had much time to maintain it lately, and I'd
be very glad if someone could help with that.

-- 
Amir Elisha Aharoni

heb: http://haharoni.wordpress.com | eng: http://aharoni.wordpress.com
cat: http://aprenent.wordpress.com | rus: http://amire80.livejournal.com

"We're living in pieces,
 I want to live in peace." - T. Moore

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Marco Schuster
On Tue, Jan 6, 2009 at 10:52 AM, Lars Aronsson  wrote:
> In the longer term, we need to redesign the interwiki links into a
> centralized system, that can be maintained.  I think the way to do
> this is to use Wikimedia Commons.  Instead of copying all the
> interwiki links to every language of Wikipedia, it should be
> enough to add {{commons|Category:Writers from Austria}}, and the
> rest should happen automatically.

Commons has enough to do with keeping metafiles up to date, they'd be
crashed by also having to maintain IW links. I'd propose a new wiki,
to which editors have to apply to get write access so that vandalism
in this critical part is prevented.
For the meanwhile, theoretically one could launch a huge query on
toolserver which scans for conflicts. I'm not sure if this would be
possible regarding performance, though.
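
A scan like the one suggested above could be sketched as follows. This is only a minimal illustration, not a toolserver query: it assumes the interwiki links have already been loaded into a dictionary, and the `LINKS` sample, the `find_conflicts` name, and the placeholder Japanese title are all hypothetical. It groups pages into connected clusters and reports every cluster that holds two pages in the same language.

```python
from collections import defaultdict

# Hypothetical sample data: (language, title) -> list of link targets.
# The Japanese title is a placeholder, not a real page name.
LINKS = {
    ("en", "Category:Austrian writers"): [("es", "Categoría:Escritores de Austria")],
    ("es", "Categoría:Escritores de Austria"): [("ja", "<Austrian literature category>")],
    ("ja", "<Austrian literature category>"): [("en", "Category:Austrian literature")],
    ("en", "Category:Austrian literature"): [],
    ("sv", "Kategori:Kunskap"): [("en", "Category:Knowledge")],
    ("en", "Category:Knowledge"): [("sv", "Kategori:Kunskap")],
}


def find_conflicts(links):
    """Return one dict per conflicting cluster: language -> sorted titles."""
    # Treat every link as undirected so a cluster is a connected component.
    neighbours = defaultdict(set)
    for page, targets in links.items():
        neighbours[page]  # make sure pages without outgoing links are kept
        for target in targets:
            neighbours[page].add(target)
            neighbours[target].add(page)

    seen = set()
    conflicts = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Iterative depth-first search collects the whole cluster.
        cluster, stack = [], [start]
        seen.add(start)
        while stack:
            page = stack.pop()
            cluster.append(page)
            for nxt in neighbours[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # A conflict is a language that appears more than once in a cluster.
        by_lang = defaultdict(list)
        for lang, title in cluster:
            by_lang[lang].append(title)
        dupes = {lang: sorted(t) for lang, t in by_lang.items() if len(t) > 1}
        if dupes:
            conflicts.append(dupes)
    return conflicts


if __name__ == "__main__":
    for conflict in find_conflicts(LINKS):
        print(conflict)
```

On real data the expensive part would be loading the links, which is the performance concern raised above; the clustering itself is linear in the number of links.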

Marco



Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Amir E. Aharoni
2009/1/6 Marco Schuster :
> On Tue, Jan 6, 2009 at 10:52 AM, Lars Aronsson  wrote:
>> In the longer term, we need to redesign the interwiki links into a
>> centralized system, that can be maintained.  I think the way to do
>> this is to use Wikimedia Commons.  Instead of copying all the
>> interwiki links to every language of Wikipedia, it should be
>> enough to add {{commons|Category:Writers from Austria}}, and the
>> rest should happen automatically.
>
> Commons has enough to do with keeping metafiles up to date, they'd be
> crashed by also having to maintain IW links. I'd propose a new wiki,
> to which editors have to apply to get write access so that vandalism
> in this critical part is prevented.

The technology already exists:

* http://meta.wikimedia.org/wiki/A_newer_look_at_the_interlanguage_link

But it is not enabled yet:

* https://bugzilla.wikimedia.org/show_bug.cgi?id=15607

-- 
Amir Elisha Aharoni



Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Andre Engels
On Tue, Jan 6, 2009 at 10:52 AM, Lars Aronsson  wrote:

> So, my question:
>
> Has anybody mapped exactly how many such interwiki conflicts we
> have?  Or how many interwiki sets do we have without conflicts?
> Could/should someone make a list of current conflicts and try to
> rank them by importance, so we can get started in fixing them?

As you already noted, pywikipediabot when run autonomously will add a
remark on each such conflict, so that would be an easy way to harvest
a large number of them. There are many of them: although many people
work on interwiki, they usually either just add links or run autonomous
bots; correcting incorrect links happens much less often.

Resolving them is in some cases easy, but in many cases not. Different
Wikipedias not rarely have different ways of 'subdividing' the
'universe' of possible meanings. This means that the dual assumptions
that 'interwiki is an equivalence relation' and 'any page can
interwiki to only one page in a single language' that the framework is
based on, are often not met, or only in artificial ways.

Examples of problems are:
* Closely connected subjects (for example, a biological order and the
only family in it, a municipality and its main town by the same name,
a fruit tree and its fruit, a computer game and the series of which it
is the first game, two scientific terms which are each other's
opposite) have two pages on some Wikipedias, one page on other, and
that one page is sometimes more one subject, sometimes more the other,
and sometimes really about both
* Words that mean a general term in one language being used for a more
specific one in another language, for example [[en:Autobahn]] being
about highways in Germany, [[de:Autobahn]] about highways in general,
or the name of a Japanese traditional dagger being used to mean that
specific type of dagger in western language, but more generally
'dagger' in Japanese, or countries using their own mythical small
creature as the best translation of 'dwarf', but being about dwarves
in a specific mythology elsewhere
* Slight shifts of meaning from one language to the other causing a
sequence of 'closest connections' leading to another word in the same
language
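
The last kind of problem can be made concrete with a small sketch. It assumes a hypothetical `CLOSEST` table that maps each page to its single closest match in another language; the `en:Highway` entry is invented for illustration, and only the two Autobahn pages come from the example above. Following the chain shows how a walk can end on a different page in the starting language, which is exactly where the equivalence-relation assumption breaks.

```python
# Hypothetical "closest match" table: (language, title) -> (language, title).
CLOSEST = {
    ("en", "Autobahn"): ("de", "Autobahn"),  # highways in Germany -> general term
    ("de", "Autobahn"): ("en", "Highway"),   # assumed target, for illustration only
}


def follow_closest(closest, start, max_hops=10):
    """Follow closest-match links until the start language is reached again."""
    path = [start]
    while len(path) <= max_hops:
        nxt = closest.get(path[-1])
        if nxt is None or nxt in path:
            break  # dead end or a loop back to a page already visited
        path.append(nxt)
        if nxt[0] == start[0]:
            break  # back in the starting language
    return path


def has_drifted(path):
    """True if the walk ended in the start language but on another page."""
    return len(path) > 1 and path[-1][0] == path[0][0] and path[-1] != path[0]


if __name__ == "__main__":
    walk = follow_closest(CLOSEST, ("en", "Autobahn"))
    print(walk, has_drifted(walk))
```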

-- 
André Engels, andreeng...@gmail.com

Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Andre Engels
I might have sounded too negative by including all those problems. I
think it would be good to do a search for such conflicts, and I know
that several of them CAN be easily corrected. But one should not close
one's eyes to the fact that there are clear problems.


-- 
André Engels, andreeng...@gmail.com

Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Daniel Kinzler
Once again, I'd like to point the interested reader to my own take on the issue
of interlanguage links:
.

I still believe that that would be better than a central place for managing
interwikis. In a nutshell: edit locally, like now, but compare globally, and
show also *incoming* interwiki links.
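
As a rough sketch of the "compare globally, show incoming links" idea: the snippet below assumes the outgoing links of all wikis are available in one dictionary (the `OUTGOING` sample and both function names are hypothetical, and this is not the actual proposal's implementation). It inverts the link table and lists pages that link to a given page without being linked back.

```python
from collections import defaultdict

# Hypothetical sample: (language, title) -> outgoing interlanguage links.
OUTGOING = {
    ("en", "Dwarf"): [("de", "Zwerg")],
    ("de", "Zwerg"): [("en", "Dwarf")],
    ("nl", "Dwerg"): [("en", "Dwarf")],
}


def incoming_links(outgoing):
    """Invert the link table: target page -> set of pages linking to it."""
    incoming = defaultdict(set)
    for source, targets in outgoing.items():
        for target in targets:
            incoming[target].add(source)
    return incoming


def missing_backlinks(outgoing, page):
    """Pages that link to `page` although `page` does not link back."""
    return incoming_links(outgoing)[page] - set(outgoing.get(page, ()))


if __name__ == "__main__":
    print(missing_backlinks(OUTGOING, ("en", "Dwarf")))
```

Editing stays local in this model; the global comparison only produces hints that an editor on each wiki can act on.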

-- daniel



Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Eugene Zelenko
Hi!

See http://ru.wikipedia.org/wiki/User:VolkovBot/conflicts for list of
conflicts. VolkovBot is pretty active, so list should be more or less
comprehensive.

Eugene.

On Tue, Jan 6, 2009 at 1:52 AM, Lars Aronsson  wrote:
>
> I just recently started to play with interwiki.py (Pywikipedia bot
> framework) for propagating interwiki links.  My interest comes
> from organizing the category tree, so I'm focusing on interwiki
> links between categories.  Interwiki bots normally run in
> autonomous mode, but this means they give up on complicated cases.
>
> If I run this script under manual supervision, without the
> "-autonomous" option, it stops and asks me how to resolve each
> conflict. This happens ever so often.  I have now (manually)
> sorted out the interwiki links between all languages of
> Category:Knowledge, which was intertwined with Category:Science,
> and Category:Austrian writers which was mixed up with
> Category:Austrian literature.  Such mistakes easily happen, of
> course.  Who can spot errors in all these languages?
>
> Many languages had interwiki links from their category for
> Austrian writers to the Japanese category for Austrian literature.
> I'm not sure exactly when or where this error originated.  But on
> June 19, 2007, the English and Spanish Wikipedia's interwiki link
> to Japanese changed from Austrian novelists to Austrian
> literature, i.e. from one error to another. Ten days later, this
> link was copied to the Dutch Wikipedia. The error was corrected on
> en.wikipedia on October 1, 2007, but remained on other languages.
> Yes, that's 15 months ago.
>
> The circular interwiki link structure from en:Category:Austrian
> writers to es:Categoría:Escritores de Austria to ja:... and back
> to en:Category:Austrian literature is such a conflict that makes
> interwiki.py give up when it runs in autonomous mode.
>
> Thus, corrections (as on October 1) do not propagate.  Instead a
> report about the conflict is given in a logfile, but apparently
> nobody had fixed this problem in the last 15 months.  This
> conflict also blocked new interwiki links from propagating.
>
> After I cleared up the mess, 21 new interwiki links were added to
> the category on the Russian Wikipedia (one where I have a bot
> flag).  That means 21 languages of Wikipedia had created
> categories (or announced them to the interwiki system) for
> Austrian writers in the last 15 months, and they all added their
> interwiki link to the English Wikipedia.  But these additions did
> not propagate because of the conflict.
>
> So, my question:
>
> Has anybody mapped exactly how many such interwiki conflicts we
> have?  Or how many interwiki sets do we have without conflicts?
> Could/should someone make a list of current conflicts and try to
> rank them by importance, so we can get started in fixing them?
>
> In the longer term, we need to redesign the interwiki links into a
> centralized system, that can be maintained.  I think the way to do
> this is to use Wikimedia Commons.  Instead of copying all the
> interwiki links to every language of Wikipedia, it should be
> enough to add {{commons|Category:Writers from Austria}}, and the
> rest should happen automatically.
>
>
>
> --
>  Lars Aronsson (l...@aronsson.se)
>  Aronsson Datateknik - http://aronsson.se
>


Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread David Gerard
2009/1/6 Lars Aronsson :

> Has anybody mapped exactly how many such interwiki conflicts we
> have?  Or how many interwiki sets do we have without conflicts?
> Could/should someone make a list of current conflicts and try to
> rank them by importance, so we can get started in fixing them?


Someone actually did this; it was discussed a few months ago on
wikien-l. (I'm writing this in my lunch hour, so haven't time to track
down the thread in the archive right now, sorry.)

But basically: treating interwiki links as a 1-1 relationship even
from one wiki to another is horribly unreliable, and assuming you can
go from wiki A to wiki B to wiki C with interwiki links is just not
doable reliably with robots.

It's not quite as horrible as trying to make ontological sense of the
category tree (where the only relationship that can be presumed is
"has something to do with" - one of the reasons that making cats work
like tags with a good complex Boolean query frontend would be so
useful), but it's in the same realms of hair-tearing horror for the
same reasons, i.e. people are a problem.


- d.



Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Daniel Kinzler
David Gerard schrieb:
> But basically: treating interwiki links as a 1-1 relationship even
> from one wiki to another is horribly unreliable, and assuming you can
> go from wiki A to wiki B to wiki C with interwiki links is just not
> doable reliably with robots.

If you only look at language-links that go *both* ways, you get a decent 1-to-1
mapping. I used this as part of my thesis, and wrote a short paper about it:
.
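
A minimal sketch of that filter, assuming the language links are held in a dictionary: the `LINKS` sample and the `reciprocal_pairs` name are hypothetical, and this is not the code used in the thesis. Only pairs where each page links to the other are kept.

```python
# Hypothetical sample: (language, title) -> outgoing language links.
LINKS = {
    ("en", "Autobahn"): [("de", "Autobahn")],  # one-way link, will be dropped
    ("de", "Autobahn"): [("en", "Highway")],
    ("en", "Highway"): [("de", "Autobahn")],
}


def reciprocal_pairs(links):
    """Return the set of page pairs that link to each other in both directions."""
    pairs = set()
    for source, targets in links.items():
        for target in targets:
            if source in links.get(target, ()):
                # Sort so that (a, b) and (b, a) collapse into one entry.
                pairs.add(tuple(sorted((source, target))))
    return pairs


if __name__ == "__main__":
    for pair in sorted(reciprocal_pairs(LINKS)):
        print(pair)
```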

I can also recommend the studies of Rainer Hammwöhner about Wikipedia,
especially "Interlingual Aspects of Wikipedia’s Quality"
,
which studies the quality of language links and the category system, among
other things.

-- daniel


Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Lars Aronsson
Marco Schuster wrote:

> Commons has enough to do with keeping metafiles up to date, 
> they'd be crashed by also having to maintain IW links. I'd 
> propose a new wiki, to which editors have to apply to get write 
> access so that vandalism in this critical part is prevented.

I think the system needs to work like the categories.  Very few 
people need to edit the category page, so we don't really have to 
worry about who can access that central storage.  If I write an 
article about a president, I copy a category link from another 
president biography.  I don't have to update the category page and 
I don't have to update other articles in the same category.

From a biography of the same president in another language, I can 
copy a list of interwiki links.  But instead I should just copy a 
single, global interwiki pointer.  As far as I understand, this is 
how the interlanguage extension should work.

What stops us from trying that out?  Could it be introduced in 
small steps, or is it a big scary change?


-- 
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se



Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Lars Aronsson
Andre Engels wrote:

> Examples of problems are:
> * Closely connected subjects (for example, a biological order 
>   and the only family in it, a municipality and its main town by 
>   the same name, a fruit tree and its fruit, a computer game and 

Such problems certainly exist. But they are not our worst problem 
at the moment.  Today I sorted out "Calvin Klein" (the company) 
and "Calvin Klein (fashion designer)" (the person).  They now form 
two separate interwiki clusters, without conflicts.  But sorting 
this out was more hard work than it needs to be.  With some 
improved tools, we can make this work a little easier.

The Category:Politicians in many languages has an interwiki link 
to the Armenian (hy:) category for political scientists.  I fixed 
the English Wikipedia (manually) and the North European languages 
(by bot), but some 50 languages remain to be edited.

If interwiki.py supported SUL and if I had a truly global bot 
flag, I could do it. But I'm reluctant to edit 50 languages 
manually, especially since there are hundreds of such conflicts.  

One problem here is that interwiki.py only adds links.  Both 
correct ones and errors are quickly propagated.  But corrections 
are not propagated, because the conflicts make it give up.  An 
easy way to remove that hy: interwiki link would be a great help.


-- 
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se



Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Andre Engels
On Wed, Jan 7, 2009 at 6:08 AM, Lars Aronsson  wrote:

> The Category:Politicians in many languages has an interwiki link
> to the Armenian (hy:) category for political scientists.  I fixed
> the English Wikipedia (manually) and the North European languages
> (by bot), but some 50 languages remain to be edited.
>
> If interwiki.py supported SUL and if I had a truly global bot
> flag, I could do it. But I'm reluctant to edit 50 languages
> manually, especially since there are hundreds of such conflicts.

You can do it by bot as things are. I myself use Robbot on all
languages; the only thing that could be improved regarding SUL is that
I have to type in its password once for each language rather than one
time for all, and as regards bot flags - it seems it has one on every
language where it needs it.

> One problem here is that interwiki.py only adds links.  Both
> correct ones and errors are quickly propagated.  But corrections
> are not propagated, because the conflicts make it give up.  An
> easy way to remove that hy: interwiki link would be a great help.

Well, as said, I use Robbot on all languages, the code I use for that is:

from family import Family
for lang in Family().alphabetic:
    usernames['wikipedia'][lang] = 'Robbot'

This gives me 2 warnings every time I start the bot, but I just ignore
them. With such a setting, whenever I get to a conflict of which I
know the resolution, I start a separate interwiki.py with the
necessary -ignore or -neverlink and -force, and the bot will remove at
least that problem everywhere it exists.


-- 
André Engels, andreeng...@gmail.com

Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Andre Engels
On Wed, Jan 7, 2009 at 6:20 AM, Lars Aronsson  wrote:

> From a biography of the same president in another language, I can
> copy a list of interwiki links.  But instead I should just copy a
> single, global interwiki pointer.  As far as I understand, this is
> how the interlanguage extension should work.
>
> What stops us from trying that out?  Could it be introduced in
> small steps, or is it a big scary change?

I think it would be a big change. At the moment we have a single
database per wiki, and no actual connection between the various
databases. As far as I know the only exception to that is the images
from Commons, but your idea goes further than that, because I cannot
change a picture on Commons by editing a wiki page elsewhere. This
would both be a large conceptual change and a technical issue (suppose
the 'interwiki database' is down for writing, what do we do when
someone tries to edit a page?)

Apart from that there is the issue of naming of pages on this central
depository. It seems you'd have to have an interwiki consensus about
that... And then there's initial population - what do we do with the
currently existing problems? I guess that's a point that could be done
in small steps though (allow the 'old' and the 'new' system to exist
in parallel for some time). What do you do with new problems? That is,
what if the same subject is linked from 2 pages in one language? And
what if A is of the opinion that a group of pages should all be the
same 'interwiki group' and B that they should be two? Will we be
getting cross-wiki edit wars?

-- 
André Engels, andreeng...@gmail.com

Re: [Wikitech-l] Interwiki conflicts

2009-01-06 Thread Andre Engels
There's one problem with these interwiki links that has not yet been
mentioned in this thread: Not rarely when I have finally sorted out
two subjects, and kept only those interwiki that are to the same
subject, someone comes around and tells me that I should not be
removing correct interwiki links.


-- 
André Engels, andreeng...@gmail.com

Re: [Wikitech-l] Interwiki conflicts

2009-01-07 Thread Simon Walker
2009/1/7 Andre Engels :
> As far as I know the only exception to that is the images
> from Commons,

You missed CentralAuth. :)

> (suppose the 'interwiki database' is down for writing, what do we
> do when someone tries to edit a page?)

Central editing of those would solve that.

> Will we be getting cross-wiki edit wars?

Probably, and the people may not even be able to talk about it, due
to the language differences. It'll be a pain to sort out.

-- 
Regards,

Simon Walker
User:Stwalkerster on all public Wikimedia Foundation wikis
Administrator on the English Wikipedia
Developer of Helpmebot and the ACC tool

Your donations keep Wikipedia running! Support the Wikimedia
Foundation today: http://www.wikimediafoundation.org/wiki/Donate



Re: [Wikitech-l] Interwiki conflicts

2009-01-08 Thread Platonides
Amir E. Aharoni wrote:
> * http://meta.wikimedia.org/wiki/Interwiki_synchronization
>
> The last one is my own creation, which has surprisingly caught on.
> Unfortunately, I haven't had much time to maintain it lately, and I'd
> be very glad if someone could help with that.

I think it's popular because it is easy to collaborate and do something
with the problem. As opposed to discussing 'interwikis are broken', on
which all of us agree.

However, I'd still improve the interface. Why edit a page to change the
group, instead of choosing the meaning with one click?
I'd move it to the toolserver, with an interface to view the interwiki
groups, split and define them, move interwikis between groups...
All of that would then be backed by some bot.

Moving to the Wikipedia scenario, the interwikis could be shown in a
different state, "conflicted". Thus, during normal wiki interaction,
Wikipedians would notice the conflict *on their wiki*, be led to the
interwiki management tool (via a link in p-lang) and help to fix it (I say
help instead of fix because this work has to be collaborative).

The briefs for the article groups also could and should be reused for
something else (WiktionaryZ, simplewiki, Yahoo abstracts...).

