Re: [Wikitech-l] Language variants

2009-09-15 Thread Jared Williams


Doesn't having geographically located page caches reduce the doubling effect
in any given location?

Squids located in the US should be caching more en-US than en-GB, and those
in Europe should have more en-GB than en-US.
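To make the point concrete, here is a minimal sketch that assumes each cache site stores one object per (title, variant) pair. That keying is an illustrative assumption, not a description of the actual Squid configuration, and the request logs are made up. Because a cache only holds what it has been asked for, a regional request mix skewed toward one variant fills that cache mostly with that variant.

```python
from collections import Counter

def cache_entries(requests):
    """Count distinct cached objects per variant at one cache site.

    requests: iterable of (title, variant) pairs seen by that site.
    Assumes (hypothetically) that the variant is part of the cache key.
    """
    keys = set(requests)  # one cached object per distinct (title, variant)
    return Counter(variant for _title, variant in keys)

# Made-up request logs for two regional cache sites.
us_log = [("Color", "en-us"), ("Color", "en-us"), ("Cricket", "en-us"), ("Color", "en-gb")]
eu_log = [("Color", "en-gb"), ("Cricket", "en-gb"), ("Cricket", "en-gb")]

print(cache_entries(us_log))  # Counter({'en-us': 2, 'en-gb': 1})
print(cache_entries(eu_log))  # Counter({'en-gb': 2})
```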

Jared

 -----Original Message-----
 From: wikitech-l-boun...@lists.wikimedia.org 
 [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of 
 Gerard Meijssen
 Sent: 12 September 2009 09:48
 To: Wikimedia developers
 Subject: Re: [Wikitech-l] Language variants
 
 Hoi,
 If we are to do this for English, with "digitise" and "digitize", we
 have to keep in mind that this ONLY deals with differences between GB
 and US English. There are other varieties of English that may make
 this more complicated.
 
 Given the size of the GB and US populations, it would split the cache
 and effectively double the cache size. There are more languages where
 this would provide serious benefits. I can easily imagine that the
 German, Spanish and Portuguese communities would be interested. Then
 there are many other languages that may have an interest. The first
 order of business is not whether it can be done, but who will
 implement and maintain the language part of this.
 Thanks,
  GerardM
 


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Language variants

2009-09-12 Thread Ilmari Karonen
Happy-melon wrote:
 Ilmari Karonen wrote:
 
 -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}};
 ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!--
 ...and so on for about 70 more languages -->}-
 
 The above raises the question, of course: would this switch actually work?
 And if it does, how does it affect the cache and linktables?  More
 investigation needed, methinks.

Indeed, that was what I was wondering about too.  Without actually 
trying it out, my guess would be that it would indeed work, but at a 
cost: it'd first parse all the 75 or so subtemplates and then throw all 
but one of them away.

Of course, that's what one would have to do anyway, to get full link 
table consistency.

It does seem to me that it might not be *that* inefficient, *if* the 
page were somehow cached in its pre-languageconverted state but after 
the expensive template parsing has been done.  Does such a cache 
actually exist, or, if not, could one be added with reasonable ease?
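A minimal sketch of the two-stage idea, assuming the expensive template expansion can be cached once per title and the cheap variant conversion applied afterwards, per request. `expand`, `convert`, `render` and the tables are hypothetical names for illustration only, not MediaWiki's actual parser cache.

```python
from functools import lru_cache

expand_calls = 0  # how many times the expensive stage actually ran

@lru_cache(maxsize=None)
def expand(title: str) -> str:
    """Hypothetical expensive stage: parse the wikitext and expand every
    template. Its result is cached once per title, *before* conversion."""
    global expand_calls
    expand_calls += 1
    return f"The {title} team wore colour-coded shirts."

# Hypothetical per-variant conversion tables (source -> target).
TABLES = {
    "en-gb": {"color": "colour"},
    "en-us": {"colour": "color"},
}

def convert(html: str, variant: str) -> str:
    """Cheap stage: apply the variant's table to the cached, expanded output.
    (Plain str.replace for brevity; this is not longest-match conversion.)"""
    for src, dst in TABLES[variant].items():
        html = html.replace(src, dst)
    return html

def render(title: str, variant: str) -> str:
    return convert(expand(title), variant)

print(render("cricket", "en-us"))  # The cricket team wore color-coded shirts.
print(render("cricket", "en-gb"))  # The cricket team wore colour-coded shirts.
print(expand_calls)                # 1
```

Both variants are served from a single expansion; only the conversion step is repeated.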

-- 
Ilmari Karonen



Re: [Wikitech-l] Language variants

2009-09-12 Thread Gerard Meijssen
Hoi,
If we are to do this for English, with "digitise" and "digitize", we have
to keep in mind that this ONLY deals with differences between GB and US
English. There are other varieties of English that may make this more
complicated.

Given the size of the GB and US populations, it would split the cache and
effectively double the cache size. There are more languages where this would
provide serious benefits. I can easily imagine that the German, Spanish and
Portuguese communities would be interested. Then there are many other
languages that may have an interest. The first order of business is not
whether it can be done, but who will implement and maintain the language
part of this.
Thanks,
 GerardM

2009/9/12 Ilmari Karonen nos...@vyznev.net

 Happy-melon wrote:
  Ilmari Karonen wrote:
 
  -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}};
  ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!--
  ...and so on for about 70 more languages -->}-
 
  The above raises the question, of course: would this switch actually work?
  And if it does, how does it affect the cache and linktables?  More
  investigation needed, methinks.

 Indeed, that was what I was wondering about too.  Without actually
 trying it out, my guess would be that it would indeed work, but at a
 cost: it'd first parse all the 75 or so subtemplates and then throw all
 but one of them away.

 Of course, that's what one would have to do anyway, to get full link
 table consistency.

 It does seem to me that it might not be *that* inefficient, *if* the
 page were somehow cached in its pre-languageconverted state but after
 the expensive template parsing has been done.  Does such a cache
 actually exist, or, if not, could one be added with reasonable ease?

 --
 Ilmari Karonen



Re: [Wikitech-l] Language variants

2009-09-12 Thread Domas Mituzas
 Given the size of the GB and US populations it would split the cache
 and effectively double the cache size.

Did I just see you putting performance ahead of language support? Just  
checkin'

Domas



Re: [Wikitech-l] Language variants

2009-09-11 Thread Platonides
Ilmari Karonen wrote:
 A popular game played with a bat and ball is -{en-gb:Cricket;
 en-us:Baseball}-.
 
 That reminds me... some time ago, someone proposed to enable 
 LanguageConverter on Commons (but without any automatic conversion, 
 presumably) and to (ab?)use it to replace the existing autotranslation 
 hacks based on {{int:lang}}.  Would that be in any sense feasible?
 
 There would presumably be two major use cases: the easy one, which I do 
 believe the converter should handle just fine, would be to replace the 
 current http://commons.wikipedia.org/wiki/Template:LangSwitch, 
 generally used to autotranslate short phrases, with syntax like:
 
 -{de:Eigene Arbeit; en:Own work; fi:Oma teos; fr:Travail personnel; etc.}-
 
 (See http://commons.wikipedia.org/wiki/Template:Own for the source of 
 the example.)

I don't think it's really a saner syntax.


 The not-so-simple case would be replacing 
 http://commons.wikipedia.org/wiki/Template:Autotranslate, which is 
 used to translate entire templates, usually (though by no means 
 necessarily) combined with a long list of links to the various 
 translations so that users can easily browse them if the automatically 
 chosen version is no good or something.  A naive implementation of that 
 would look something like:
 
 -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}};
 ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!--
 ...and so on for about 70 more languages -->}-
 
 (Source: http://commons.wikipedia.org/wiki/Template:GFDL.)
 
 I'd like to hope that there might be some better way of doing it, 
 though, even if I can't offhand think of what it might look like.
 
 Still, would something like that work, even in theory, and would it be 
 an improvement over the way these things are currently done (which is 
 hacky enough itself)?

I don't think so. It's terribly ugly. You would want something like
{{GFDL/{{ENABLEDVARIANT}}}} (no, such a magic word doesn't seem to exist yet).
But you would still have the problem of having people *choose* them. You
wouldn't put dozens of tabs to choose the variant, which in fact isn't a
variant.

These are languages; the variant system is not appropriate for them.
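For illustration only, the subpage selection that a magic word like the hypothetical {{ENABLEDVARIANT}} would enable might look like the sketch below. It assumes we know the set of existing page titles and that English is the fallback; `pick_subpage` is a made-up helper, not a MediaWiki function. The point is that only one translation is expanded, instead of all 75 or so.

```python
def pick_subpage(base: str, variant: str, existing: set, fallback: str = "en") -> str:
    """Return the page a hypothetical {{BASE/{{ENABLEDVARIANT}}}} would transclude.

    Tries BASE/<variant>, then BASE/<fallback>, then BASE itself.
    """
    for candidate in (f"{base}/{variant}", f"{base}/{fallback}"):
        if candidate in existing:
            return candidate
    return base

# Hypothetical set of existing pages.
pages = {"Template:GFDL", "Template:GFDL/en", "Template:GFDL/fi", "Template:GFDL/de"}
print(pick_subpage("Template:GFDL", "fi", pages))  # Template:GFDL/fi
print(pick_subpage("Template:GFDL", "af", pages))  # Template:GFDL/en
```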




Re: [Wikitech-l] Language variants

2009-09-10 Thread Platonides
Helder Geovane Gomes de Lima wrote:
 But I wasn't able to create a "default" parameter that would set
 which of the variants is shown by default to anonymous users. It
 would be good if we could use {{Language variations| default = pt-br |
 pt = word 1| pt-br = word 2}} to get:
 (a) word 2, for anonymous users;
 (b) word 1, for logged-in users who chose 'pt' in their preferences;
 (c) word 2, for logged-in users who chose 'pt-br' in their preferences.
 Option (a) would be necessary if we don't want to change an
 existing text from 'pt-br' to 'pt' (for anonymous users) just because
 we want logged-in users to be able to choose the content variant.

There's no difference. Anonymous users get the default language.
What you could do is have three languages: pt (generic Portuguese,
default), pt-pt and pt-br.

 Is there any way to detect if the reader is logged in with something
 in the style {{#if: what? | foo| bar}}?
No.




Re: [Wikitech-l] Language variants

2009-09-10 Thread Aryeh Gregor
On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling tstarl...@wikimedia.org wrote:
 I don't know why you're writing this nonsense, you obviously haven't
 looked at the code at all.

This paragraph is unnecessary.

 The language variant system that we have could easily convert between
 US and UK English. In fact it already does convert between a language
 pair with a far more complex relationship, that is Simplified and
 Traditional Chinese.

 The language conversion system is very simple, it's just a table of
 translated pairs, where the longest match takes precedence. The
 translation table in one direction (e.g. UK -> US) can be different to
 the table in the other direction (US -> UK). You would not list -ize
 -> -ise, you would list every word in the dictionary with an -ize
 ending that can be translated to -ise without controversy. The current
 software could handle 50k pairs or so without serious performance
 problems, and it could be extended and optimised to allow millions of
 pairs if there was a need for that.

 It's possible to handle any pair of languages which are separated only
 by vocabulary, and transliteration or spelling. It's only differences
 in grammar, such as word order, that would give it trouble.

Is there any reason nobody's tried adding such support for us/uk
English?  It would resolve some long-standing tension on enwiki.
Would anons have to be given one variant or the other, or would they
get untransformed text or what?  Does the variant transformation apply
to the edit page as well?
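The conversion model described in the quoted text (a table of pairs, longest match wins, a separate table for each direction) can be sketched roughly as follows. This is a naive illustration of those stated rules, not a port of languages/LanguageConverter.php, and the word lists are examples only.

```python
def build_converter(table: dict):
    """Return a function applying `table` with longest-match-first semantics.

    At each position the longest source string found in the table is
    replaced; if nothing matches, one character is copied unchanged.
    """
    max_len = max(map(len, table), default=0)

    def convert(text: str) -> str:
        out, i = [], 0
        while i < len(text):
            for length in range(min(max_len, len(text) - i), 0, -1):
                chunk = text[i:i + length]
                if chunk in table:
                    out.append(table[chunk])
                    i += length
                    break
            else:  # no key matched at this position
                out.append(text[i])
                i += 1
        return "".join(out)

    return convert

# The two directions need not mirror each other.
uk_to_us = build_converter({"colour": "color", "realise": "realize", "organise": "organize"})
us_to_uk = build_converter({"color": "colour", "realize": "realise"})

print(uk_to_us("We realise the colour is off, so organise a fix."))
# We realize the color is off, so organize a fix.
print(us_to_uk("Organize the color scheme."))
# Organize the colour scheme.
```

This scans position by position and tries every length, which is fine for a demonstration; a table of tens of thousands of pairs would call for a proper multi-pattern string search.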



Re: [Wikitech-l] Language variants

2009-09-10 Thread Trevor Parscal
On 9/10/09 10:06 AM, Aryeh Gregor wrote:
 On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling tstarl...@wikimedia.org wrote:

 I don't know why you're writing this nonsense, you obviously haven't
 looked at the code at all.
  
 This paragraph is unnecessary.

Seriously! Please read things aloud before clicking send. You will 
hopefully then be able to better detect when it's time to take a break, 
eat some fruit and take it down a notch.
 The language variant system that we have could easily convert between
 US and UK English. In fact it already does convert between a language
 pair with a far more complex relationship, that is Simplified and
 Traditional Chinese.

 The language conversion system is very simple, it's just a table of
 translated pairs, where the longest match takes precedence. The
 translation table in one direction (e.g. UK -> US) can be different to
 the table in the other direction (US -> UK). You would not list -ize
 -> -ise, you would list every word in the dictionary with an -ize
 ending that can be translated to -ise without controversy. The current
 software could handle 50k pairs or so without serious performance
 problems, and it could be extended and optimised to allow millions of
 pairs if there was a need for that.

 It's possible to handle any pair of languages which are separated only
 by vocabulary, and transliteration or spelling. It's only differences
 in grammar, such as word order, that would give it trouble.
  
 Is there any reason nobody's tried adding such support for us/uk
 English?  It would resolve some long-standing tension on enwiki.
 Would anons have to be given one variant or the other, or would they
 get untransformed text or what?  Does the variant transformation apply
 to the edit page as well?


The variant system seems poorly understood by most people (including me),
which tends to leave it under-utilized...

Perhaps we need more information on what it intends to provide the user. 
All I find in Google on this topic are blurbs about configuration 
variables and lots of people confused as to what language variants even 
are...

Is there some awesome documentation somewhere I have yet to find?

- Trevor



Re: [Wikitech-l] Language variants

2009-09-10 Thread Ariel T. Glenn
The differences between the UK and American varieties of English are not
limited just to spelling and vocabulary.

Ariel

On Thu, 10-09-2009 at 13:06 -0400, Aryeh Gregor wrote:
 On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling tstarl...@wikimedia.org wrote:
  I don't know why you're writing this nonsense, you obviously haven't
  looked at the code at all.
 
 This paragraph is unnecessary.
 
  The language variant system that we have could easily convert between
  US and UK English. In fact it already does convert between a language
  pair with a far more complex relationship, that is Simplified and
  Traditional Chinese.
 
  The language conversion system is very simple, it's just a table of
  translated pairs, where the longest match takes precedence. The
  translation table in one direction (e.g. UK -> US) can be different to
  the table in the other direction (US -> UK). You would not list -ize
  -> -ise, you would list every word in the dictionary with an -ize
  ending that can be translated to -ise without controversy. The current
  software could handle 50k pairs or so without serious performance
  problems, and it could be extended and optimised to allow millions of
  pairs if there was a need for that.
 
  It's possible to handle any pair of languages which are separated only
  by vocabulary, and transliteration or spelling. It's only differences
  in grammar, such as word order, that would give it trouble.
 
 Is there any reason nobody's tried adding such support for us/uk
 English?  It would resolve some long-standing tension on enwiki.
 Would anons have to be given one variant or the other, or would they
 get untransformed text or what?  Does the variant transformation apply
 to the edit page as well?
 

Re: [Wikitech-l] Language variants

2009-09-10 Thread Mark Williamson
It might be possible to make it apply to the edit page as well, but in
zh.wp, sr.wp, and kk.wp currently it does not. I'm guessing (could be
wrong) that it would eat up a lot more resources.

Mark

skype: node.ue



On Thu, Sep 10, 2009 at 11:49 AM, Helder Geovane Gomes de Lima
heldergeov...@gmail.com wrote:
 2009/9/10 Aryeh Gregor simetrical+wikil...@gmail.com


 On Thu, Sep 10, 2009 at 2:23 PM, Ariel T. Glenn ar...@wikimedia.org
 wrote:
  The differences between the UK and American varieties of English are not
  limited just to spelling and vocabulary.

 Those account for the large majority of the more noticeable
 differences, however.


 I think this is also the case for Portuguese ('pt' x 'pt-br'). So, even if
 the table doesn't solve every case, what it does solve is good enough...

 2009/9/10 Aryeh Gregor simetrical+wikil...@gmail.com


 Is there any reason nobody's tried adding such support for us/uk
 English?  It would resolve some long-standing tension on enwiki.
 Would anons have to be given one variant or the other, or would they
 get untransformed text or what?  Does the variant transformation apply
 to the edit page as well?


 I have the same questions...

 Helder


Re: [Wikitech-l] Language variants

2009-09-10 Thread Helder Geovane Gomes de Lima
2009/9/9 Tim Starling tstarl...@wikimedia.org

 The language variant system that we have could easily convert between
 US and UK English. In fact it already does convert between a language
 pair with a far more complex relationship, that is Simplified and
 Traditional Chinese.

 The language conversion system is very simple, it's just a table of
 translated pairs, where the longest match takes precedence. The
 translation table in one direction (e.g. UK -> US) can be different to
 the table in the other direction (US -> UK). You would not list -ize
 -> -ise, you would list every word in the dictionary with an -ize
 ending that can be translated to -ise without controversy. The current
 software could handle 50k pairs or so without serious performance
 problems, and it could be extended and optimised to allow millions of
 pairs if there was a need for that.


Hello again!

What would be needed in order to use pages like MediaWiki:Conversiontable/pt
and MediaWiki:Conversiontable/pt-br for conversion on the Wikimedia projects
in Portuguese? Is it easy to have the language conversion enabled?
Could we gradually create the conversion tables?

Sorry for so many questions...

Helder


Re: [Wikitech-l] Language variants

2009-09-10 Thread Aryeh Gregor
On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw roan.katt...@gmail.com wrote:
 Seems I'm not the only one who had a completely wrong idea about how
 variants work. We definitely need more documentation and fame for this
 system, so its potential doesn't go to waste.

I theoretically knew that it was just a string-replace system, but it
didn't occur to me that it would be useful for more than
transliteration.  It makes sense now that Tim pointed that out.  How
would it handle word breaks, though?  It would just ignore them, so
color -> colour also changes uncolored -> uncoloured?  What about
things like HTML IDs or even attribute/property names (<span
style="color:red">)?  I'm sure I could dig through the code to find
the answers to these, but actually I'm not even sure offhand where the
code *is*.



Re: [Wikitech-l] Language variants

2009-09-10 Thread Tim Starling
Aryeh Gregor wrote:
 On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw roan.katt...@gmail.com wrote:
 Seems I'm not the only one who had a completely wrong idea about how
 variants work. We definitely need more documentation and fame for this
 system, so its potential doesn't go to waste.
 
 I theoretically knew that it was just a string-replace system, but it
 didn't occur to me that it would be useful for more than
 transliteration.  It makes sense now that Tim pointed that out.  How
 would it handle word breaks, though?  It would just ignore them, so
 color -> colour also changes uncolored -> uncoloured?

Neither of the implementations so far has required any knowledge of
word breaks, and so it has not been implemented. In theory you could
just list every larger word that contains a smaller transformed word, e.g.

humor -> humour
humorous -> humorous

But it might be better to just add a word segmentation feature.
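A small sketch of both options Tim mentions, assuming a longest-match converter built from a regular-expression alternation with longer keys tried first. `make_converter` and its `whole_words` flag are hypothetical; the flag only stands in for a word-segmentation feature.

```python
import re

def make_converter(table: dict, whole_words: bool = False):
    """Longest-match string replacement, optionally limited to whole words."""
    keys = sorted(table, key=len, reverse=True)  # try longer keys first
    pattern = "|".join(map(re.escape, keys))
    if whole_words:
        pattern = rf"\b(?:{pattern})\b"
    regex = re.compile(pattern)
    return lambda text: regex.sub(lambda m: table[m.group(0)], text)

# Listing the longer word as an identity pair protects it.
us_to_uk = make_converter({"humor": "humour", "humorous": "humorous"})
print(us_to_uk("humor and humorous"))   # humour and humorous

# Without that pair, plain substring matching mangles the longer word...
naive = make_converter({"humor": "humour"})
print(naive("humorous"))                # humourous

# ...while a word-boundary rule leaves it alone.
segmented = make_converter({"humor": "humour"}, whole_words=True)
print(segmented("humor and humorous"))  # humour and humorous
```

The trade-off: the word-boundary mode would also block wanted conversions such as uncolored -> uncoloured unless those words get their own entries.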

 What about
 things like HTML IDs or even attribute/property names (<span
 style="color:red">)?  I'm sure I could dig through the code to find
 the answers to these, but actually I'm not even sure offhand where the
 code *is*.

languages/LanguageConverter.php. There are some rather inelegant
regexes to deal with cases like these; they seem to work. The
converter operates at a near-HTML stage of the parser, so it's not too
hard to skip attributes.

Note that the FastStringSearch extension is important for achieving
good performance, especially in Chinese.

-- Tim Starling




Re: [Wikitech-l] Language variants

2009-09-10 Thread Helder Geovane Gomes de Lima
Hello!

I think the code is here:
http://svn.wikimedia.org/doc/LanguageConverter_8php-source.html#l00018
http://svn.wikimedia.org/doc/LanguageZh_8php-source.html#l9

and a comment at
http://svn.wikimedia.org/doc/LanguageConverter_8php-source.html#l00258
says:

/* we convert everything except:
   1. html markups (anything between < and >)
   2. html entities
   3. place holders created by the parser
*/
So, I don't think it will convert <span style="color:red">. But I'm
not sure, because I'm still learning PHP...
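The exclusions in that comment (markup and entities pass through untouched) can be illustrated with a sketch that converts only the text between tags. This is not the actual regex set in LanguageConverter.php; it assumes well-formed tags with no '>' inside attribute values, and it ignores parser placeholders (item 3).

```python
import re

# Runs that must pass through unchanged: HTML tags and character entities.
PROTECTED = re.compile(r"<[^>]*>|&[#\w]+;")

def convert_html(html: str, convert) -> str:
    """Apply `convert` to text runs only, copying markup and entities verbatim."""
    out, pos = [], 0
    for m in PROTECTED.finditer(html):
        out.append(convert(html[pos:m.start()]))  # text before the protected run
        out.append(m.group(0))                    # tag or entity, untouched
        pos = m.end()
    out.append(convert(html[pos:]))               # trailing text
    return "".join(out)

def to_uk(text: str) -> str:
    # Toy single-entry table standing in for a real conversion table.
    return text.replace("color", "colour")

print(convert_html('<span style="color:red">color &amp; shape</span>', to_uk))
# <span style="color:red">colour &amp; shape</span>
```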

By the way, I can't understand Chinese, but (after using an on-line
translator) I think the page they have for documenting the system is
this:
http://zh.wikipedia.org/wiki/Help:%E4%B8%AD%E6%96%87%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91%E7%9A%84%E7%B9%81%E7%AE%80%E5%A4%84%E7%90%86

Helder




2009/9/10 Aryeh Gregor simetrical+wikil...@gmail.com

 On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw roan.katt...@gmail.com wrote:
  Seems I'm not the only one who had a completely wrong idea about how
  variants work. We definitely need more documentation and fame for this
  system, so its potential doesn't go to waste.

 I theoretically knew that it was just a string-replace system, but it
 didn't occur to me that it would be useful for more than
 transliteration.  It makes sense now that Tim pointed that out.  How
 would it handle word breaks, though?  It would just ignore them, so
 color -> colour also changes uncolored -> uncoloured?  What about
 things like HTML IDs or even attribute/property names (<span
 style="color:red">)?  I'm sure I could dig through the code to find
 the answers to these, but actually I'm not even sure offhand where the
 code *is*.



Re: [Wikitech-l] Language variants

2009-09-10 Thread Tim Starling
Ariel T. Glenn wrote:
 The differences between the UK and American varieties of English are not
 limited just to spelling and vocabulary.


Note that the -{...}- structure is available in wikitext to translate
article-specific fragments of text, so you can also translate worldview:

A popular game played with a bat and ball is -{en-gb:Cricket;
en-us:Baseball}-.
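A rough sketch of how an inline -{code:text; code:text}- rule could be resolved for one reader's variant, assuming a simple "requested variant, then fallback, then first listed option" order. `apply_inline_rules` is a hypothetical helper; MediaWiki's real rule syntax has more features than this sketch handles.

```python
import re
from typing import Optional

# -{ ... }- blocks; non-greedy, and DOTALL so a rule may span lines.
RULE = re.compile(r"-\{(.*?)\}-", re.S)

def apply_inline_rules(text: str, variant: str, fallback: Optional[str] = None) -> str:
    """Replace each -{code:text; code:text}- block with the text for `variant`.

    Tries `variant`, then `fallback`, then the first listed option. A block
    without any "code:" entries is emitted literally. Simplified: nested
    rules and ';' or ':' inside the texts are not handled.
    """
    def choose(m):
        options = {}
        for part in m.group(1).split(";"):
            code, sep, value = part.partition(":")
            if sep:
                options[code.strip()] = value.strip()
        if not options:
            return m.group(1)
        for code in (variant, fallback):
            if code in options:
                return options[code]
        return next(iter(options.values()))

    return RULE.sub(choose, text)

src = "A popular game played with a bat and ball is -{en-gb:Cricket;\nen-us:Baseball}-."
print(apply_inline_rules(src, "en-us"))
# A popular game played with a bat and ball is Baseball.
print(apply_inline_rules(src, "en-ca", fallback="en-us"))
# A popular game played with a bat and ball is Baseball.
```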

-- Tim Starling




Re: [Wikitech-l] Language variants

2009-09-09 Thread Roan Kattouw
2009/9/9 Helder Geovane Gomes de Lima heldergeov...@gmail.com:
 Hello!

 I noticed that on sr.wikipedia there is a "Variant" option under
 "Internationalization" in the preferences. How is that different from
 the 'sr', 'sr-ec' and 'sr-el' options shown under "Language"
 (also under "Internationalization")?

 I'm interested in this because there are some differences between
 Brazilian Portuguese ('pt-br') and Portuguese of Portugal ('pt')
 which usually cause trouble for the admins at the Portuguese
 projects, who need to warn users not to change the wording of
 texts from one variant to the other (this happens mainly with
 anonymous contributions), because some differences between the
 variants look, at first glance, like typos, and users want to
 correct them...

sr-ec and sr-el refer to the Cyrillic and Latin variants of Serbian
(respectively), and AFAIK the software can convert
everything, even article text, because the conversion rules are so
simple that a computer can execute them. Basically, sr-ec and sr-el
have the same text in the same language, but in different alphabets.
(This is my understanding, which may be completely wrong; in that
case, please correct me.)

The differences between pt and pt-br are more delicate than that, and
a computer can't autoconvert between the two, because of
differences in spelling, word usage and grammar(?).

 So, I would like to know if there is currently any feature which could
 help us to avoid the problem of having a divided community of users
 ('pt' x 'pt-br') fighting with each other ad infinitum... (and to
 avoid proposals like that [1] of a new Brazilian Wikipedia, which
 IMHO will not have any good result, and is not the best way of
 solving the problem...)

No. We already offer users the choice between having the interface in
pt or pt-br (or any other language, really), but such a choice doesn't
exist for the content.

 I found 
 [http://strategy.wikimedia.org/w/index.php?title=Proposal_talk%3AA_Brazilian_Portuguese_Wikipedia&diff=14163&oldid=13621
 a comment] about the existence of on-the-fly translation for some
 languages (Chinese and Serbian), but I don't know how it works, and if
 it solves or improve the situation.

That's the alphabet variant thing I mentioned earlier. If the majority
of the differences between pt and pt-br can be summed up with simple
rules that a computer can handle, we might be able to work something
out. However, that's usually not the case; I don't know Portuguese, but
I do know that handling even simple differences between en-us and
en-gb is too complex already: a system that would successfully convert
'realise' to 'realize' may also try to wrongfully convert 'disguise'.
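To illustrate the distinction Tim draws in his reply, here is a sketch contrasting a blanket suffix rule with a table of individually listed words. For brevity, the table is a whole-word dictionary lookup rather than the substring longest-match the real converter uses, and it ignores capitalisation.

```python
import re

def naive_ise_to_ize(text: str) -> str:
    """A blanket suffix rule: rewrites every word ending in -ise."""
    return re.sub(r"ise\b", "ize", text)

# Table approach: only words known to convert without controversy are listed.
UK_TO_US = {"realise": "realize", "organise": "organize"}

def table_convert(text: str) -> str:
    """Look each word up in the table; unlisted words are left alone."""
    return re.sub(r"[A-Za-z]+", lambda m: UK_TO_US.get(m.group(0), m.group(0)), text)

sentence = "They realise the disguise will organise nothing."
print(naive_ise_to_ize(sentence))  # They realize the disguize will organize nothing.
print(table_convert(sentence))     # They realize the disguise will organize nothing.
```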

 And before this I was also thinking of use (a possible enhanced
 version of) a procedure like this: considering that currently it is
 possible to show a system message using {{int:MESSAGE}} in the
 wikitext in a way that the result changes according to the user's
 language, would it be possible to create new messages in the MediaWiki:
 namespace just for defining language variants of words which usually
 appear in the content of the projects? For example, would it be
 possible to create MediaWiki:WORD/pt-br and MediaWiki:WORD/pt, and
 use them (with {{int:WORD}}) instead of a specific word variant in
 wikitext? This isn't likely to be the best solution, but it could be
 a first step towards a solution...

This sounds like it could work, but only if the /langcode trick
actually works (I don't know what that depends on) and if there's a
relatively small set of words that makes a relatively big difference
(otherwise it'd be more trouble than it's worth IMO; but that's up to
the community).

Roan Kattouw (Catrope)



Re: [Wikitech-l] Language variants

2009-09-09 Thread David Gerard
2009/9/9 Roan Kattouw roan.katt...@gmail.com:
 2009/9/9 Helder Geovane Gomes de Lima heldergeov...@gmail.com:

 So, I would like to know if there is currently any feature which could
 help us to avoid the problem of having a divided community of users
 ('pt' x 'pt-br') fighting with each other ad infinitum... (and to
 avoid proposals like that [1] of a new Brazilian Wikipedia, which
 IMHO will not have any good result, and is not the better way of
 solving the problem...)

 No. We already offer users the choice between having the interface in
 pt or pt-br (or any other language, really), but such a choice doesn't
 exist for the content.


This is a community issue. Having a single pt:wp is a win because
there's more content in one place and it avoids local-POV bias, same
as there's one en:wp rather than US-English and Commonwealth-English.

So you need a community rule.

The rule we have on en:wp is:

1. It doesn't matter.
2. Use the variant spoken in the location, if relevant.
3. Don't change articles from one to the other except per 2.
4. Try not to worry too much about it.

4. is the important step ;-) It should be simple enough to let new
users know the rule and not to worry about which variant :-)


- d.



Re: [Wikitech-l] Language variants

2009-09-09 Thread Helder Geovane Gomes de Lima
Nice! ;-)

Do you think tables like these
http://pt.wiktionary.org/wiki/Wikcionário:Versões da língua portuguesa/Tabela
http://pt.wikipedia.org/wiki/Wikipedia:Versões da língua portuguesa/tabela
could be a starting point for a similar conversion system between pt and pt-br?

Meanwhile, I was also trying to adapt the Template:LangSwitch from
Wikimedia Commons
(http://commons.wikimedia.org/wiki/Template:LangSwitch), in order to
be able to use the template syntax like this:
{{Language variations| pt = word 1| pt-br = word 2}}

For this, I've created two pages:
* MediaWiki:Lang, with 'pt'
* MediaWiki:Lang/pt-br, with 'pt-br'

and the template code is essentially:
{{#switch:{{int:Lang}}
|pt-br={{{pt-br|}}}
|pt
|#default={{{pt|}}}
}}

But I wasn't able to create a "default" parameter that would set
which of the variants is shown by default to anonymous users. It
would be good if we could use {{Language variations| default = pt-br |
pt = word 1| pt-br = word 2}} to get:
(a) word 2, for anonymous users;
(b) word 1, for logged-in users who chose 'pt' in their preferences;
(c) word 2, for logged-in users who chose 'pt-br' in their preferences.
Option (a) would be necessary if we don't want to change an
existing text from 'pt-br' to 'pt' (for anonymous users) just because
we want logged-in users to be able to choose the content variant.

Is there any way to detect if the reader is logged in with something
in the style {{#if: what? | foo| bar}}?
(the problem with {{int:Lang}} is that for anonymous users and for
users who choose 'pt' the result is the same: 'pt', so I can't
distinguish these two cases at the template...)

Anyway, I think it would be better to have some kind of automated
conversion system, even if it doesn't convert all cases (at least for
the words in the tables above it would be useful).

Thank you all,

Helder

2009/9/9 Tim Starling tstarl...@wikimedia.org:
 Roan Kattouw wrote:
 That's the alphabet variant thing I mentioned earlier. If the majority
 of the differences between pt and pt-br can be summed up with simple
 rules that a computer can handle, we might be able to work something
 out. However, that's usually not the case; I don't know Portuguese, but
 I do know that handling even simple differences between en-us and
 en-gb is too complex already: a system that would successfully convert
 'realise' to 'realize' may also try to wrongfully convert 'disguise'.

 I don't know why you're writing this nonsense, you obviously haven't
 looked at the code at all.

 The language variant system that we have could easily convert between
 US and UK English. In fact it already does convert between a language
 pair with a far more complex relationship, that is Simplified and
 Traditional Chinese.

 The language conversion system is very simple, it's just a table of
 translated pairs, where the longest match takes precedence. The
 translation table in one direction (e.g. UK -> US) can be different to
 the table in the other direction (US -> UK). You would not list -ize
 -> -ise, you would list every word in the dictionary with an -ize
 ending that can be translated to -ise without controversy. The current
 software could handle 50k pairs or so without serious performance
 problems, and it could be extended and optimised to allow millions of
 pairs if there was a need for that.

 It's possible to handle any pair of languages which are separated only
 by vocabulary, and transliteration or spelling. It's only differences
 in grammar, such as word order, that would give it trouble.

 -- Tim Starling

