Re: [Wikitech-l] Language variants
Doesn't having geographically located page caches reduce the doubling effect in any given location? Squids located in the US should be caching more en-US than en-GB, and those in Europe should have more en-GB than en-US.

Jared

-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Gerard Meijssen
Sent: 12 September 2009 09:48
To: Wikimedia developers
Subject: Re: [Wikitech-l] Language variants

Hoi,
When we are to do this for English and have digitise and digitize, we have to keep in mind that this ONLY deals with issues that are differences between GB and US English. There are other varieties of English that may make this more complicated. Given the size of the GB and US populations it would split the cache and effectively double the cache size.

There are more languages where this would provide serious benefits. I can easily imagine that the German, Spanish and Portuguese community would be interested.. Then there are many of the other languages that may have an interest.. The first order of business is not can it be done but who will implement and maintain the language part of this.
Thanks,
GerardM

2009/9/12 Ilmari Karonen <nos...@vyznev.net>

> Happy-melon wrote:
>> Ilmari Karonen wrote:
>>> -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}}; ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!-- ...and so on for about 70 more languages -->}-
>>
>> The above begs the question, of course, would this switch actually work? And if it does, how does it affect the cache and linktables? More investigation needed, methinks
>
> Indeed, that was what I was wondering about too. Without actually trying it out, my guess would be that it would indeed work, but at a cost: it'd first parse all the 75 or so subtemplates and then throw all but one of them away. Of course, that's what one would have to do anyway, to get full link table consistency.
>
> It does seem to me that it might not be *that* inefficient, *if* the page were somehow cached in its pre-languageconverted state but after the expensive template parsing has been done. Does such a cache actually exist, or, if not, could one be added with reasonable ease?
>
> --
> Ilmari Karonen

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Language variants
Happy-melon wrote:
> Ilmari Karonen wrote:
>> -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}}; ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!-- ...and so on for about 70 more languages -->}-
>
> The above begs the question, of course, would this switch actually work? And if it does, how does it affect the cache and linktables? More investigation needed, methinks

Indeed, that was what I was wondering about too. Without actually trying it out, my guess would be that it would indeed work, but at a cost: it'd first parse all the 75 or so subtemplates and then throw all but one of them away. Of course, that's what one would have to do anyway, to get full link table consistency.

It does seem to me that it might not be *that* inefficient, *if* the page were somehow cached in its pre-languageconverted state but after the expensive template parsing has been done. Does such a cache actually exist, or, if not, could one be added with reasonable ease?

--
Ilmari Karonen
Re: [Wikitech-l] Language variants
Hoi,
When we are to do this for English and have digitise and digitize, we have to keep in mind that this ONLY deals with issues that are differences between GB and US English. There are other varieties of English that may make this more complicated. Given the size of the GB and US populations it would split the cache and effectively double the cache size.

There are more languages where this would provide serious benefits. I can easily imagine that the German, Spanish and Portuguese community would be interested.. Then there are many of the other languages that may have an interest.. The first order of business is not can it be done but who will implement and maintain the language part of this.
Thanks,
GerardM

2009/9/12 Ilmari Karonen <nos...@vyznev.net>

> Happy-melon wrote:
>> Ilmari Karonen wrote:
>>> -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}}; ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!-- ...and so on for about 70 more languages -->}-
>>
>> The above begs the question, of course, would this switch actually work? And if it does, how does it affect the cache and linktables? More investigation needed, methinks
>
> Indeed, that was what I was wondering about too. Without actually trying it out, my guess would be that it would indeed work, but at a cost: it'd first parse all the 75 or so subtemplates and then throw all but one of them away. Of course, that's what one would have to do anyway, to get full link table consistency.
>
> It does seem to me that it might not be *that* inefficient, *if* the page were somehow cached in its pre-languageconverted state but after the expensive template parsing has been done. Does such a cache actually exist, or, if not, could one be added with reasonable ease?
>
> --
> Ilmari Karonen
Re: [Wikitech-l] Language variants
> Given the size of the GB and US populations it would split the cache and effectively double the cache size.

Did I just see you putting performance ahead of language support? Just checkin'

Domas
Re: [Wikitech-l] Language variants
Ilmari Karonen wrote:
>> A popular game played with a bat and ball is -{en-gb:Cricket; en-us:Baseball}-.
>
> That reminds me... some time ago, someone proposed to enable LanguageConverter on Commons (but without any automatic conversion, presumably) and to (ab?)use it to replace the existing autotranslation hacks based on {{int:lang}}. Would that be in any sense feasible?
>
> There would presumably be two major use cases: the easy one, which I do believe the converter should handle just fine, would be to replace the current http://commons.wikipedia.org/wiki/Template:LangSwitch, generally used to autotranslate short phrases, with syntax like:
>
> -{de:Eigene Arbeit; en:Own work; fi:Oma teos; fr:Travail personnel; etc.}-
>
> (See http://commons.wikipedia.org/wiki/Template:Own for the source of the example.)

I don't think it's really a saner syntax.

> The not-so-simple case would be replacing http://commons.wikipedia.org/wiki/Template:Autotranslate, which is used to translate entire templates, usually (though by no means necessarily) combined with a long list of links to the various translations so that users can easily browse them if the automatically chosen version is no good or something. A naive implementation of that would look something like:
>
> -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}}; ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!-- ...and so on for about 70 more languages -->}-
>
> (Source: http://commons.wikipedia.org/wiki/Template:GFDL.)
>
> I'd like to hope that there might be some better way of doing it, though, even if I can't offhand think of what it might look like. Still, would something like that work, even in theory, and would it be an improvement over the way these things are currently done (which is hacky enough itself)?

I don't think so. It's terribly ugly. You would want something like {{GFDL/{{ENABLEDVARIANT}}}} (no, such a magic word doesn't seem to exist yet). But you would still have the problem of having people *choose* them. You wouldn't put dozens of tabs to choose the variant. Which in fact isn't a variant. These are languages; the variant system is not appropriate for them.
Re: [Wikitech-l] Language variants
Helder Geovane Gomes de Lima wrote:
> But I wasn't able to create a default parameter so that we could set which of the variants is shown by default for anonymous users. It would be good if we could use
> {{Language variations| default = pt-br | pt = word 1| pt-br = word 2}}
> to get:
> (a) word 2, for anonymous users;
> (b) word 1, for logged-in users who choose 'pt' in their preferences;
> (c) word 2, for logged-in users who choose 'pt-br' in their preferences;
>
> Option (a) would be necessary if we don't want to change an existing text from 'pt-br' to 'pt' (for anonymous users) just because we want the logged-in users to be able to choose the content variant.

There's no difference. Anonymous users get the default language. What you could do is have three languages: pt (generic Portuguese, default), pt-pt and pt-br.

> Is there any way to detect if the reader is logged in with something in the style {{#if: what? | foo| bar}}?

No.
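The three-code suggestion above (generic pt as the default, plus pt-pt and pt-br) amounts to a fallback lookup. The following Python sketch shows that selection logic only; the `FALLBACKS` table and the `resolve` function are hypothetical names for illustration and do not describe how MediaWiki resolves variants internally.

```python
# Hypothetical fallback chains: a specific variant falls back to generic "pt".
FALLBACKS = {"pt-br": ["pt-br", "pt"], "pt-pt": ["pt-pt", "pt"], "pt": ["pt"]}


def resolve(texts, user_variant=None, default="pt"):
    """Pick the text for the user's variant; anonymous readers get the default."""
    variant = user_variant or default
    for code in FALLBACKS.get(variant, [default]):
        if code in texts:
            return texts[code]
    return texts.get(default, "")
```

With `texts = {"pt": "word 0", "pt-pt": "word 1", "pt-br": "word 2"}`, an anonymous reader (no `user_variant`) gets the generic text, while a reader who chose pt-br gets "word 2"; existing text written in either variant is only rewritten for readers who opted in.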
Re: [Wikitech-l] Language variants
On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling <tstarl...@wikimedia.org> wrote:
> I don't know why you're writing this nonsense, you obviously haven't looked at the code at all.

This paragraph is unnecessary.

> The language variant system that we have could easily convert between US and UK English. In fact it already does convert between a language pair with a far more complex relationship, that is Simplified and Traditional Chinese.
>
> The language conversion system is very simple, it's just a table of translated pairs, where the longest match takes precedence. The translation table in one direction (e.g. UK -> US) can be different to the table in the other direction (US -> UK).
>
> You would not list ize -> ise, you would list every word in the dictionary with an -ize ending that can be translated to -ise without controversy. The current software could handle 50k pairs or so without serious performance problems, and it could be extended and optimised to allow millions of pairs if there was a need for that.
>
> It's possible to handle any pair of languages which are separated only by vocabulary, and transliteration or spelling. It's only differences in grammar, such as word order, that would give it trouble.

Is there any reason nobody's tried adding such support for us/uk English? It would resolve some long-standing tension on enwiki. Would anons have to be given one variant or the other, or would they get untransformed text or what? Does the variant transformation apply to the edit page as well?
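The quoted description (a table of pairs per direction, longest match wins, whole words listed rather than suffix rules) can be illustrated with a small Python sketch. The table contents and the function name `convert` are assumptions made up for this example; it is not the code in MediaWiki's LanguageConverter.

```python
# Hypothetical one-directional tables; the two directions need not mirror each other.
UK_TO_US = {"colour": "color", "digitise": "digitize", "realise": "realize"}
US_TO_UK = {"color": "colour", "digitize": "digitise", "realize": "realise"}


def convert(text, table):
    """Scan left to right; at each position apply the longest matching key."""
    keys = sorted(table, key=len, reverse=True)  # longest candidates first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no entry matches here; copy one character
            i += 1
    return "".join(out)
```

Because the table lists whole words instead of a generic ise -> ize rule, `convert("realise the disguise", UK_TO_US)` rewrites "realise" but leaves "disguise" alone. The linear scan over all keys is for clarity only and would not scale to the 50k pairs mentioned in the thread.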
Re: [Wikitech-l] Language variants
On 9/10/09 10:06 AM, Aryeh Gregor wrote:
> On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling <tstarl...@wikimedia.org> wrote:
>> I don't know why you're writing this nonsense, you obviously haven't looked at the code at all.
>
> This paragraph is unnecessary.

Seriously! Please read things aloud before clicking send. You will hopefully then be able to better detect when it's time to take a break, eat some fruit and take it down a notch.

>> The language variant system that we have could easily convert between US and UK English. In fact it already does convert between a language pair with a far more complex relationship, that is Simplified and Traditional Chinese.
>>
>> The language conversion system is very simple, it's just a table of translated pairs, where the longest match takes precedence. The translation table in one direction (e.g. UK -> US) can be different to the table in the other direction (US -> UK).
>>
>> You would not list ize -> ise, you would list every word in the dictionary with an -ize ending that can be translated to -ise without controversy. The current software could handle 50k pairs or so without serious performance problems, and it could be extended and optimised to allow millions of pairs if there was a need for that.
>>
>> It's possible to handle any pair of languages which are separated only by vocabulary, and transliteration or spelling. It's only differences in grammar, such as word order, that would give it trouble.
>
> Is there any reason nobody's tried adding such support for us/uk English? It would resolve some long-standing tension on enwiki. Would anons have to be given one variant or the other, or would they get untransformed text or what? Does the variant transformation apply to the edit page as well?

The variant system seems poorly understood by most people (including me) which often tends to cause something (like it for instance) to also be under-utilized...

Perhaps we need more information on what it intends to provide the user. All I find in Google on this topic are blurbs about configuration variables and lots of people confused as to what language variants even are...

Is there some awesome documentation somewhere I have yet to find?

- Trevor
Re: [Wikitech-l] Language variants
The differences between the UK and American varieties of English are not limited just to spelling and vocabulary.

Ariel

On Thu, 10-09-2009 at 13:06 -0400, Aryeh Gregor wrote:
> On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling <tstarl...@wikimedia.org> wrote:
>> I don't know why you're writing this nonsense, you obviously haven't looked at the code at all.
>
> This paragraph is unnecessary.
>
>> The language variant system that we have could easily convert between US and UK English. In fact it already does convert between a language pair with a far more complex relationship, that is Simplified and Traditional Chinese.
>>
>> The language conversion system is very simple, it's just a table of translated pairs, where the longest match takes precedence. The translation table in one direction (e.g. UK -> US) can be different to the table in the other direction (US -> UK).
>>
>> You would not list ize -> ise, you would list every word in the dictionary with an -ize ending that can be translated to -ise without controversy. The current software could handle 50k pairs or so without serious performance problems, and it could be extended and optimised to allow millions of pairs if there was a need for that.
>>
>> It's possible to handle any pair of languages which are separated only by vocabulary, and transliteration or spelling. It's only differences in grammar, such as word order, that would give it trouble.
>
> Is there any reason nobody's tried adding such support for us/uk English? It would resolve some long-standing tension on enwiki. Would anons have to be given one variant or the other, or would they get untransformed text or what? Does the variant transformation apply to the edit page as well?
Re: [Wikitech-l] Language variants
It might be possible to make it apply to the edit page as well, but in zh.wp, sr.wp, and kk.wp currently it does not. I'm guessing (could be wrong) that it would eat up a lot more resources.

Mark

skype: node.ue

On Thu, Sep 10, 2009 at 11:49 AM, Helder Geovane Gomes de Lima <heldergeov...@gmail.com> wrote:
> 2009/9/10 Aryeh Gregor <simetrical+wikil...@gmail.com>
>> On Thu, Sep 10, 2009 at 2:23 PM, Ariel T. Glenn <ar...@wikimedia.org> wrote:
>>> The differences between the UK and American varieties of English are not limited just to spelling and vocabulary.
>>
>> Those account for the large majority of the more noticeable differences, however.
>
> I think this is also the case for Portuguese ('pt' x 'pt-br'). So, even if the table doesn't solve every case, what it solves is sufficiently good...
>
> 2009/9/10 Aryeh Gregor <simetrical+wikil...@gmail.com>
>> Is there any reason nobody's tried adding such support for us/uk English? It would resolve some long-standing tension on enwiki. Would anons have to be given one variant or the other, or would they get untransformed text or what? Does the variant transformation apply to the edit page as well?
>
> I have the same questions...
>
> Helder
Re: [Wikitech-l] Language variants
2009/9/9 Tim Starling <tstarl...@wikimedia.org>
> The language variant system that we have could easily convert between US and UK English. In fact it already does convert between a language pair with a far more complex relationship, that is Simplified and Traditional Chinese.
>
> The language conversion system is very simple, it's just a table of translated pairs, where the longest match takes precedence. The translation table in one direction (e.g. UK -> US) can be different to the table in the other direction (US -> UK).
>
> You would not list ize -> ise, you would list every word in the dictionary with an -ize ending that can be translated to -ise without controversy. The current software could handle 50k pairs or so without serious performance problems, and it could be extended and optimised to allow millions of pairs if there was a need for that.

Hello again!

What would be needed in order to use pages like MediaWiki:Conversiontable/pt and MediaWiki:Conversiontable/pt-br at the Wikimedia projects in Portuguese for the conversion? Is it easy to have the language conversion enabled? Could we gradually create the conversion tables?

Sorry for so many questions...

Helder
Re: [Wikitech-l] Language variants
On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw <roan.katt...@gmail.com> wrote:
> Seems I'm not the only one who had a completely wrong idea about how variants work. We definitely need more documentation and fame for this system, so its potential doesn't go to waste.

I theoretically knew that it was just a string-replace system, but it didn't occur to me that it would be useful for more than transliteration. It makes sense now that Tim pointed that out.

How would it handle word breaks, though? It would just ignore them, so color -> colour also changes uncolored -> uncoloured? What about things like HTML id's or even attribute/property names (<span style="color:red">)? I'm sure I could dig through the code to find the answers to these, but actually I'm not even sure offhand where the code *is*.
Re: [Wikitech-l] Language variants
Aryeh Gregor wrote:
> On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw <roan.katt...@gmail.com> wrote:
>> Seems I'm not the only one who had a completely wrong idea about how variants work. We definitely need more documentation and fame for this system, so its potential doesn't go to waste.
>
> I theoretically knew that it was just a string-replace system, but it didn't occur to me that it would be useful for more than transliteration. It makes sense now that Tim pointed that out.
>
> How would it handle word breaks, though? It would just ignore them, so color -> colour also changes uncolored -> uncoloured?

Neither of the implementations so far has required any knowledge of word breaks, and so it has not been implemented. In theory you could just list every larger word that contains a smaller transformed word, e.g.

humor -> humour
humorous -> humorous

But it might be better to just add a word segmentation feature.

> What about things like HTML id's or even attribute/property names (<span style="color:red">)? I'm sure I could dig through the code to find the answers to these, but actually I'm not even sure offhand where the code *is*.

languages/LanguageConverter.php. There are some rather inelegant regexes to deal with cases like these, they seem to work. The converter operates at a near-HTML stage of the parser, so it's not too hard to skip attributes.

Note that the FastStringSearch extension is important for achieving good performance, especially in Chinese.

--
Tim Starling
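The two options mentioned above (shielding a longer word by listing it explicitly, versus adding word segmentation) can be compared in a short Python sketch. The table and both function names are hypothetical, and the `\b` word-boundary approach is only a stand-in for "word segmentation": it suits space-delimited languages and would not help for Chinese.

```python
import re

# Hypothetical US -> UK table. "humorous" maps to itself so that the longer
# entry shields it from the shorter "humor" rule.
US_TO_UK = {"humor": "humour", "humorous": "humorous", "color": "colour"}


def _alternation(table):
    """Regex alternation of all keys, longest first, so longer entries win."""
    return "|".join(re.escape(k) for k in sorted(table, key=len, reverse=True))


def convert_substrings(text, table):
    """Longest-match replacement with no notion of word breaks."""
    return re.sub(_alternation(table), lambda m: table[m.group(0)], text)


def convert_whole_words(text, table):
    """Variant that only rewrites entries standing alone as whole words."""
    pattern = r"\b(?:" + _alternation(table) + r")\b"
    return re.sub(pattern, lambda m: table[m.group(0)], text)
```

With the substring version, "uncolored" becomes "uncoloured" and "humorous" survives only because it is listed; with the whole-word version, "uncolored" is left untouched unless it gets its own entry.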
Re: [Wikitech-l] Language variants
Hello!

I think the code is here:
http://svn.wikimedia.org/doc/LanguageConverter_8php-source.html#l00018
http://svn.wikimedia.org/doc/LanguageZh_8php-source.html#l9
and a comment at http://svn.wikimedia.org/doc/LanguageConverter_8php-source.html#l00258 says:

    /* we convert everything except:
       1. html markups (anything between < and >)
       2. html entities
       3. place holders created by the parser
    */

So, I don't think it will convert <span style="color:red">. But I'm not sure, because I'm still learning PHP...

By the way, I can't understand Chinese, but (after using an on-line translator) I think the page they have for documenting the system is this:
http://zh.wikipedia.org/wiki/Help:%E4%B8%AD%E6%96%87%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91%E7%9A%84%E7%B9%81%E7%AE%80%E5%A4%84%E7%90%86

Helder

2009/9/10 Aryeh Gregor <simetrical+wikil...@gmail.com>
> On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw <roan.katt...@gmail.com> wrote:
>> Seems I'm not the only one who had a completely wrong idea about how variants work. We definitely need more documentation and fame for this system, so its potential doesn't go to waste.
>
> I theoretically knew that it was just a string-replace system, but it didn't occur to me that it would be useful for more than transliteration. It makes sense now that Tim pointed that out.
>
> How would it handle word breaks, though? It would just ignore them, so color -> colour also changes uncolored -> uncoloured? What about things like HTML id's or even attribute/property names (<span style="color:red">)? I'm sure I could dig through the code to find the answers to these, but actually I'm not even sure offhand where the code *is*.
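The first two exceptions in the quoted comment (markup between < and >, and HTML entities) can be illustrated with a small Python sketch that splits the input on tags and entities and converts only the text in between. The `SKIP` pattern, the one-entry table and the function names are assumptions for illustration; the actual regexes in LanguageConverter.php are not reproduced here, and parser placeholders (the third exception) are not modelled.

```python
import re

# Matches an HTML tag or an HTML entity; these pieces are passed through untouched.
SKIP = re.compile(r"(<[^>]*>|&#?\w+;)")

US_TO_UK = {"color": "colour"}  # hypothetical single-entry table


def convert_text(fragment):
    """Plain string replacement on a fragment known to contain no markup."""
    for src, dst in US_TO_UK.items():
        fragment = fragment.replace(src, dst)
    return fragment


def convert_html(html):
    """Convert only the text between tags and entities."""
    parts = SKIP.split(html)  # the capturing group keeps delimiters at odd indexes
    return "".join(p if i % 2 else convert_text(p) for i, p in enumerate(parts))
```

Under these assumptions, the attribute in `<span style="color:red">color</span>` is preserved while the visible word is converted, which matches the behaviour Helder expects from the comment.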
Re: [Wikitech-l] Language variants
Ariel T. Glenn wrote:
> The differences between the UK and American varieties of English are not limited just to spelling and vocabulary.

Note that the -{...}- structure is available in wikitext to translate article-specific fragments of text, so you can also translate worldview:

A popular game played with a bat and ball is -{en-gb:Cricket; en-us:Baseball}-.

--
Tim Starling
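As a rough illustration of how the -{...}- markup shown above selects text per variant, here is a simplified Python sketch. It handles only flat `code:text; code:text` lists as in the example; the function name `pick_variant` and its `fallback` parameter are hypothetical, and the real converter syntax is richer than this.

```python
import re

# Matches -{ code:text; code:text }- blocks (non-nested, simplified).
MANUAL = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def pick_variant(wikitext, variant, fallback=None):
    """Replace each -{...}- block with the text listed for `variant`."""
    def choose(match):
        options = {}
        for item in match.group(1).split(";"):
            if ":" in item:
                code, text = item.split(":", 1)
                options[code.strip()] = text.strip()
        if variant in options:
            return options[variant]
        if fallback in options:
            return options[fallback]
        return match.group(0)  # leave the block untouched if nothing matches
    return MANUAL.sub(choose, wikitext)
```

For the sentence in the message, `pick_variant(..., "en-us")` yields "Baseball" and `pick_variant(..., "en-gb")` yields "Cricket"; a variant not listed in the block leaves the markup as written unless a fallback is given.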
Re: [Wikitech-l] Language variants
2009/9/9 Helder Geovane Gomes de Lima <heldergeov...@gmail.com>:
> Hello!
>
> I noticed at sr.wikipedia there is an option Variant under Internationalization at the preferences. How is that different from the 'sr', 'sr-ec' and 'sr-el' which are shown at the Language option (also under Internationalization)?
>
> I'm interested in this because there are some differences between Brazilian Portuguese ('pt-br') and Portuguese of Portugal ('pt') which usually cause trouble for the admins at the Portuguese projects, who need to warn the users not to change the wording of the texts from one variant to another (this usually happens, mainly from anonymous contributions), because some differences between the variants seem to be [at a first glance] a typo, and they want to correct it...

sr-ec and sr-el refer to the Latin and Cyrillic variants of Serbian (not sure which is which), and AFAIK the software can convert everything, even article text, because the conversion rules are so simple that a computer can execute them. Basically, sr-ec and sr-el have the same text in the same language, but in different alphabets. (This is my understanding, which may be completely wrong; in that case, please correct me.) The differences between pt and pt-br are more delicate than that, and the two can't be autoconverted between by a computer, because of differences in spelling, word usage and grammar(?).

> So, I would like to know if there is currently any feature which could help us to avoid the problem of having a divided community of users ('pt' x 'pt-br') fighting with each other ad infinitum... (and to avoid proposals like that [1] of a new Brazilian Wikipedia, which IMHO will not have any good result, and is not the best way of solving the problem...)

No. We already offer users the choice between having the interface in pt or pt-br (or any other language, really), but such a choice doesn't exist for the content.

> I found [http://strategy.wikimedia.org/w/index.php?title=Proposal_talk%3AA_Brazilian_Portuguese_Wikipedia&diff=14163&oldid=13621 a comment] about the existence of on-the-fly translation for some languages (Chinese and Serbian), but I don't know how it works, and if it solves or improves the situation.

That's the alphabet variant thing I mentioned earlier. If the majority of the differences between pt and pt-br can be summed up with simple rules that a computer can handle, we might be able to work something out. However, that's usually not the case; I don't know Portuguese, but I do know that handling even simple differences between en-us and en-gb is too complex already: a system that would successfully convert 'realise' to 'realize' may also try to wrongfully convert 'disguise'.

> And before this I was also thinking of using (a possibly enhanced version of) a procedure like this: considering that currently it is possible to show a system message using {{int:MESSAGE}} in the wikitext in a way that the result changes according to the user's language, would it be possible to create new messages in the MediaWiki: namespace just for defining language variants of words which usually appear in the content of the projects? For example, would it be possible to create MediaWiki:WORD/pt-br and MediaWiki:WORD/pt, and use them (with {{int:WORD}}) instead of the actual word variant in wikitext?
>
> This isn't likely to be the best solution, but it could be a first step towards a solution...

This sounds like it could work, but only if the /langcode trick actually works (I don't know what that depends on) and if there's a relatively small set of words that makes a relatively big difference (otherwise it'd be more trouble than it's worth IMO; but that's up to the community).

Roan Kattouw (Catrope)
Re: [Wikitech-l] Language variants
2009/9/9 Roan Kattouw <roan.katt...@gmail.com>:
> 2009/9/9 Helder Geovane Gomes de Lima <heldergeov...@gmail.com>:
>> So, I would like to know if there is currently any feature which could help us to avoid the problem of having a divided community of users ('pt' x 'pt-br') fighting with each other ad infinitum... (and to avoid proposals like that [1] of a new Brazilian Wikipedia, which IMHO will not have any good result, and is not the better way of solving the problem...)
>
> No. We already offer users the choice between having the interface in pt or pt-br (or any other language, really), but such a choice doesn't exist for the content.

This is a community issue. Having a single pt:wp is a win because there's more content in one place and it avoids local-POV bias, same as there's one en:wp rather than US-English and Commonwealth-English.

So you need a community rule. The rule we have on en:wp is:

1. It doesn't matter.
2. Use the variant spoken in the location, if relevant.
3. Don't change articles from one to the other except per 2.
4. Try not to worry too much about it.

4. is the important step ;-) It should be simple enough to let new users know the rule and not to worry about which variant :-)

- d.
Re: [Wikitech-l] Language variants
Nice! ;-)

Do you think tables like these
http://pt.wiktionary.org/wiki/Wikcionário:Versões da língua portuguesa/Tabela
http://pt.wikipedia.org/wiki/Wikipedia:Versões da língua portuguesa/tabela
could be a starting point for a similar conversion system for pt - pt-br?

Meanwhile, I was also trying to adapt the Template:LangSwitch from Wikimedia Commons (http://commons.wikimedia.org/wiki/Template:LangSwitch), in order to be able to use the template syntax like this:
{{Language variations| pt = word 1| pt-br = word 2}}

For this, I've created two pages:
* MediaWiki:Lang, with 'pt'
* MediaWiki:Lang/pt-br, with 'pt-br'
and the template code is essentially:

    {{#switch:{{int:Lang}}
    |pt-br={{{pt-br|}}}
    |pt
    |#default={{{pt|}}}
    }}

But I wasn't able to create a default parameter so that we could set which of the variants is shown by default for anonymous users. It would be good if we could use
{{Language variations| default = pt-br | pt = word 1| pt-br = word 2}}
to get:
(a) word 2, for anonymous users;
(b) word 1, for logged-in users who choose 'pt' in their preferences;
(c) word 2, for logged-in users who choose 'pt-br' in their preferences;

Option (a) would be necessary if we don't want to change an existing text from 'pt-br' to 'pt' (for anonymous users) just because we want the logged-in users to be able to choose the content variant.

Is there any way to detect if the reader is logged in with something in the style {{#if: what? | foo| bar}}? (The problem with {{int:Lang}} is that for anonymous users and for users who choose 'pt' the result is the same: 'pt', so I can't distinguish these two cases in the template...)

Anyway, I think it would be better to have some kind of automated conversion system, even if it doesn't convert all cases (at least for the words in the tables above it would be useful).

Thank you for all,

Helder

2009/9/9 Tim Starling <tstarl...@wikimedia.org>:
> Roan Kattouw wrote:
>> That's the alphabet variant thing I mentioned earlier. If the majority of the differences between pt and pt-br can be summed up with simple rules that a computer can handle, we might be able to work something out. However, that's usually not the case; I don't know Portuguese, but I do know that handling even simple differences between en-us and en-gb is too complex already: a system that would successfully convert 'realise' to 'realize' may also try to wrongfully convert 'disguise'.
>
> I don't know why you're writing this nonsense, you obviously haven't looked at the code at all.
>
> The language variant system that we have could easily convert between US and UK English. In fact it already does convert between a language pair with a far more complex relationship, that is Simplified and Traditional Chinese.
>
> The language conversion system is very simple, it's just a table of translated pairs, where the longest match takes precedence. The translation table in one direction (e.g. UK -> US) can be different to the table in the other direction (US -> UK).
>
> You would not list ize -> ise, you would list every word in the dictionary with an -ize ending that can be translated to -ise without controversy. The current software could handle 50k pairs or so without serious performance problems, and it could be extended and optimised to allow millions of pairs if there was a need for that.
>
> It's possible to handle any pair of languages which are separated only by vocabulary, and transliteration or spelling. It's only differences in grammar, such as word order, that would give it trouble.
>
> -- Tim Starling