Aryeh Gregor wrote:
> On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw <roan.katt...@gmail.com> wrote:
>> Seems I'm not the only one who had a completely wrong idea about how
>> variants work. We definitely need more documentation and fame for this
>> system, so its potential doesn't go to waste.
>
> I theoretically knew that it was just a string-replace system, but it
> didn't occur to me that it would be useful for more than
> transliteration. It makes sense now that Tim pointed that out. How
> would it handle word breaks, though? It would just ignore them, so
> color -> colour also changes uncolored -> uncoloured?

Neither of the implementations so far has required any knowledge of word
breaks, so word-break handling has not been implemented. In theory you
could just list every larger word that contains a smaller transformed
word, e.g.

humor -> humour
humorous -> humorous

But it might be better to just add a word segmentation feature.

> What about
> things like HTML id's or even attribute/property names (<span
> style="color:red">)? I'm sure I could dig through the code to find
> the answers to these, but actually I'm not even sure offhand where the
> code *is*.

languages/LanguageConverter.php. There are some rather inelegant regexes
to deal with cases like these; they seem to work. The converter operates
at a near-HTML stage of the parser, so it's not too hard to skip
attributes.

Note that the FastStringSearch extension is important for achieving good
performance, especially in Chinese.

-- Tim Starling
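
For illustration, here is a minimal Python sketch of the two options
mentioned above: listing the larger word explicitly so that it wins over
the shorter one, and matching on word boundaries instead. This is not the
code in languages/LanguageConverter.php. The names make_converter,
whole_words and table are hypothetical, and the sketch assumes that the
longest entry wins when several entries match at the same position.

```python
import re


def make_converter(table, whole_words=False):
    """Build a replace function from a {source: target} table.

    Hypothetical helper for illustration only. Keys are tried longest
    first, so "humorous" is preferred over "humor" where both match.
    """
    keys = sorted(table, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    if whole_words:
        # Crude stand-in for word segmentation: only match whole words.
        alternation = r"\b(?:" + alternation + r")\b"
    pattern = re.compile(alternation)
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


table = {
    "color": "colour",
    "humor": "humour",
    "humorous": "humorous",  # listed only to protect it from "humor"
}

convert = make_converter(table)
convert_words = make_converter(table, whole_words=True)

print(convert("uncolored humor is humorous"))
# uncoloured humour is humorous
print(convert_words("uncolored humor is humorous"))
# uncolored humour is humorous
```

With plain substring matching, "uncolored" is converted because it
contains "color", and "humorous" survives only because it is in the table.
With whole-word matching, "humorous" is safe without its own entry, but
"uncolored" is no longer converted, so derived forms would each need their
own entry.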