On Tue, Dec 10, 2013 at 6:09 AM, Zdenek Wagner <zdenek.wag...@gmail.com> wrote:
> 2013/12/10 Keith J. Schultz <keithjschu...@web.de>:
>> I will repeat I do not know Vietnamese so I can not give you [...]
>> Now, if "sang" is true Vietnamese and not a latinized form stand corrected!
>> Though I have [...]
> Yes, it is true Vietnamese word. I do not know Vietnamese, I could
https://www.google.com/search?q=sang+site%3Avi.wikipedia.org

...which is indeed the issue I am attempting to deal with (trying to put the discussion back on track): a bunch of user-authored content which looks correct to a native speaker when rendered with the Unicode bidi algorithm (as implemented in the browser). Language tags are applied only sporadically, when needed to correct some obvious issue, although the future Visual Editor project at Wikimedia hopes to make language tagging a more integrated part of the editing process. Language tagging uses the standard HTML <span lang="...." dir="...."> markup. Directionality tagging uses <bdo> and <bdi> where necessary. But again, the point of the bidi algorithm is to avoid the need for manual tagging in many cases.

Ultimately, Wikipedia's goal is to give the largest possible number of individual authors the ability to create encyclopedic content in their language as easily as possible. Our greatest challenge is the "as easily as possible" part. We can't impose language tagging as a barrier to entry when it is not necessary for the author's text to be readable and useful to the public. We can encourage it in order to obtain good hyphenation of embedded texts, but in our case that must be an optional enhancement, not a requirement for the text to be read.

(Which is why, if we did do automated language guessing, it would likely be primarily to *disable* hyphenation when we detect an embedded text whose language differs from the one currently selected. That is the safe option: we'll sacrifice some beauty but preserve the legibility of the text, which is our foremost concern. We can't use automated language guessing to second-guess the Unicode bidi algorithm, because the text *as it appears in the browser* is the text which has been proofread by our editors, and must be considered canonically correct.)
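As a rough illustration of the optional tagging described above, here is a minimal Python sketch that wraps an embedded run in HTML `lang`/`dir` attributes, or in `<bdi>`/`<bdo>`. The helper names (`tag_language`, `isolate`, `override`) are hypothetical and not part of any MediaWiki API; the sketch only assumes standard HTML semantics: `<bdi>` isolates a run of unknown direction from its surroundings, while `<bdo dir=...>` forces a direction and overrides the bidi algorithm.

```python
from html import escape
from typing import Optional

_DIRECTIONS = ("ltr", "rtl", "auto")


def tag_language(text: str, lang: str, direction: Optional[str] = None) -> str:
    """Wrap an embedded run in <span lang="..." dir="...">.

    `lang` is a BCP 47 tag such as "vi". `direction` is optional because
    the bidi algorithm usually resolves direction without any help.
    """
    attrs = f' lang="{escape(lang)}"'
    if direction is not None:
        if direction not in _DIRECTIONS:
            raise ValueError(f"dir must be one of {_DIRECTIONS}")
        attrs += f' dir="{direction}"'
    return f"<span{attrs}>{escape(text)}</span>"


def isolate(text: str) -> str:
    """<bdi>: isolate text of unknown direction (e.g. a user name) so that
    its directionality does not leak into the surrounding text."""
    return f"<bdi>{escape(text)}</bdi>"


def override(text: str, direction: str) -> str:
    """<bdo>: explicitly override the bidi algorithm for this run."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("<bdo> requires dir='ltr' or dir='rtl'")
    return f'<bdo dir="{direction}">{escape(text)}</bdo>'


print(tag_language("sang", "vi"))  # <span lang="vi">sang</span>
print(isolate("a<b"))              # <bdi>a&lt;b</bdi>
```

Note that none of this is required for the text to display correctly; it only adds information (such as the language, for hyphenation) on top of what the bidi algorithm already does.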
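The "safe option" of using a language guess only to switch hyphenation off could look roughly like the following sketch. The `guess` callable is a hypothetical stand-in for any language detector that returns a `(language, confidence)` pair or `None`, and the 0.9 threshold is an arbitrary assumption. The guess never changes the text or its bidi rendering; it only selects the CSS `hyphens` value for the run.

```python
from typing import Callable, Optional, Tuple

# Hypothetical detector interface: returns (BCP 47 tag, confidence) or None.
Guess = Callable[[str], Optional[Tuple[str, float]]]


def _primary(tag: str) -> str:
    """Primary language subtag, e.g. "en-US" -> "en"."""
    return tag.split("-")[0].lower()


def hyphenation_allowed(run: str, page_lang: str, guess: Guess,
                        threshold: float = 0.9) -> bool:
    """Return False only when the detector is confident that `run` is in a
    language other than the page language; otherwise keep the default."""
    result = guess(run)
    if result is None:
        return True  # no opinion: leave the page default alone
    lang, confidence = result
    differs = _primary(lang) != _primary(page_lang)
    return not (differs and confidence >= threshold)


def hyphens_css(run: str, page_lang: str, guess: Guess) -> str:
    """CSS for the run, assuming the page otherwise uses `hyphens: auto`.
    `manual` breaks only at explicit opportunities such as soft hyphens."""
    if hyphenation_allowed(run, page_lang, guess):
        return "hyphens: auto"
    return "hyphens: manual"


# Example with a fake detector that always answers "Vietnamese, 95%".
fake_guess: Guess = lambda s: ("vi", 0.95)
print(hyphens_css("sang", "en", fake_guess))  # hyphens: manual
```

A wrong guess here costs only some line-breaking quality, never legibility, which is why the guess is limited to disabling hyphenation rather than re-tagging or reordering text.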
--scott

-- ( http://cscott.net/ )
--------------------------------------------------
Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex