Arthur Reutenauer wrote:
>> Mojca,
>>    I'm still not getting the algorithm you suggest. In particular: I need a
>> definition for a command \def\mycommand#1#2{...} that I could call as
>>
>>    % A3 is the code of ccaron in EC
>>    % č is two tokens: ^^c4^^8d
>>    \mycommand{č}{^^a3}
>>    % no idea what Tau is in the Greek encoding (don't care),
>>    % but it's only a single token
>>    \mycommand{Τ}{^^ff}
>
> The pseudocode:
> - test if #1 is one or two tokens (use the same trick as Taco suggested)
> - if it's interpreted as two tokens, ignore
> - if it's interpreted as one token (like Tau), make that letter \active and
>   define it to generate #2

That won't be enough. Because, if I understand Z. R.'s explanations correctly, you could have the following situation (assuming pTeX is in EUC-JP mode):

1. The input is “ši” (U+0161, U+0069). It is reencoded as 0xB2, 0x69 in the EC font encoding, which is not a valid EUC-JP code, hence the first byte is interpreted as a character, and so is the second byte.

2. The input is “šč” (U+0161, U+010D). It is reencoded as 0xB2, 0xA3 in EC, which *is* a valid EUC-JP code (corresponding to the Unicode character U+6A2A, as it happens), hence that two-byte sequence is interpreted as a single Japanese character, and the original input is simply lost.

I don't see how we could solve the situation by considering each character individually (as we currently do with UTF-8), given pTeX's behaviour.
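For concreteness, a minimal plain TeX sketch of that pseudocode might look like the following. It is only a sketch: the one-token test here simply splits off the first token and checks whether anything is left (which may or may not be the trick Taco suggested), the helper names \mycommandsplit, \mycommandrest and \mycommandstop are invented for the example, and, as Arthur points out, it does nothing about the pTeX reencoding problem.

   % Sketch only, untested; \empty is the empty macro from plain TeX.
   \def\mycommand#1#2{\mycommandsplit#1\mycommandstop{#2}}
   \def\mycommandsplit#1#2\mycommandstop#3{%
     % #1 is the first token of the argument, #2 is whatever follows it
     \def\mycommandrest{#2}%
     \ifx\mycommandrest\empty
       % a single token (like Tau in an 8-bit Greek encoding):
       % make it active and let it expand to the EC slot #3
       \catcode`#1=13 % 13 = active
       \begingroup
         \lccode`\~=`#1 % the usual \lowercase trick; assumes ~ is active
         \lowercase{\endgroup\def~}{#3}%
     \else
       % two (or more) tokens, i.e. a multi-byte UTF-8 sequence such as č:
       % leave it alone
     \fi
   }

   % e.g., with the slot numbers from the examples above:
   \mycommand{č}{^^a3}   % č is ^^c4^^8d in UTF-8: two tokens, nothing happens
   \mycommand{Τ}{^^ff}   % a single token: Τ becomes active, expands to ^^ff

The \lowercase trick is just one common way of defining an active character whose code is only known at run time; any equivalent mechanism would do.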
I also read that explanation (but not very thoroughly). My impression was: pTeX understands any kind of input as long as it is made of valid Japanese characters, and produces random 8-bit stuff otherwise (which could coincide with an 8-bit font encoding for Western Europe, but only if you are both careful and lucky). I cannot imagine how it would be possible to work around those input restrictions dynamically.

Best wishes,
Taco
