Jon Hanna wrote:
> False positives can be caused by the use of U+0000 (which is
> most often encoded as 0x00) which some applications do use
> in text files.
I have never seen such a thing; can you give an example? I can't imagine any use for a NULL in a file apart from terminating records or strings. But, of course, a file containing records or strings is not what I would call a "plain-text file", or at least not a "typical" plain-text file.

> The method can be used reliably with text files that are
> guaranteed to contain large amounts of Latin-1

But the Latin-1 (or even just ASCII) range contains some characters which are shared by most languages (space, new line and/or line feed, digits, punctuation), so there should be a relatively large amount of Latin-1 characters in most cases. Even scripts which have their own digits or punctuation often prefer European digits and punctuation, especially in computer usage. E.g., it suffices to check a few websites (or even printed matter) in Arabic to see that European digits are much more widespread than native digits.

_ Marco
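Here is a rough sketch of why shared Latin-1 characters matter. It assumes that "the method" is the common heuristic of detecting UTF-16, and its byte order, from where the 0x00 bytes fall. The thread does not spell this out. The heuristic works because every character in U+0000..U+00FF has a zero high byte in UTF-16. The function name `guess_utf16` and the 20% threshold are hypothetical choices for illustration.

```python
from typing import Optional

def guess_utf16(data: bytes, threshold: float = 0.2) -> Optional[str]:
    """Hypothetical heuristic: guess UTF-16 byte order from 0x00 positions.

    Characters U+0000..U+00FF (space, line ends, European digits,
    punctuation, ...) have a zero high byte in UTF-16. In little-endian
    order that byte sits at odd offsets; in big-endian, at even offsets.
    Returns "utf-16-le", "utf-16-be", or None if neither pattern dominates.
    """
    pairs = len(data) // 2  # number of complete 16-bit code units
    if pairs == 0:
        return None
    even_zeros = sum(1 for i in range(0, pairs * 2, 2) if data[i] == 0)
    odd_zeros = sum(1 for i in range(1, pairs * 2, 2) if data[i] == 0)
    # Assumed threshold: at least this fraction of code units must show a zero byte.
    if odd_zeros / pairs >= threshold and odd_zeros > even_zeros:
        return "utf-16-le"
    if even_zeros / pairs >= threshold and even_zeros > odd_zeros:
        return "utf-16-be"
    return None

# Arabic words with a space, European digits and a full stop.
sample = "\u0633\u0639\u0631 25 \u062f\u064a\u0646\u0627\u0631."
print(guess_utf16(sample.encode("utf-16-le")))  # utf-16-le
print(guess_utf16(sample.encode("utf-16-be")))  # utf-16-be
print(guess_utf16("Hello, world".encode("latin-1")))  # None
```

The eight Arabic letters (U+06xx) contribute no zero bytes. The sample is recognised only because of its two spaces, two European digits and full stop. Conversely, an 8-bit file containing enough U+0000 characters at one parity of offset could cross the threshold, which is the kind of false positive Jon describes.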

