Okay, so that begs the question: if there is no difference between UTF8 and ASCII, why make the distinction? I mean, what would be the point of converting from ASCII to UTF8 or vice versa if the results were always the same?
Just being practical.

Bob


On Oct 6, 2010, at 1:29 PM, Jeff Massung wrote:

> On Wed, Oct 6, 2010 at 3:23 PM, Richard Gaskin
> <ambassa...@fourthworld.com> wrote:
>
>> I have an app that needs to auto-detect Unicode and plain text, and render
>> them correctly based on that auto-detection.
>>
>> I have the UTF16 stuff working, but with UTF8 I have a problem: there is
>> no BOM to let me know if it's Unicode, and some plain text files will
>> occasionally have high-ASCII values in them (like the dagger symbol).
>>
>> What patterns should I be looking for in the binary data of a file to
>> distinguish UTF8 from plain text?
>>
>
> Sorry, Richard, but I believe you are out of luck here. The idea behind UTF8
> is that it's indistinguishable from ASCII (0-127). You may be able to scan
> the files, and if they are large enough, try and deduce some thing from them
> to know which they are. For example:
>
> On Windows, "\r\n" (13, 10) should terminate lines. Could very well be a
> text file.
>
> In ASCII there will never be a NULL terminator anywhere (byte 0). There's
> likely many 0-byte values in any appreciably large Unicode file. This would
> also be true of byte 8 (backspace) and byte 7 (the bell) and probably a few
> others.
>
> If the number of bytes that have the high bit (0x80) set is extremely low
> (<<< 1%) then most likely it's ASCII.
>
> HTH,
>
> Jeff M.

_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
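
For anyone who wants to try the high-bit heuristic from Jeff's reply, here is a minimal sketch in Python (chosen only for illustration; nothing in the thread uses it). The names `high_bit_ratio` and `classify_text_bytes` and the three returned labels are hypothetical. The strict UTF-8 decode step is an added assumption, not something proposed in the thread: stray high-ASCII bytes from a legacy 8-bit encoding usually fail to form well-formed UTF-8 sequences, so a clean decode is a strong hint rather than proof.

```python
def high_bit_ratio(data: bytes) -> float:
    """Fraction of bytes with the high bit (0x80) set."""
    if not data:
        return 0.0
    return sum(1 for b in data if b & 0x80) / len(data)


def classify_text_bytes(data: bytes) -> str:
    """Guess how to treat raw file bytes (labels are hypothetical).

    Returns "ascii", "utf-8", or "other-8bit".
    """
    # No byte above 127: ASCII and UTF-8 encode this text identically,
    # so there is nothing to distinguish.
    if high_bit_ratio(data) == 0.0:
        return "ascii"
    try:
        # Assumption (not from the thread): strict decoding raises on any
        # byte sequence that is not well-formed UTF-8.
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        # High bytes present but not valid UTF-8, e.g. a lone dagger byte
        # from a legacy 8-bit character set.
        return "other-8bit"


if __name__ == "__main__":
    samples = [
        b"plain text\r\n",
        "dagger \u2020".encode("utf-8"),
        b"dagger \x86",
    ]
    for sample in samples:
        print(sample, classify_text_bytes(sample), high_bit_ratio(sample))
```

A file that comes back as `other-8bit` would still need a separate decision about which legacy encoding to render it with; the sketch does not attempt that.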