Hi guys,

Wow, where to start? I completely understand your confusion and feel your pain. When I started with unicode I felt lost, too. Here's a short list of Aha! points that should help. (Caveat--I'm describing my understanding purely from a developer perspective; I know little about how Rev implements unicode "under the hood.")

- When we talk about unicode in Rev, we're talking about UTF-16, not UTF-8 or UTF-32.

- The current implementation of unicode is not perfect, but it is perfectly usable. (Right-to-left languages are still problematic, especially if you need to support user input. Display of same is usually fine.)

- The useUnicode property has very limited application. It only affects the behavior of the charToNum and numToChar functions. If useUnicode is false, these two functions behave as we're accustomed; if true, they assume two-byte characters instead of one-byte characters.

- The byte order in which unicode files are stored depends on the processor in the host machine. That means that if you're transferring unicode files from, say, a PPC-based machine to an Intel-based one, UTF-16 files will be scrambled unless you swap each pair of bytes as you read them in.

- In light of the above, it's usually best to store unicode text as UTF-8 or even htmlText. These have been the most reliable transfer formats for me.

- In a Rev field unicode and ascii get mixed up all the time. For instance, characters that normally fall within the ascii range, like space, return and common punctuation, are considered ascii. While this can be confusing, it does ensure that normal Rev chunk expressions work as expected.

- There is no 100% reliable way I know of to look at a file and determine heuristically whether it's unicode, or what flavor of unicode it is.

- The section on unicode in the Rev User Guide (section 6.4) is pretty good as far as it goes, but doesn't cover all the "gotchas".

- Dealing with unicode in text fields is different than in buttons and menus.
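
To make the byte-order and file-sniffing points above a little more concrete, here is a small sketch in Python (not Rev script). The function names swap_utf16_bytes and sniff_bom are made up for illustration, and the sample text is an assumption of mine. It shows that big-endian (PPC-style) and little-endian (Intel-style) UTF-16 differ only in the order of each byte pair, and that a byte order mark, when a file happens to have one, gives a hint about the encoding. As noted above, such a check is a hint, not a guarantee.

```python
import codecs


def swap_utf16_bytes(data: bytes) -> bytes:
    """Swap each pair of bytes: big-endian UTF-16 becomes little-endian, and back."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each pair moves to the front
    swapped[1::2] = data[0::2]  # first byte of each pair moves to the back
    return bytes(swapped)


def sniff_bom(data: bytes):
    """Guess the encoding from a leading byte order mark; None if there is no mark.

    Many files carry no mark at all, so None does not mean "not unicode".
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None


# Sample text (an assumption for this sketch): accented French, Chinese, Russian.
text = "\u00e9\u4e2d\u042f"

big = text.encode("utf-16-be")     # byte order a PPC machine would use
little = text.encode("utf-16-le")  # byte order an Intel machine would use

# The two forms hold the same bytes, pairwise swapped.
assert swap_utf16_bytes(big) == little

# Reading big-endian bytes as if they were little-endian scrambles the text.
scrambled = big.decode("utf-16-le")
assert scrambled != text

# UTF-8 is a sequence of single bytes, so it has no byte-order problem.
utf8 = text.encode("utf-8")
assert utf8.decode("utf-8") == text
```

The last few lines are the reason UTF-8 makes a dependable transfer format: it reads back the same on either kind of machine, with no swapping needed.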

Anyhow, those are some of the key points. For a more in-depth discussion, see my Unicode presentation from RevLive if you've got the DVD. Failing that, you're welcome to read my presentation notes at:

http://asay.byu.edu/revUnicode.pdf

The stack I used in that presentation, which shows lots of examples, is at:

go url "http://asay.byu.edu/unicode-RevLive08.rev"

I'm happy to help if you still have specific issues after you look at this stuff. Unicode is doable, once you learn the tricks and pitfalls.

Regards,

Devin


On Nov 24, 2008, at 6:45 PM, Scott Rossi wrote:

Recently, Phil Davis wrote:

Thanks for asking the questions, Scott. I'm interested in clarity here too since I'll be working with Arabic again in the next few months, and am still a Unicode lightweight.

You want questions?  I got a truck-load of 'em...

For instance... I have characters from several languages in the text I'm working with: Roman, French (accented), Chinese, and Russian. When I set the unicodeText of a field to the text, the accented French characters render incorrectly.  Looking in the source text file, it appears the original French characters may have been reformatted when saving the file as UTF-16. Is there any way to keep the French characters intact within the unicode text?

Thanks & Regards,

Scott Rossi
Creative Director
Tactile Media, Multimedia & Design


_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
