Re: Unicode from Variable?

Devin Asay Tue, 25 Nov 2008 07:21:10 -0800

Hi guys,

Wow, where to start? I completely understand your confusion and feel your pain. When I started with unicode I felt lost, too. Here's a short list of Aha! points that should help. (Caveat: I'm describing my understanding purely from a developer perspective; I have little understanding of how Rev implements unicode "under the hood".)

- When we talk about unicode in Rev, we're talking about UTF-16, not UTF-8 or UTF-32.

- The current implementation of unicode is not perfect, but it is perfectly usable. (Right-to-left languages are still problematic, especially if you need to support user input. Display of same is usually fine.)

- The useUnicode property has very limited application. It only affects the behavior of the charToNum and numToChar functions. If useUnicode is false, these two functions behave as we're accustomed; if true, they assume two-byte characters instead of one-byte characters.

- The byte order in which unicode files are stored is dependent upon the processor in the host machine. That means that if you're transferring unicode files from, say, a PPC-based machine to an Intel-based one, UTF-16 files will be scrambled unless you invert the bytes as you read them in.

- In light of the above, it's usually best to store unicode text as UTF-8 or even htmlText. These have been the most reliable transfer formats for me.

- In a Rev field, unicode and ascii get mixed up all the time. For instance, characters that normally fall within the ascii range, like space, return and common punctuation, are considered ascii. While this can be confusing, it does ensure that normal Rev chunk expressions work as expected.

- There is no 100% reliable way I know of to look at a file and determine heuristically whether it's unicode, or what flavor of unicode it is.

- The section on unicode in the Rev User Guide (section 6.4) is pretty good as far as it goes, but doesn't cover all the "gotchas".

- Dealing with unicode in text fields is different than in buttons and menus.
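
To make the byte-order and transfer-format points above concrete, here is a minimal sketch in Python (not Rev script, and not how Rev does it internally). The helper names swap_utf16_byte_order and sniff_bom and the sample string are made up for illustration. It shows UTF-16 text being scrambled when read with the wrong byte order, repaired by inverting each byte pair, and carried safely as UTF-8. The byte-order-mark check is only a hint, in keeping with the point that no detection method is 100% reliable.

```python
import codecs


def swap_utf16_byte_order(data: bytes) -> bytes:
    """Invert every pair of bytes (big-endian <-> little-endian UTF-16)."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each pair moves first
    swapped[1::2] = data[0::2]  # first byte of each pair moves second
    return bytes(swapped)


def sniff_bom(data: bytes):
    """Guess an encoding from a leading byte-order mark, or return None.

    Only a hint: many files carry no BOM at all, and a UTF-32 little-endian
    BOM begins with the same two bytes as the UTF-16 little-endian one.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None


# Hypothetical sample mixing accented French, Chinese and Russian.
text = "café 中文 привет"

big_endian = text.encode("utf-16-be")     # byte order a PPC machine would use
little_endian = text.encode("utf-16-le")  # byte order an Intel machine would use

# Reading big-endian bytes as little-endian yields the wrong characters.
scrambled = big_endian.decode("utf-16-le", errors="replace")

# Inverting the bytes while reading restores the text.
repaired = swap_utf16_byte_order(big_endian).decode("utf-16-le")

# UTF-8 has a single byte order, so it survives the transfer unchanged.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")

print(scrambled == text, repaired == text, round_tripped == text)
```

In Rev itself the equivalent conversions would be done with the engine's own functions; the sketch only demonstrates why UTF-8 is the safer format for moving files between machines.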

Anyhow, those are some of the key points. For a more in-depth discussion, see my Unicode presentation from RevLive if you've got the DVD. Failing that, you're welcome to read my presentation notes at:


http://asay.byu.edu/revUnicode.pdf

The stack I used in that presentation, which shows lots of examples, is at:


go url "http://asay.byu.edu/unicode-RevLive08.rev"

I'm happy to help if you still have specific issues after you look at this stuff. Unicode is doable, once you learn the tricks and pitfalls.


Regards,

Devin


On Nov 24, 2008, at 6:45 PM, Scott Rossi wrote:

Recently, Phil Davis wrote:
Thanks for asking the questions, Scott. I'm interested in clarity here too since I'll be working with Arabic again in the next few months, and am still a Unicode lightweight.

You want questions?  I got a truck-load of 'em...

For instance... I have characters from several languages in the text I'm working with: Roman, French (accented), Chinese, and Russian. When I set the unicodeText of a field to the text, the accented French characters render incorrectly.  Looking in the source text file, it appears the original French characters may have been reformatted when saving the file as UTF-16. Is there any way to keep the French characters intact within the unicode text?

Thanks & Regards,

Scott Rossi
Creative Director
Tactile Media, Multimedia & Design


_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
