At 6:51 AM -0400 10/8/02, John Cowan wrote:
>Marco Cimarosti scripsit:
>
>>  Talking about the format of mapping tables, I always wondered why not using
>>  ranges. In the case of ISO 8859-11, the table would become as compact as
>>  three lines:

In XOM I currently do a quick initial test with an if statement for
0x00 through 0xA0. This handles the very common case of ASCII very
quickly. (The C1 controls and the non-breaking space are gravy.) The
remainder I do with a switch statement with one case per value. It's
my recollection that Java compilers can compile this very efficiently
using the switch instructions built into Java's virtual machine.
However, an array lookup might be quicker still. One day I'll have to
profile this and find out for sure.
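
For illustration, here is a minimal sketch of the fast path followed by
an array lookup instead of a switch. It is not XOM's actual code: the
class name, method name, and table are hypothetical, and it assumes the
ISO 8859-11 layout in which bytes 0xA1-0xDA and 0xDF-0xFB correspond to
U+0E01-U+0E3A and U+0E3F-U+0E5B.

```java
// Hypothetical sketch, not XOM's implementation.
class Latin11Sketch {

    // THAI[c - 0x0E00] is true if the character has a byte in ISO 8859-11.
    // Assumed ranges: U+0E01..U+0E3A and U+0E3F..U+0E5B.
    private static final boolean[] THAI = new boolean[0x80];

    static {
        for (int c = 0x0E01; c <= 0x0E3A; c++) THAI[c - 0x0E00] = true;
        for (int c = 0x0E3F; c <= 0x0E5B; c++) THAI[c - 0x0E00] = true;
    }

    public static boolean canEncode(char c) {
        // Fast path: ASCII, the C1 controls, and the non-breaking space.
        if (c <= 0xA0) return true;
        // Anything outside the Thai block is not in this character set.
        if (c < 0x0E00 || c > 0x0E7F) return false;
        // Array lookup replaces the one-case-per-value switch.
        return THAI[c - 0x0E00];
    }
}
```

Whether this actually beats the switch would still need profiling.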

The Verifier class has a similar issue, though there it's a case of
determining whether or not any given character is a legal XML
character, name character, name-start character, etc. This is done
with a trick introduced in JDOM, where the code looks like this:

     public static boolean isXMLLetter(char c) {
         // Note that order is very important here.  The search proceeds
         // from lowest to highest values, so that no searching occurs
         // above the character's value.  BTW, the first line is equivalent to:
         // if (c >= 0x0041 && c <= 0x005A) return true;

         if (c < 0x0041) return false;  if (c <= 0x005a) return true;
         if (c < 0x0061) return false;  if (c <= 0x007A) return true;
         if (c < 0x00C0) return false;  if (c <= 0x00D6) return true;
         if (c < 0x00D8) return false;  if (c <= 0x00F6) return true;
         if (c < 0x00F8) return false;  if (c <= 0x00FF) return true;
         if (c < 0x0100) return false;  if (c <= 0x0131) return true;
         if (c < 0x0134) return false;  if (c <= 0x013E) return true;

This means ASCII and Latin-1 are pretty quick, but the further you go 
into Unicode the more checks have to be made.
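
One way to keep the ranges but avoid the linear walk is a binary search
over a sorted range table. This is a different technique from the JDOM
code above, not something JDOM or XOM is known to do, and the sketch
below uses hypothetical names and only the seven ranges quoted in the
excerpt.

```java
// Hypothetical sketch: binary search over sorted, inclusive ranges.
class RangeSearchSketch {

    // Pairs of [start, end], ascending and non-overlapping.
    // Only the ranges shown in the excerpt above are included.
    private static final char[] RANGES = {
        0x0041, 0x005A,
        0x0061, 0x007A,
        0x00C0, 0x00D6,
        0x00D8, 0x00F6,
        0x00F8, 0x00FF,
        0x0100, 0x0131,
        0x0134, 0x013E,
    };

    public static boolean inRanges(char c) {
        int lo = 0;
        int hi = RANGES.length / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (c < RANGES[2 * mid]) {
                hi = mid - 1;          // below this range: search left
            } else if (c > RANGES[2 * mid + 1]) {
                lo = mid + 1;          // above this range: search right
            } else {
                return true;           // start <= c <= end
            }
        }
        return false;
    }
}
```

The number of comparisons then grows with the logarithm of the number
of ranges rather than with the character's position in Unicode, at the
cost of losing the very cheap early exit for ASCII.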

This almost certainly could be sped up with a table lookup, at the
cost of carrying around a few static 65,536-element boolean arrays.
(Anyone happen to know if Java uses one byte per boolean in arrays or
not?)
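
A minimal sketch of such a table, filled once from ranges at class
load, might look like the following. The class and method names are
hypothetical, and only the seven ranges quoted earlier are loaded, so
this is not a complete XML letter test. For sizing: the Java Virtual
Machine Specification notes that Oracle's implementation stores
boolean arrays with 8 bits per element, so each table would be about
64 KB there; java.util.BitSet packs one bit per entry if that is too
much.

```java
// Hypothetical sketch of a table-driven character class test.
class LetterTableSketch {

    // One entry per UTF-16 code unit.
    private static final boolean[] IS_LETTER = new boolean[65536];

    static {
        // Inclusive ranges; only those shown in the excerpt above.
        int[][] ranges = {
            {0x0041, 0x005A}, {0x0061, 0x007A}, {0x00C0, 0x00D6},
            {0x00D8, 0x00F6}, {0x00F8, 0x00FF}, {0x0100, 0x0131},
            {0x0134, 0x013E},
        };
        for (int[] r : ranges) {
            for (int c = r[0]; c <= r[1]; c++) {
                IS_LETTER[c] = true;
            }
        }
    }

    // A single array read, regardless of where c falls in Unicode.
    public static boolean isLetter(char c) {
        return IS_LETTER[c];
    }
}
```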


-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+
