Re: [W2X] How to preserve characters in the unicode private range like 0xF072?

Hussein Shafie Mon, 13 Feb 2017 01:02:53 -0800

Yasser S. wrote:

Thanks Hussein. That worked and I bought a license.
I have a question about unicode characters: The documents I'm working on have 
characters in the unicode private range like 0xF072; which becomes u when it 
goes through w2x. Is there a way to preserve these characters all the way to 
the output XHTML?




Yasser S. wrote:

I hacked a solution and would like your opinion on it.
I added a xed script in main.xed in the before.after-translate step.


I thought you wanted to generate styled XHTML and not semantic XHTML.

main.xed is for semantic XHTML. main-styled is for styled XHTML. Step after-translate is found only in main.xed.

Here's the script:


(:
 : Transform arabic honorifics (ligatures)
 :
 :)

namespace "http://www.w3.org/1999/xhtml";
namespace html = "http://www.w3.org/1999/xhtml";

warning("In cordoba ");
for-each /html/body//span[contains(@style, 'AGA Arabesque')]/text() {
   set-variable("honorific", string(.));
   warning("honorific set to ", $honorific);
   if ($honorific = 'u') {
     set-variable("honorific", "&#xF075;");
   } elseif ($honorific =  'r') {
     set-variable("honorific", "&#xF072;");
   } elseif ($honorific = 't') {
     set-variable("honorific", "&#xF074;");
   } else {
     warning("Unknown character ", .);
   }
   warning("Honorific is: ", $honorific);
   replace(<span class="honorific">{$honorific}</span>, ./..);
}

This seems to work and is producing the desired output.


Your script is well thought out, but it could be made slightly more general.

Is there a better way to do this?

The following script (I called it "ar1.xed") works whether you generate styled XHTML or semantic XHTML:


---
(:
 : Transform Arabic honorifics (ligatures)
 :
 :)

namespace "http://www.w3.org/1999/xhtml";
namespace html = "http://www.w3.org/1999/xhtml";

(: PITFALL: lookup-style('font-family') returns a QUOTED STRING like
  'AGA Arabesque' or "AGA Arabesque".
  Hence test using "contains()" and not "=". :)

for-each /html/body//span[contains(lookup-style('font-family'),
                                   'AGA Arabesque')] {
   set-variable("honorific", string(.));
   message("Testing honorific ", concat('"', $honorific, '"'));

   if ($honorific = 'u') {
       set-variable("honorific", "&#xF075;");
   } elseif ($honorific = 'r') {
       set-variable("honorific", "&#xF072;");
   } elseif ($honorific = 't') {
       set-variable("honorific", "&#xF074;");
   } else {
       message("Unknown honorific character ",
               concat('"', $honorific, '"'));
       continue();
   }

   message("Setting honorific to ", concat('"', $honorific, '"'));
   replace(<span class="honorific">{$honorific}</span>);
}
---

It is invoked as follows:

-pu edit.after.init-styles ar1.xed

(-pu rather than -p, because ar1.xed is a relative URL, not a plain string.)

The main difference from yours is that it is invoked after step init-styles, which INTERNS THEN SUPPRESSES the style and class attributes. That is why I use lookup-style('font-family') and not @style.

Using @style is less efficient. Moreover, @style only contains direct styles, whereas lookup-style() performs a full style search, including named and inherited styles. See "string lookup-style(string, node?)" in http://www.xmlmind.com/w2x/_distrib/doc/xedscript/w2xfuncs.html#lookup-style
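
To illustrate the PITFALL noted in the comment of ar1.xed, here is a small stand-alone Python sketch. It is not XED and not part of w2x. It only assumes that the font-family value comes back wrapped in quotes, as described in that comment; the helper name uses_font and the sample values are hypothetical.

```python
def uses_font(font_family_value: str, font_name: str) -> bool:
    """Return True if font_name occurs in a possibly quoted font-family value."""
    return font_name in font_family_value


# Assumed sample values, shaped like the ones described in the PITFALL comment.
quoted_single = "'AGA Arabesque'"
quoted_double = '"AGA Arabesque"'

# Strict equality fails because of the surrounding quote characters.
equal_single = (quoted_single == "AGA Arabesque")

# A substring test succeeds whichever quoting style is used.
contains_single = uses_font(quoted_single, "AGA Arabesque")
contains_double = uses_font(quoted_double, "AGA Arabesque")
```

This is the same reasoning that leads ar1.xed to test with "contains()" rather than "=".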

Another difference from your script is the use of "continue();" in the "for-each" loop: a span containing an unknown character is skipped and left unchanged instead of being replaced.
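
The control flow of the loop can be sketched outside XED as well. The following stand-alone Python snippet is only an illustration under stated assumptions: the span contents are modeled as a plain list of strings, and the names HONORIFICS and convert_spans are hypothetical. The mapping itself is the one used in ar1.xed.

```python
# Placeholder letter -> private-use character, same mapping as in ar1.xed.
HONORIFICS = {"u": "\uF075", "r": "\uF072", "t": "\uF074"}


def convert_spans(span_texts):
    """Return (index, replacement) pairs for spans holding a known honorific."""
    replacements = []
    for index, text in enumerate(span_texts):
        if text not in HONORIFICS:
            # Same role as continue() in ar1.xed: skip this span and
            # leave it untouched rather than replacing it.
            continue
        replacements.append((index, HONORIFICS[text]))
    return replacements


# Hypothetical input: the second span holds an unknown character.
result = convert_spans(["u", "x", "r", "t"])
```

Here the span containing "x" yields no replacement, which mirrors what "continue();" achieves in the script.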


--
XMLmind Word To XML Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/w2x-support
