On Wed, Aug 18, 2010 at 01:29:14PM -0500, Chanoch (Ken) Bloom wrote:
> Consider writing a SAX filter that just drops the offending <font> and
> </font>.

Well, we want the style info to remain... there's just no reason in the
world for the document to specify it over and over again on a per-word
or per-character(!) basis. :)

> Also consider using XPath, like my following example in Ruby (using the
> Nokogiri XML library)

Ooooh. Thanks, I'll poke at this. (I know there's some XPath stuff in
PHP that I know nothing about, since so far I've only spoken to it about
XML via its DOMDocument stuff.)

Thanks,

-bill!
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech