I am the editor of a document [the IEEE 754-2008 standard] that was created
around 15 years ago (using OpenOffice), and has had nearly 200 drafts, a
number of editors, and countless edits. It was last changed in 2008, but is
now about to go through a new revision cycle.

I was delighted to find that LibreOffice handled the 2008 .odt file almost
perfectly, with only 7 errors (all were weird spurious empty reference tags,
of unknown provenance, that OpenOffice quietly ignored).

While identifying and removing those from the content.xml, I noticed that
there are hundreds (possibly thousands) of redundant tags. These are
typically in the context:

  <span whatever>text1</span><span whatever>text2</span>

where 'whatever' is identical in both tags, and either or both of 'text1'
and 'text2' may be empty.
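
To make the pattern concrete, here is a minimal sketch (in Python, using the
standard xml.etree.ElementTree module) of the kind of merge I have in mind.
It is an illustration only, not an existing tool. The function name
merge_adjacent and the bare tag name "span" are made up for the example; in
a real content.xml the elements are namespaced (text:span), so the tag would
have to be given in ElementTree's {namespace}span form, and ElementTree
rewrites namespace prefixes on output unless they are registered first. The
sketch assumes two spans may be merged only when their attributes are
identical and nothing at all (not even whitespace) lies between them.

```python
import xml.etree.ElementTree as ET


def merge_adjacent(parent, tag="span"):
    """Merge adjacent <tag> children of `parent` that have identical attributes.

    Hypothetical helper: two siblings are merged only when nothing (not even
    whitespace) separates them, since whitespace between spans is content.
    """
    i = 0
    while i + 1 < len(parent):
        a, b = parent[i], parent[i + 1]
        if a.tag == tag and b.tag == tag and a.attrib == b.attrib and not a.tail:
            # Append b's leading text to the end of a's existing content.
            if len(a):
                a[-1].tail = (a[-1].tail or "") + (b.text or "")
            else:
                a.text = (a.text or "") + (b.text or "")
            a.extend(list(b))    # move b's child elements into a
            a.tail = b.tail      # text that followed b now follows a
            del parent[i + 1]    # drop b; re-check a against its new neighbour
        else:
            i += 1
    # Recurse afterwards, so children brought together by a merge are checked too.
    for child in parent:
        merge_adjacent(child, tag)


sample = ('<p><span s="T1">text1</span><span s="T1">text2</span> and '
          '<span s="T2">x</span><span s="T1"></span><span s="T1">y</span></p>')
root = ET.fromstring(sample)
merge_adjacent(root)
print(ET.tostring(root, encoding="unicode"))
```

In the sample, the first two spans collapse into one, the empty T1 span is
absorbed by the T1 span that follows it, and the T2 span is left alone
because its attributes differ.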

Is there a tool to clean these up? I could write one myself (I recently
wrote an XML parser) but if one already exists ...

Many thanks -- Mike Cowlishaw

[Apologies if this is a duplicate .. I tried it on askLibo some time ago but
it is still "awaiting moderation".]


