Prince J. wrote:

I am currently evaluating different libraries to automate a process of
converting some English docx files to docbook xml (to import into our
CMS), before making a recommendation on purchase.


I'll assume here that you want to create a DocBook v5 article.





I tried XMLmind w2x online conversion and have also downloaded and
used the evaluation version of the software on windows. I am getting a
good output from w2x but there are some custom styles that are being
ignored (EULAUnderline, EULABold, EULAUnderlineBold, EULATitle). There
are a couple of other libraries that are capturing these styles, but
they have other limitations that I am working with them on. I would
really appreciate your insight and help on this issue.



Could you please take a look at the attached input docx and the output
docbook xml, and check if it is a known issue with w2x or if there is
something wrong with the input doc? Please let me know if you need any
more information. Looking forward to hearing from you.


Your DOCX is fine.

There are no known issues related to the support of styles like EULAUnderline, EULABold, EULAUnderlineBold, EULATitle.

Out of the box, w2x simply does not know anything about the above styles; it needs to be configured to process them to get a near-optimal result.

For that, I used w2x-app.exe, the desktop app (http://www.xmlmind.com/w2x/_distrib/doc/w2x_app_help/index.html), and of course its "wizard" (http://www.xmlmind.com/w2x/_distrib/doc/w2x_app_help/options_wizard.html).

1) I mapped the character styles EULAUnderline, EULABold and EULAUnderlineBold to <emphasis role="RRR"> (though I could also have mapped EULAUnderlineBold to the DocBook <email> element). See attached screenshot map_char_styles.png.

2) I declared the paragraph style EULATitle as being the style of a *title* (similar to the MS-Word stock "Title" style) using the parameter "edit.title.title-style-names" (http://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#param_title_style_names). See attached screenshot declare_title_style.png.

The wizard then generated the attached ".options" file for me (with its companion XXX_LICENSE_AGREEMENT_scripts/custom_transform.xslt).

Using it on your "XXX_LICENSE_AGREEMENT.docx" gave me the attached ".xml" DocBook file.
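
When automating such a conversion, it can help to check that the configured character styles really ended up as <emphasis> elements in the result. Below is a minimal Python sketch of such a check. It is not part of w2x; the function name count_emphasis_roles and the SAMPLE document are made up for illustration, and only the role values "bold" and "underline" are taken from the generated ".options" file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by DocBook 5 documents.
DOCBOOK_NS = "http://docbook.org/ns/docbook"


def count_emphasis_roles(docbook_xml: str) -> Counter:
    """Count <emphasis> elements per role attribute value (hypothetical helper)."""
    root = ET.fromstring(docbook_xml)
    roles = Counter()
    for elem in root.iter(f"{{{DOCBOOK_NS}}}emphasis"):
        # Emphasis elements without a role are grouped under "(none)".
        roles[elem.get("role", "(none)")] += 1
    return roles


# Invented sample; a real check would read the .xml file produced by w2x.
SAMPLE = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>License Agreement</title>
  <para><emphasis role="bold">Grant.</emphasis> See
  <emphasis role="underline">Section 2</emphasis> and
  <emphasis role="underline">Section 3</emphasis>.</para>
</article>"""

if __name__ == "__main__":
    for role, count in sorted(count_emphasis_roles(SAMPLE).items()):
        print(role, count)
```

If a role you expect is missing from the counts, the corresponding style mapping was probably not applied to that DOCX.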



# AUTOMATICALLY CREATED BY w2x-app SETUP ASSISTANT. PLEASE DO NOT EDIT BY HAND!
###outputFormat docbook5
###option transform.hierarchy-name article
###option transform.cals-tables yes
###styleMapping CHARACTER_STYLE c-EULABold emphasis role="bold"
###styleMapping CHARACTER_STYLE c-EULAUnderline emphasis role="underline"
###styleMapping CHARACTER_STYLE c-EULAUnderlineBold emphasis role="underline"
###otherParameter -p edit.title.title-style-names p-EULATitle

-o docbook5
-p edit.title.title-style-names p-EULATitle
-p transform.hierarchy-name article
-p convert.set-column-number yes -p transform.cals-tables yes
-p edit.inlines.convert "c-EULABold span class='c-EULABold' ! c-EULAUnderline span class='c-EULAUnderline' ! c-EULAUnderlineBold span class='c-EULAUnderlineBold'"
-t XXX_LICENSE_AGREEMENT_scripts/custom_transform.xslt
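
Since the goal is to automate the conversion, the options above could also be assembled by a script. The following Python sketch builds an equivalent argument list. It rests on assumptions: that the command-line tool is reachable as "w2x" (the launcher name may differ on Windows), that the input DOCX and output file are passed as the last two arguments, and the helper name build_w2x_command is made up. The option names and values are copied from the file above.

```python
import shlex


def build_w2x_command(in_docx, out_xml, char_styles, title_style, xslt, exe="w2x"):
    """Assemble a w2x argument list mirroring the .options file (hypothetical helper)."""
    # One "c-STYLE span class='c-STYLE'" entry per character style, joined by " ! ".
    convert = " ! ".join(f"c-{s} span class='c-{s}'" for s in char_styles)
    return [
        exe,  # assumption: launcher name and PATH lookup
        "-o", "docbook5",
        "-p", "edit.title.title-style-names", f"p-{title_style}",
        "-p", "transform.hierarchy-name", "article",
        "-p", "convert.set-column-number", "yes",
        "-p", "transform.cals-tables", "yes",
        "-p", "edit.inlines.convert", convert,
        "-t", xslt,
        in_docx, out_xml,  # assumption: input and output come last
    ]


if __name__ == "__main__":
    cmd = build_w2x_command(
        "XXX_LICENSE_AGREEMENT.docx",
        "XXX_LICENSE_AGREEMENT.xml",
        ["EULABold", "EULAUnderline", "EULAUnderlineBold"],
        "EULATitle",
        "XXX_LICENSE_AGREEMENT_scripts/custom_transform.xslt",
    )
    # Print the command for review; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with the single quotes inside the edit.inlines.convert value.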

Attachment: custom_transform.xslt
Description: application/xslt

--
XMLmind Word To XML Support List
w2x-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/w2x-support
