Damian C. wrote:

I hope this email finds you well. We've been managing to resolve most of
our import problems locally but there is one that has us scratching our
heads.

If we convert the attached file to a dita map we get all of the content
except for these lines:

Part I
Part II
Part III

I can see that the numerical parts of these titles are sequence fields
but I'm surprised that the first words are not extracted into the dita
output.

Do you know why this content is being excluded?


Yes.

It's because in your DOCX file, the stock MS-Word style "Caption" is used in an unusual context: it is applied to *section* *headings*. Normally the "Caption" style is expected to be used just before or just after tables, figures and equations.

It's the following XED code:
---
macro deleteCaptionLabel() {
    variable("seq", ./span[get-class("role-field") and
                           ./processing-instruction("field")[
                               starts-with(., "SEQ")]]);

    if $seq {
        for-each $seq/preceding-sibling::node() {
            delete();
        }

        delete($seq);
    }

    (: Get rid of ": " or ". " in "Figure NN: " or "Figure NN. ". :)
    delete-text("^\s*([\:\.]\s*)?");
}
---
found in <W2X_install_dir>/xed/captions.xed, which deletes all nodes preceding the <span> that contains the SEQ field (see "for-each $seq/preceding-sibling::node() {...}" above), and then deletes that <span> itself.

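The above code typically deletes useless content like "Figure A : " or "Table 4.:". In your headings, the word "Part" precedes the SEQ field, so it is deleted in exactly the same way, and nothing is left of the line.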
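
To make the effect concrete, here is a rough Python model of what the macro does to one paragraph. It is only a sketch, not the actual w2x implementation: the Node class, the "text"/"seq" kinds and the function name delete_caption_label are hypothetical names invented for this illustration, and it only considers the first SEQ field of the paragraph.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    kind: str  # "text", or "seq" for a span holding a SEQ field (hypothetical model)
    text: str


def delete_caption_label(nodes):
    """Return the text left in a "Caption" paragraph after the label is removed."""
    # Locate the first SEQ field span, if there is one.
    seq_index = next((i for i, n in enumerate(nodes) if n.kind == "seq"), None)
    if seq_index is not None:
        # Drop every preceding sibling, then the SEQ span itself.
        nodes = nodes[seq_index + 1:]
    # Get rid of a leading ": " or ". " as in "Figure NN: " or "Figure NN. ".
    remaining = "".join(n.text for n in nodes)
    return re.sub(r"^\s*([:.]\s*)?", "", remaining)


# A normal figure caption keeps its descriptive text.
figure = [Node("text", "Figure "), Node("seq", "3"), Node("text", ": Network layout")]
# A heading styled as "Caption" has nothing after the SEQ field.
heading = [Node("text", "Part "), Node("seq", "I")]

print(repr(delete_caption_label(figure)))
print(repr(delete_caption_label(heading)))
```

With a normal caption, the descriptive text after the number survives. With "Part I", everything sits before or inside the SEQ field, so the result is an empty string.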

w2x always tries very hard to remove all the content automatically generated by MS-Word. This automatically generated content is expected to be re-created, differently if need be, by the DITA processor that converts the DITA files generated by w2x to other formats.

I would not recommend commenting out the call to macro deleteCaptionLabel(), that is (see "Comments" in "The XED scripting language", https://www.xmlmind.com/w2x/_distrib/doc/xedscript/xed_comments.html),

(: deleteCaptionLabel(); :)

because the above code really makes sense when the "Caption" style is used normally.

Instead please consider modifying your DOCX file using MS-Word and restyling your section headings using "Heading 1", "Heading 2", etc.

--
XMLmind Word To XML Support List
[email protected]
https://www.xmlmind.com/mailman/listinfo/w2x-support
