On 09/04/2019 12:14 PM, Lekan Owodunni wrote:
We have a sample text docx file with 3 headings (all at outline level 1). Please see attachment. Using your w2x tool to convert our sample docx to DITA, we got 2 DITA topics instead of the expected 3. The third section (by heading level) seems to have been merged with the second section, producing only 2 topic files instead of the expected 3 topic files. Is this a bug?
This is not really a bug, and we don't see what we could fix here. Here's what happens. The part of your testDoc.docx starting with Heading 1 "TABLE OF CONTENTS" contains just an MS-Word generated TOC object:

---
Heading 1 "INTRO CHAPTER"
...Some contents...
Heading 1 "TABLE OF CONTENTS"
MS-Word generated TOC object
Heading 1 "ANOTHER CHAPTER"
...Some contents...
---

When generating semantic XML, MS-Word generated objects like TOC and Index are automatically removed. So we end up with this:

---
Heading 1 "TABLE OF CONTENTS"
Heading 1 "ANOTHER CHAPTER"
---

When two Heading_N immediately follow each other, the second one "loses" its outline level N, hence XMLmind Word To XML generates just 2 topic files for your testDoc.docx.
Just to prove this, I've added a paragraph just after Heading 1 "TABLE OF CONTENTS" and I got the expected 3 topic files.
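The behavior described above can be mimicked with a small toy model. This is only an illustrative Python sketch, not how XMLmind Word To XML is actually implemented: the `topic_titles` function and the `"heading"`, `"para"` and `"generated"` block kinds are hypothetical names invented for this example.

```python
# Toy model (hypothetical, NOT the actual w2x implementation).
# A document is a list of (kind, text) blocks where kind is one of:
#   "heading"   - a Heading 1 paragraph
#   "para"      - ordinary contents
#   "generated" - an MS-Word generated object (TOC, Index)

def topic_titles(blocks):
    """Return the titles of the topics this toy model would generate."""
    # Step 1: MS-Word generated objects are removed.
    kept = [b for b in blocks if b[0] != "generated"]

    titles = []
    previous_was_heading = False
    for kind, text in kept:
        if kind == "heading":
            # Step 2: a heading immediately following another heading
            # "loses" its outline level, so it does not start a topic.
            if not previous_was_heading:
                titles.append(text)
            previous_was_heading = True
        else:
            previous_was_heading = False
    return titles


ORIGINAL = [
    ("heading", "INTRO CHAPTER"),
    ("para", "...Some contents..."),
    ("heading", "TABLE OF CONTENTS"),
    ("generated", "TOC"),
    ("heading", "ANOTHER CHAPTER"),
    ("para", "...Some contents..."),
]

# Same document with one paragraph added after the TOC heading.
NON_EMPTY_TOC = ORIGINAL[:3] + [("para", "Added paragraph")] + ORIGINAL[3:]

print(topic_titles(ORIGINAL))       # ['INTRO CHAPTER', 'TABLE OF CONTENTS']
print(topic_titles(NON_EMPTY_TOC))  # ['INTRO CHAPTER', 'TABLE OF CONTENTS', 'ANOTHER CHAPTER']
```

In this model, the original layout yields 2 topics and the layout with the added paragraph yields 3, which matches what was observed with testDoc.docx.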
See attached testDoc_NON_EMPTY_TOC.docx and testDoc_NON_EMPTY_TOC.ditamap.

--> So what to do now? The answer is that you are expected to adapt XMLmind Word To XML to the specificities of your DOCX documents.
If many of your DOCX documents contain:

---
Heading 1 "TABLE OF CONTENTS"
MS-Word generated TOC object
---

then you are expected to write a XED script which automatically removes:

---
Heading 1 "TABLE OF CONTENTS"
---

(which is anyway useless when generating DITA).

See "Going further with w2x", https://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#going_further
See "The XED scripting language", https://www.xmlmind.com/w2x/_distrib/doc/xedscript/index.html
testDoc_NON_EMPTY_TOC.docx
Description: MS-Word 2007 document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map xml:lang="en-GB">
  <title>???</title>
  <topicmeta>
    <author>Christian Booth</author>
    <publisher>ScribeStar Ltd</publisher>
    <critdates>
      <created date="2019-09-05T07:26:00Z"/>
      <revised modified="2019-09-05T07:26:00Z"/>
    </critdates>
  </topicmeta>
  <topicref href="testDoc_NON_EMPTY_TOC_files/d0e11.dita"/>
  <topicref href="testDoc_NON_EMPTY_TOC_files/d0e16.dita"/>
  <topicref href="testDoc_NON_EMPTY_TOC_files/d0e21.dita"/>
</map>
-- XMLmind Word To XML Support List w2x-support@xmlmind.com https://www.xmlmind.com/mailman/listinfo/w2x-support