Re: [W2X] docx to dita issues- splitting the document into correct number of dita topics

Lekan Owodunni Thu, 05 Sep 2019 23:51:44 -0700

Thanks for the explanation. We also noticed that empty paragraphs are
removed, for example empty paragraphs inserted between two lines of
text in a Word document. Is this by design? Is there a command-line
option that tells XMLmind's w2x tool to keep empty paragraphs in the output
topic files?




On Thu, 5 Sep 2019 at 08:47, Hussein Shafie <[email protected]> wrote:

> On 09/04/2019 12:14 PM, Lekan Owodunni wrote:
> > We have a sample text docx file with 3 headings (all at outline level 1).
> > Please see attachment. Using your w2x tool to convert our sample docx
> > to dita, we got 2 dita topics instead of the expected 3 dita topics.
> > The third section (by heading level) seems to have been merged with the
> > second section to produce only 2 dita topic files instead of the
> > expected 3 topic files. Is this a bug?
> >
>
>
> This is not really a bug. Anyway, we don't see what we could fix here.
>
> Here's what happens. The part of your testDoc.docx starting with Heading 1
> "TABLE OF CONTENTS" contains just an MS-Word generated TOC object:
> ---
> Heading 1 "INTRO CHAPTER"
> ...Some contents...
> Heading 1 "TABLE OF CONTENTS"
> MS-Word generated TOC object
> Heading 1 "ANOTHER CHAPTER"
> ...Some contents...
> ---
>
> When generating semantic XML, MS-Word generated objects like TOC and
> Index are automatically removed. So we end up with this:
> ---
> Heading 1 "TABLE OF CONTENTS"
> Heading 1 "ANOTHER CHAPTER"
> ---
>
> When two Heading_N paragraphs immediately follow each other, the second one
> "loses" its outline level_N, hence XMLmind Word To XML generates just 2
> topic files for your testDoc.docx.
>
> Just to prove this, I've added a paragraph just after Heading 1 "TABLE
> OF CONTENTS" and I got the expected 3 topic files.
>
> See attached testDoc_NON_EMPTY_TOC.docx and testDoc_NON_EMPTY_TOC.ditamap.
>
>
>
> --> So what should you do now? The answer is that you are expected to adapt
> XMLmind Word To XML to the specificities of your DOCX documents.
>
> If many of your DOCX documents contain:
> ---
> Heading 1 "TABLE OF CONTENTS"
> MS-Word generated TOC object
> ---
>
> then you are expected to write a XED script which automatically removes:
> ---
> Heading 1 "TABLE OF CONTENTS"
> ---
> (which is useless anyway when generating DITA)
>
> See "Going further with w2x",
> https://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#going_further
>
> See "The XED scripting language",
> https://www.xmlmind.com/w2x/_distrib/doc/xedscript/index.html
>
>
>
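
To make the mechanism described above concrete, here is a small Python toy model. It is a sketch under stated assumptions, not how XMLmind Word To XML is implemented: the `Para` class, the paragraph kinds ("heading1", "text", "toc") and all function names are hypothetical, and only the TOC object is modeled as an MS-Word generated object. It encodes just the two rules from the reply (generated objects are removed; a heading that immediately follows another heading loses its outline level). `remove_toc_heading` only mimics the matching rule that the suggested XED script would apply; it is not XED code.

```python
# Hypothetical toy model of the topic-splitting behaviour described in the
# thread. NOT XMLmind Word To XML's implementation; it only encodes:
#   1. MS-Word generated objects (here: only a TOC) are removed;
#   2. a heading immediately following another heading loses its outline
#      level, so it does not start a new topic.
from dataclasses import dataclass


@dataclass
class Para:
    kind: str       # "heading1", "text" or "toc" (hypothetical labels)
    text: str = ""


def drop_generated_objects(paras):
    """Rule 1: remove MS-Word generated objects (only TOC is modeled)."""
    return [p for p in paras if p.kind != "toc"]


def split_into_topics(paras):
    """Rule 2: start a topic at each heading, unless it directly follows one."""
    topics = []
    prev_was_heading = False
    for p in paras:
        if p.kind == "heading1" and not prev_was_heading:
            topics.append([p.text])
            prev_was_heading = True
        else:
            # Body text, or a demoted heading merged into the current topic.
            if topics:
                topics[-1].append(p.text)
            prev_was_heading = False
    return topics


def remove_toc_heading(paras):
    """Hypothetical pre-processing step in the spirit of the suggested XED
    script: drop a Heading 1 "TABLE OF CONTENTS" followed by a TOC object.
    Must run BEFORE drop_generated_objects, since it looks at the TOC."""
    out = []
    for i, p in enumerate(paras):
        nxt = paras[i + 1] if i + 1 < len(paras) else None
        if (p.kind == "heading1"
                and p.text.strip().upper() == "TABLE OF CONTENTS"
                and nxt is not None and nxt.kind == "toc"):
            continue
        out.append(p)
    return out


def titles(topics):
    """First entry of each topic is the heading that started it."""
    return [t[0] for t in topics]


# Structure of testDoc.docx as described in the reply.
test_doc = [
    Para("heading1", "INTRO CHAPTER"), Para("text", "Some contents"),
    Para("heading1", "TABLE OF CONTENTS"), Para("toc"),
    Para("heading1", "ANOTHER CHAPTER"), Para("text", "Some contents"),
]

if __name__ == "__main__":
    as_is = split_into_topics(drop_generated_objects(test_doc))
    fixed = split_into_topics(drop_generated_objects(remove_toc_heading(test_doc)))
    print(titles(as_is))   # ['INTRO CHAPTER', 'TABLE OF CONTENTS']
    print(titles(fixed))   # ['INTRO CHAPTER', 'ANOTHER CHAPTER']
```

In the unmodified run, "ANOTHER CHAPTER" ends up as plain content of the "TABLE OF CONTENTS" topic, which matches the 2 topic files reported. Inserting a `Para("text", ...)` right after the "TABLE OF CONTENTS" heading (the testDoc_NON_EMPTY_TOC.docx experiment) yields 3 topics in this model, while removing the heading up front yields 2 topics with correct titles.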


Empty paragraph test.docx
Description: MS-Word 2007 document

--
XMLmind Word To XML Support List
[email protected]
https://www.xmlmind.com/mailman/listinfo/w2x-support
