T. H. wrote:
Please find a Word file here: https://XXX/YYY/ZZZ

When I run W2X with the file and these parameters

    -o bookmap
    -p convert.set-lang            ja-JP
    -p split.use-id-as-filename    true
    -p transform2.section-depth    4

I expect W2X to pick up these bookmarks of mine

    topic_2_1_2
    topic_2_2
    topic_2_2_4

but W2X picks up these IDs respectively

    _F_XXX_F_YYY
    OLE_LINK13
    OLE_LINK16

which were not set by me.

Per your suggestion, I tried

    -p edit.ids.automatic-ids "(^_?[a-zA-Z]{1,3}\d{3,}$)|(^(OLE_LINK|_ENREF_))|(^_GoBack$)|(^_F_)|(^OLE_LINK\d+$)"

(Well, "(^OLE_LINK\d+$)" seems to be covered by the default 
"(^(OLE_LINK|_ENREF_))", but I added it just in case.)

and then W2X picked up these

    d0e42
    OLE_LINK13
    OLE_LINK16

So edit.ids.automatic-ids didn't help much.



This kind of problem should be solved in the forthcoming XMLmind Word To XML v1.8.

Excerpts from Change History:
---
Enhancements:
...

* XMLmind Word To XML is better at choosing user-specified bookmarks (expected to have long and descriptive names like "Edit_a_citation") over bookmarks automatically generated by MS-Word (e.g. "BM3" ).

This may have important benefits when converting a DOCX file to multiple semantic XML files (e.g. a DITA map and its associated topics) because in such a case, the names of the generated files are generally inferred from user-specified bookmarks.

In order to implement this enhancement, we had to replace parameter edit.ids.automatic-ids with new parameter convert.automatic-ids. This has been done to move the detection of automatic bookmarks to an earlier stage of the conversion process.

...
---

Therefore, using w2x v1.8, running:

---
w2x -o bookmap \
  -p convert.set-lang ja-JP  \
  -p transform2.section-depth 4 \
  -p convert.automatic-ids "|(^_F_)" \
  test.docx test.ditamap
---

should give you the result you expect.

Notes:

* '-p split.use-id-as-filename true' is not useful when generating DITA.

* '-p convert.automatic-ids "|(^_F_)"' specifies just a *partial* regular expression, which, because it starts with "|", is *appended* to the default regular expression (which is "(^_?[a-zA-Z]{1,3}\\d+$)|(^(OLE_LINK|_ENREF_))|(^_GoBack$)").
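
To illustrate the last note, here is a minimal Python sketch of how a partial value starting with "|" would extend the default regular expression, and which of the bookmark names mentioned in this thread the combined expression would then flag as automatic. This is not W2X code: the helper names effective_pattern and is_automatic are hypothetical, the doubled backslash in the quoted default is assumed to be string escaping for a single backslash, and applying the expression as an unanchored search is also an assumption.

```python
import re

# Default value of convert.automatic-ids as quoted in the note above.
# Assumption: the "\\d" shown there is an escaped form of "\d".
DEFAULT_AUTOMATIC_IDS = r"(^_?[a-zA-Z]{1,3}\d+$)|(^(OLE_LINK|_ENREF_))|(^_GoBack$)"


def effective_pattern(value, default=DEFAULT_AUTOMATIC_IDS):
    """Hypothetical helper: a value starting with "|" is appended to the
    default expression; any other value replaces the default."""
    return default + value if value.startswith("|") else value


def is_automatic(bookmark, value="|(^_F_)"):
    """Assumption: a bookmark is treated as automatic when the effective
    expression is found in its name (unanchored search)."""
    return re.search(effective_pattern(value), bookmark) is not None


if __name__ == "__main__":
    names = ["topic_2_1_2", "topic_2_2", "topic_2_2_4",
             "_F_XXX_F_YYY", "OLE_LINK13", "OLE_LINK16", "BM3", "_GoBack"]
    for name in names:
        kind = "automatic" if is_automatic(name) else "user-specified"
        print(name, "->", kind)
```

Under these assumptions, the three topic_... bookmarks are the only names in the list that are not flagged as automatic, so they remain available as user-specified bookmarks.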



--
XMLmind Word To XML Support List
[email protected]
https://www.xmlmind.com/mailman/listinfo/w2x-support
