T. H. wrote:
Please find a Word file here: https://XXX/YYY/ZZZ
When I run W2X with the file and these parameters
-o bookmap
-p convert.set-lang ja-JP
-p split.use-id-as-filename true
-p transform2.section-depth 4
I expect W2X to pick up these bookmarks of mine
topic_2_1_2
topic_2_2
topic_2_2_4
but W2X picks up these IDs respectively
_F_XXX_F_YYY
OLE_LINK13
OLE_LINK16
which were not set by me.
Per your suggestion, I tried
-p edit.ids.automatic-ids
"(^_?[a-zA-Z]{1,3}\d{3,}$)|(^(OLE_LINK|_ENREF_))|(^_GoBack$)|(^_F_)|(^OLE_LINK\d+$)"
(Well, "(^OLE_LINK\d+$)" seems to be covered by the default
"(^(OLE_LINK|_ENREF_))", but I added it just in case.)
and then W2X picked up these
d0e42
OLE_LINK13
OLE_LINK16
So edit.ids.automatic-ids didn't help much.
This kind of problem should be solved in the forthcoming XMLmind Word To XML
v1.8.
Excerpts from Change History:
---
Enhancements:
...
* XMLmind Word To XML is better at choosing user-specified bookmarks
(expected to have long and descriptive names like "Edit_a_citation")
over bookmarks automatically generated by MS-Word (e.g. "BM3").
This may have important benefits when converting a DOCX file to multiple
semantic XML files (e.g. a DITA map and its associated topics) because
in such a case, the names of the generated files are generally inferred
from user-specified bookmarks.
In order to implement this enhancement, we had to replace parameter
edit.ids.automatic-ids by new parameter convert.automatic-ids. This has
been done to move the detection of automatic bookmarks to an earlier
stage of the conversion process.
...
---
Therefore, using w2x v1.8, running:
---
w2x -o bookmap \
-p convert.set-lang ja-JP \
-p transform2.section-depth 4 \
-p convert.automatic-ids "|(^_F_)" \
test.docx test.ditamap
---
should give you the result you expect.
Notes:
* '-p split.use-id-as-filename true' is not useful when generating DITA.
* '-p convert.automatic-ids "|(^_F_)"' specifies just a *partial*
regular expression, which, because it starts with "|", is *appended* to
the default regular expression (which is
"(^_?[a-zA-Z]{1,3}\\d+$)|(^(OLE_LINK|_ENREF_))|(^_GoBack$)").
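To make the "appended partial regular expression" note more concrete, here is a minimal Python sketch of the idea. It is not W2X code: the helper names effective_pattern and is_automatic are hypothetical, the "\\d" in the default shown above is assumed to stand for the regex token \d, and "search anywhere in the name" matching is an assumption about how W2X applies the pattern.

```python
import re

# Default value of convert.automatic-ids as quoted above
# (assumption: "\\d" in the message denotes the regex token \d).
DEFAULT_AUTOMATIC_IDS = r"(^_?[a-zA-Z]{1,3}\d+$)|(^(OLE_LINK|_ENREF_))|(^_GoBack$)"


def effective_pattern(param_value=None, default=DEFAULT_AUTOMATIC_IDS):
    """Hypothetical helper: a value starting with "|" is appended to the
    default pattern; any other value replaces the default."""
    if param_value is None:
        return default
    if param_value.startswith("|"):
        return default + param_value
    return param_value


def is_automatic(bookmark, param_value=None):
    """Hypothetical helper: True if the bookmark name looks automatic.
    Uses re.search, which is an assumption about W2X's matching mode."""
    return re.search(effective_pattern(param_value), bookmark) is not None


if __name__ == "__main__":
    names = ["topic_2_1_2", "topic_2_2", "topic_2_2_4",
             "_F_XXX_F_YYY", "OLE_LINK13", "OLE_LINK16", "BM3"]
    for name in names:
        kind = "automatic" if is_automatic(name, "|(^_F_)") else "user-specified"
        print(name, "->", kind)
```

Under these assumptions, the three topic_* bookmarks are classified as user-specified, while _F_XXX_F_YYY is only caught once "|(^_F_)" is appended to the default.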
--
XMLmind Word To XML Support List
[email protected]
https://www.xmlmind.com/mailman/listinfo/w2x-support