This is not a question, but I felt the need to type a few lines for
other people who might want to work with bullet points or numberings
in .docx files using the capabilities of Apache POI 3.8.

My original task can be described as extracting certain information
from a specifically formatted Word file and outputting that
information in a simple XML document format. Bullet points and
numberings aren't just specially formatted text; they also implicitly
hold information about their 'level' relative to other bullet
points or numberings, and I was interested in extracting this
information and in telling the two apart.

With the great pointers and solutions provided by Mark Beardsley
(again a big thanks for that!) I was able to find out if a paragraph
is part of a numbering/bullet point by checking if
currentParagraph.getDocument().getNumbering().getNum(currentParagraph.getNumID())
returns something other than null. Now, as bullet points are a
special kind of numbering (or the other way around), I still had to
find a way to tell them apart. Sadly, Apache POI doesn't seem to
provide anything for that yet, so I extracted this information
manually from the relevant fragment of the original docx XML.
Please note that I'm pretty new to Java, Apache POI, etc., so this is
probably very ugly for better programmers ;-) :

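import java.math.BigInteger;
import org.apache.poi.xwpf.usermodel.XWPFAbstractNum;
import org.apache.poi.xwpf.usermodel.XWPFNumbering;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTAbstractNum;

// First we need the ID of the numbering
BigInteger currentParagraphNumberingID =
    currentParagraph.getCTP().getPPr().getNumPr().getNumId().getVal();
// The numbering definitions of the document the paragraph belongs to
XWPFNumbering currentParagraphNumbering =
    currentParagraph.getDocument().getNumbering();
// Then use the numbering ID to find the abstract numbering ID
BigInteger currentParagraphAbstractNumID =
    currentParagraphNumbering.getAbstractNumID(currentParagraphNumberingID);
// And use that one to find the right XML fragment object
XWPFAbstractNum currentParagraphAbstractNum =
    currentParagraphNumbering.getAbstractNum(currentParagraphAbstractNumID);
CTAbstractNum currentParagraphAbstractNumFormatting =
    currentParagraphAbstractNum.getCTAbstractNum();
// Finally turn the XML fragment into a string
String numberingFormatXmlAsString =
    currentParagraphAbstractNumFormatting.toString();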

From that string I built an XML document, using the namespace
"http://schemas.openxmlformats.org/wordprocessingml/2006/main", took
its first lvl element, then the first numFmt element inside that
lvl element, and read the value of its "val" attribute. If that
value is "decimal", "lowerLetter", etc., we are dealing with a
numbering; if not, it is probably a bullet point. Obviously this
method will falsely report a bullet point if we forget to check for
certain values that indicate a numbering (or something other than a
bullet point), but in my case this sufficed because we only allow
standard numberings in our specially formatted Word file.
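
For anyone who wants to try the same thing, here is a minimal sketch of
that last step using only the XML parser that ships with the JDK. It is
not my original code. The names NumFmtSniffer, firstNumFmt and
isNumbered are made up for this example, and the set of values treated
as numberings is an assumption that you should extend for your own
documents. It also assumes that the string produced by toString() is a
well-formed XML fragment with a single root element that declares the
namespace above.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
public class NumFmtSniffer {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumption: the numFmt values we treat as "real" numberings.
    // This list is deliberately short; add more values as needed.
    static final Set<String> NUMBERED = new HashSet<String>(Arrays.asList(
        "decimal", "lowerLetter", "upperLetter", "lowerRoman", "upperRoman"));

    // Returns the "val" attribute of the first numFmt element inside the
    // first lvl element, or null if there is no such element.
    static String firstNumFmt(String abstractNumXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new InputSource(new StringReader(abstractNumXml)));

            NodeList lvls = doc.getElementsByTagNameNS(W_NS, "lvl");
            if (lvls.getLength() == 0) {
                return null;
            }
            Element firstLvl = (Element) lvls.item(0);
            NodeList fmts = firstLvl.getElementsByTagNameNS(W_NS, "numFmt");
            if (fmts.getLength() == 0) {
                return null;
            }
            return ((Element) fmts.item(0)).getAttributeNS(W_NS, "val");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse numbering XML", e);
        }
    }

    // True if the first level uses one of the formats listed in NUMBERED.
    // Anything else (including "bullet") is treated as "not a numbering".
    static boolean isNumbered(String abstractNumXml) {
        String fmt = firstNumFmt(abstractNumXml);
        return fmt != null && NUMBERED.contains(fmt);
    }

    public static void main(String[] args) {
        // Hand-written sample fragment, not taken from a real document.
        String sample =
            "<w:abstractNum xmlns:w=\"" + W_NS + "\">"
            + "<w:lvl w:ilvl=\"0\"><w:numFmt w:val=\"decimal\"/></w:lvl>"
            + "</w:abstractNum>";
        System.out.println(firstNumFmt(sample) + " numbered=" + isNumbered(sample));
    }
}
```

In the code above you would pass numberingFormatXmlAsString to
isNumbered(...). Like my own version, this only looks at the first
level of the list, so lists that mix bullets and numbers on deeper
levels would need a lookup by the paragraph's own level instead.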

I hope this helps somebody.

Andreas Seeg
