I hope this information is still up to date; please forgive me if it is
not.

As far as I am aware, you cannot currently use HWPF to parse docx files. I
paid a quick visit to the project page at
http://poi.apache.org/hwpf/index.html and found this:

"HWPF is the name of our port of the Microsoft Word 97(-2007) file format to
pure Java. It does not support the new Word 2007 .docx file format, which is
not OLE2 based."

You do have options, however. One that I have looked at but never used in
anger is docx4j. The project's website is:

http://dev.plutext.org/blog/category/docx4j/

Another would be to use the UNO interface to drive the OpenOffice
application. A third would be to write your own parser. The docx file format
is zipped XML, after all, so if all you want is the raw text, this option may
be worth looking into.
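
To give an idea of what the "write your own parser" route might look like,
here is a rough sketch that uses only the standard JDK. It assumes the main
body text lives in the zip entry word/document.xml and that the text sits in
w:t elements inside w:p paragraphs. The class name DocxRawText and the method
extractText are my own inventions for illustration and do not come from POI
or docx4j. It ignores tabs, line breaks, headers, footers and footnotes, and
a paragraph nested inside another (in a text box, say) would be output twice,
so treat it as a starting point rather than a finished extractor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, named for illustration only.
final class DocxRawText {

    // Namespace of the WordprocessingML elements (w:p, w:r, w:t).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of the docx at 'path', one line per paragraph.
    static String extractText(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            // Assumption: the main document part is stored under this name.
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + path);
            }

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the NS lookups below

            try (InputStream in = zip.getInputStream(entry)) {
                Document doc = factory.newDocumentBuilder().parse(in);
                StringBuilder out = new StringBuilder();

                // Walk every paragraph and join the text of its w:t elements.
                NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < texts.getLength(); j++) {
                        out.append(texts.item(j).getTextContent());
                    }
                    out.append('\n');
                }
                return out.toString();
            }
        }
    }
}
```

You would call it with something like
String text = DocxRawText.extractText("report.docx"); where report.docx is
just a placeholder file name. Loading the whole part into a DOM keeps the
code short; for very large documents a streaming parser would be the more
sensible choice.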

Yours

Mark B


pof wrote:
> 
> Hi, I was wondering if someone could provide an example how to parse out
> the plain text from a docx using poi 3.5 beta5?
> 
> Cheers, Brett.
> 

-- 
View this message in context: 
http://www.nabble.com/docx-parse-example-tp23976192p23976770.html
Sent from the POI - User mailing list archive at Nabble.com.