I hope this information is still up to date; please forgive me if it is not.
As far as I am aware, you cannot currently use HWPF to parse docx files. I paid a quick visit to the project page at http://poi.apache.org/hwpf/index.html and found this:

"HWPF is the name of our port of the Microsoft Word 97(-2007) file format to pure Java. It does not support the new Word 2007 .docx file format, which is not OLE2 based."

You do have options, however. One that I have looked at but never used in anger is docx4j. The project's website is http://dev.plutext.org/blog/category/docx4j/

Another would be to use the UNO interface to drive the OpenOffice application, whilst a third would be to write your own parser. The docx file format is zipped XML, after all, and if all you want to do is get at the raw text, it may be worthwhile looking into this option.

Yours

Mark B

pof wrote:
>
> Hi, I was wondering if someone could provide an example how to parse out
> the plain text from a docx using poi 3.5 beta5?
>
> Cheers, Brett.
>
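To give a feel for the "write your own parser" option mentioned above, here is a rough sketch that uses only the JDK (java.util.zip and StAX), with no POI or docx4j involved. It assumes the main body lives in the zip entry word/document.xml and that visible text sits in w:t elements inside w:p paragraphs, which is the usual WordprocessingML layout. The class name DocxTextSketch and its method names are made up for illustration. Tabs, line breaks, headers, footers, footnotes and tables are not handled specially, so treat it as a starting point rather than a complete extractor.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of POI or docx4j.
public class DocxTextSketch {

    // Namespace used by WordprocessingML elements such as w:p and w:t.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Walks the zip entries until word/document.xml turns up, then pulls the text out of it.
    public static String extractText(InputStream docx)
            throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return textOf(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a docx file?");
    }

    // Collects the character data inside w:t elements, one output line per w:p paragraph.
    private static String textOf(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // A docx should not carry a DTD, so switch DTD support off as a precaution.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);

        StringBuilder out = new StringBuilder();
        boolean inText = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (W_NS.equals(reader.getNamespaceURI())
                        && "t".equals(reader.getLocalName())) {
                    inText = true;
                }
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) {
                if (inText) {
                    out.append(reader.getText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                if ("t".equals(reader.getLocalName())) {
                    inText = false;
                } else if ("p".equals(reader.getLocalName())) {
                    out.append('\n'); // end of paragraph
                }
            }
        }
        return out.toString();
    }

    // Usage: java DocxTextSketch some-file.docx
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.print(extractText(in));
        }
    }
}
```

Run it with the path to a docx file as the only argument and it prints one line per paragraph. If you need more than the raw text (styles, tables, embedded objects), a proper library such as docx4j is probably the better route.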
