pof <MelbourneBeerBaron <at> gmail.com> writes:

> Hi, I was wondering if someone could provide an example of how to parse out
> the plain text from a docx using POI 3.5 beta5?
> 
> Cheers, Brett.

I discovered it's fairly easy to get all (or at least most) of the text from a 
DOCX using only the basic Java libraries. A docx file is just a zip file 
containing a bunch of XML files.

I posted an example of this on my blog at 
http://www.maxstocker.com/blog.php?en=c6270d6e2bde17ae8c6f9659b3b863773

but the basic steps are:

1) Open the docx as a ZipFile
2) Get the main XML file as the ZipEntry "word/document.xml"
3) Parse the XML document and get all elements named "w:t"
4) Extract the text content from those elements
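
As a rough sketch of those four steps (not the code from the blog post), 
something like the following should work with only the JDK. The class name 
DocxText and the method name extractText are made up for illustration. It 
assumes the document uses the conventional "w" namespace prefix and that the 
parser is left in its default, non-namespace-aware mode, so that "w:t" is 
matched as a literal tag name. It also just concatenates the text runs, so 
paragraph breaks are lost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, named for illustration only.
final class DocxText {

    // Returns the concatenated contents of every "w:t" element in the docx.
    static String extractText(String docxPath) throws Exception {
        // 1) Open the docx as a ZipFile.
        try (ZipFile zip = new ZipFile(docxPath)) {
            // 2) Get the main document part.
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // 3) Parse the XML. The factory default is not namespace-aware,
                //    so elements are looked up by their literal name "w:t"
                //    (assumes the usual "w" prefix).
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(in);
                NodeList runs = doc.getElementsByTagName("w:t");

                // 4) Extract the text content of each element.
                StringBuilder text = new StringBuilder();
                for (int i = 0; i < runs.getLength(); i++) {
                    text.append(runs.item(i).getTextContent());
                }
                return text.toString();
            }
        }
    }
}
```

Calling DocxText.extractText("some.docx") then returns the text as one 
string. If you need paragraph breaks, one option would be to walk the "w:p" 
elements instead and append a newline after each one.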




