On Tue, 10 Jan 2012, Andrei Khveras wrote:
I'm trying to use the class org.apache.poi.hwpf.extractor.WordExtractor, which I downloaded as part of Apache POI <http://poi.apache.org/download.html>.

*Could somebody please* help me resolve this little issue? My goal is to get the contents of an MS Word file as one single String containing all control characters. I need it for further (hand-made!) splitting of the text into paragraphs, words, etc.
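A minimal sketch of that hand-made splitting in plain Java, assuming the whole document has already been read into one String with its control characters intact (with HWPF, something like `new HWPFDocument(in).getRange().text()` should give that). In Word's binary format 0x0D (`\r`) ends a paragraph, 0x07 marks the end of a table cell and 0x0B is a manual line break. The sketch splits paragraphs on `\r` only, treats any other control character as a word separator, and does not interpret field codes. The class `RawWordSplitter` and its methods are hypothetical, not part of POI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of POI): hand-splits raw Word text whose
// control characters are still present.
public class RawWordSplitter {

    // Paragraph mark (0x0D) in Word's binary format.
    static final char PARAGRAPH_END = '\r';

    // Splits on paragraph marks; other control characters stay inside each paragraph.
    static List<String> paragraphs(String raw) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < raw.length(); i++) {
            if (raw.charAt(i) == PARAGRAPH_END) {
                result.add(raw.substring(start, i));
                start = i + 1;
            }
        }
        // Text after the last paragraph mark still counts as a paragraph.
        if (start < raw.length()) {
            result.add(raw.substring(start));
        }
        return result;
    }

    // Splits one paragraph into words; whitespace and any control character
    // (cell mark, line break, field marks, ...) act as separators.
    static List<String> words(String paragraph) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < paragraph.length(); i++) {
            char c = paragraph.charAt(i);
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                if (current.length() > 0) {
                    result.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "Hello world\rSecond\u000Bline\r";
        for (String p : paragraphs(raw)) {
            System.out.println(words(p)); // prints [Hello, world] then [Second, line]
        }
    }
}
```

Empty paragraphs (two consecutive `\r` marks) are kept as empty strings, so paragraph positions still line up with the document.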

Why not fetch the paragraphs directly, then? That would give you full control over which bit of text is in which paragraph, and let you decide whether to display or hide control characters, etc.

I'd suggest you look at the code for WordExtractor to get an idea of how to go about it, then write your own version that implements the logic you need.
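To illustrate the "display or hide" choice, here is a minimal sketch that renders one paragraph's text either with every control character shown as a hex marker or with them removed. It assumes the text comes from HWPF one paragraph at a time, e.g. `range.getParagraph(i).text()` for `i` from 0 to `range.numParagraphs() - 1`, and that such text may carry Word's field marks 0x13 (begin), 0x14 (separator) and 0x15 (end). When hiding, the sketch drops a field's instruction part and keeps its displayed result. `ParagraphText.render` is a hypothetical helper, not POI API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not part of POI): shows or hides Word control
// characters in the text of a single paragraph.
public class ParagraphText {

    static final char FIELD_BEGIN = 0x13;
    static final char FIELD_SEPARATOR = 0x14;
    static final char FIELD_END = 0x15;
    static final char LINE_BREAK = 0x0B;

    static String render(String text, boolean showControls) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: TRUE while still inside its instruction part.
        Deque<Boolean> fields = new ArrayDeque<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (showControls) {
                // Show every control character as <XX>, its hex code.
                if (Character.isISOControl(c)) {
                    out.append(String.format("<%02X>", (int) c));
                } else {
                    out.append(c);
                }
            } else if (c == FIELD_BEGIN) {
                fields.push(Boolean.TRUE);
            } else if (c == FIELD_SEPARATOR) {
                // Instruction part is over; what follows is the displayed result.
                if (!fields.isEmpty()) {
                    fields.pop();
                    fields.push(Boolean.FALSE);
                }
            } else if (c == FIELD_END) {
                if (!fields.isEmpty()) {
                    fields.pop();
                }
            } else if (fields.contains(Boolean.TRUE)) {
                continue; // inside an instruction such as HYPERLINK "...": hide it
            } else if (c == LINE_BREAK) {
                out.append(' '); // manual line break becomes a space
            } else if (c == '\t' || !Character.isISOControl(c)) {
                out.append(c);
            }
            // Remaining control characters (paragraph and cell marks, ...) are dropped.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String para = "See " + FIELD_BEGIN + " HYPERLINK \"http://poi.apache.org\" "
                + FIELD_SEPARATOR + "POI" + FIELD_END + " site\r";
        // prints: See <13> HYPERLINK "http://poi.apache.org" <14>POI<15> site<0D>
        System.out.println(render(para, true));
        // prints: See POI site
        System.out.println(render(para, false));
    }
}
```

Keeping the choice in one method means the same paragraph loop can feed both a debugging view (show) and a word splitter (hide).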

Nick
