I've written a program that reads text from .doc and .docx files. I now want
to replace every manual line break (the kind created with Shift+Enter in
Word) with "<br>". For some reason, however, these line breaks aren't being
read properly by either HWPF or XWPF.

I use the following to read .doc:

WordExtractor wx = new WordExtractor(document);
String docText = wx.getText();

I use the following to read .docx:

XWPFWordExtractor wx = new XWPFWordExtractor(document);
String docxText = wx.getText();

Let's say I'm reading a Word document formatted as follows:

Bojo<br>the clown<p>Funny

(assume that where I wrote <br>, the Word document has a Shift+Enter manual
line break, and where I wrote <p>, it has a regular Enter paragraph break)

Using HWPF, docText prints (via System.out.println):
Bojo the clown
Funny

Using XWPF, docxText will print:
Bojothe clown
Funny

Notice that neither HWPF nor XWPF shows the Shift+Enter line break, although
both reflect the normal Enter paragraph break. Also notice that XWPF emits
nothing at all for the Shift+Enter break, unlike HWPF, which at least emits
a single whitespace character in its place.
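
To find out which character HWPF actually puts there, the extracted text
could be printed with its control characters made visible. The following is
only a sketch: the class name ShowControlChars and the method reveal are
made up for illustration, and the sample string is a stand-in for the result
of wx.getText() (the U+000B in it is a guess, not something I have
confirmed).

```java
final class ShowControlChars {
    // Returns a copy of the text in which every ISO control character
    // is replaced by a visible backslash-u escape with four hex digits.
    static String reveal(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c)) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the string returned by wx.getText().
        String sample = "Bojo\u000Bthe clown\r\nFunny";
        System.out.println(reveal(sample));
    }
}
```

Passing docText to reveal instead of the sample would show whether the
whitespace is an ordinary space or some control character.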

What is going on? Why can't I display the "shift+enter" character?
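
One thing I could try for the .doc case is sketched below. It assumes that
the whitespace HWPF returns is a vertical tab (U+000B), which as far as I
know is how the binary Word format stores a manual line break; I have not
verified that WordExtractor passes it through unchanged. The class name
ManualLineBreaks and the method toHtml are made up for illustration. This
would not help with XWPF, since there nothing is emitted for the break at
all.

```java
final class ManualLineBreaks {
    // Assumption: a Shift+Enter break arrives as U+000B (vertical tab).
    private static final char MANUAL_BREAK = '\u000B';

    // Replaces every assumed manual-break character with "<br>".
    static String toHtml(String text) {
        return text.replace(String.valueOf(MANUAL_BREAK), "<br>");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the string returned by wx.getText().
        String docText = "Bojo" + MANUAL_BREAK + "the clown";
        System.out.println(toHtml(docText)); // prints Bojo<br>the clown
    }
}
```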

