I want to extract the text from each page, but the number of paragraphs is
greater than the number of pages. I need to keep the text for each page
separate. I could not find a good example, and I had assumed that a paragraph
was the same as a page in POI.
How can I cycle through the pages and get the text per page?
// Needs: java.io.FileInputStream, java.util.List,
// org.apache.poi.xwpf.usermodel.XWPFDocument and XWPFParagraph
try (FileInputStream in = new FileInputStream(aFile);
     XWPFDocument docx = new XWPFDocument(in))
{
    // Page count as stored in the document's extended properties
    int numOfPages =
        docx.getProperties().getExtendedProperties().getUnderlyingProperties().getPages();
    String pageText;
    String md5Hash;
    int searchablePages = 0;
    List<XWPFParagraph> paragraphs = docx.getParagraphs();
    if (paragraphs != null && !paragraphs.isEmpty())
    {
        // Note: this loops over paragraphs, not pages
        for (XWPFParagraph paragraph : paragraphs)
        {
            pageText = paragraph.getText();
            if (pageText != null && pageText.trim().length() > 0)
            {
                // Only multi-line text needs duplicate lines removed
                if (pageText.indexOf('\n') > -1)
                {
                    pageText = this.removeDuplicateLines(pageText);
                }
                if (pageText != null && pageText.length() > 0)
                {
                    md5Hash = this.calcHashCode(pageText);
                    if (md5Hash != null)
                    {
                        searchablePages++;
                    }
                }
            }
        }
    }
}
catch (Exception e)
{
    e.printStackTrace();
}
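
One possible direction, offered as a sketch rather than a known POI recipe: a
.docx file stores paragraphs and runs, not pages, so page boundaries have to be
inferred from page-break markers. The block below only shows the grouping step
and uses plain Java so it can run without POI. PageGrouper, Para and
startsNewPage are hypothetical names introduced for illustration; with POI the
flag might be derived from XWPFParagraph.isPageBreak() or from page-break
elements in a paragraph's runs, and that part is an assumption not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of POI: groups paragraph text into pages.
class PageGrouper {

    // Stand-in for an XWPFParagraph: its text plus a flag saying whether a
    // page break precedes it. How the flag is obtained from POI is assumed.
    static final class Para {
        final String text;
        final boolean startsNewPage;

        Para(String text, boolean startsNewPage) {
            this.text = text;
            this.startsNewPage = startsNewPage;
        }
    }

    // Returns one string per page, with paragraphs joined by '\n'.
    // Blank paragraphs are skipped, so pages with no text are not emitted.
    static List<String> groupByPage(List<Para> paras) {
        List<String> pages = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (Para p : paras) {
            // A break closes the page collected so far, if it has any text.
            if (p.startsNewPage && current.length() > 0) {
                pages.add(current.toString());
                current.setLength(0);
            }
            if (p.text == null || p.text.trim().isEmpty()) {
                continue;
            }
            if (current.length() > 0) {
                current.append('\n');
            }
            current.append(p.text);
        }
        // Flush the last page.
        if (current.length() > 0) {
            pages.add(current.toString());
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Para> paras = Arrays.asList(
            new Para("Title", false),
            new Para("Intro text", false),
            new Para("Chapter 1", true),
            new Para("", false),
            new Para("Chapter 2", true));
        List<String> pages = groupByPage(paras);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("Page " + (i + 1) + ": "
                + pages.get(i).replace("\n", " | "));
        }
    }
}
```

Each returned string could then be passed to removeDuplicateLines and
calcHashCode in place of the single-paragraph text. This only sees breaks that
are actually marked in the file, so the result may not match the page count
reported by the document properties.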