Hi everybody,
It seems to me that the getLineSeparator method of PDF2XHTML
(package org.apache.tika.parser.pdf) could be improved.
I changed it
from:
public String getLineSeparator()
{
    try
    {
        handler.characters("\n");
    } catch(SAXException e) {
    }
    return super.getLineSeparator();
}
to:
public String getLineSeparator()
{
    try
    {
        handler.element("br", "");
    } catch(SAXException e) {
    }
    return super.getLineSeparator();
}
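The resulting HTML looks better: each line break is emitted as a <br/>
element instead of a bare newline character, which a browser would
collapse into ordinary whitespace.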
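To illustrate the difference without needing Tika on the classpath, here is a rough standalone sketch. It does not use PDF2XHTML or Tika's handler at all: the JDK's own SAX TransformerHandler stands in for it, and the class name LineSeparatorSketch and the helpers render and text are just names I made up for the example. It writes two lines of text inside a <p>, separated either by a "\n" character event or by an empty <br> element.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; the JDK serializer is only a stand-in for Tika's handler.
class LineSeparatorSketch {

    // Serializes "first line" and "second line" inside a <p>, separated
    // either by an empty <br> element (useBr) or by a "\n" character event.
    static String render(boolean useBr) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer()
               .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "p", "p", noAttrs);
        text(handler, "first line");
        if (useBr) {
            // Same idea as the modified method: an element event, not text.
            handler.startElement("", "br", "br", noAttrs);
            handler.endElement("", "br", "br");
        } else {
            // Same idea as the original method: a newline sent as character data.
            text(handler, "\n");
        }
        text(handler, "second line");
        handler.endElement("", "p", "p");
        handler.endDocument();
        return out.toString();
    }

    // Small helper that forwards a String as a SAX characters event.
    static void text(TransformerHandler h, String s) throws SAXException {
        h.characters(s.toCharArray(), 0, s.length());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("newline variant: " + render(false));
        System.out.println("br variant:      " + render(true));
    }
}
```

Running main prints both variants, so you can see that only the second one contains an explicit line-break element.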
I hope this post helps someone.
See you,
Giunad.
--
If we have learned one thing from the history of invention and discovery,
it is that in the long run - and often in the short one - the most
daring prophecies seem laughably conservative.
Arthur C. Clarke, The Exploration of Space