Hi Dominik
Sure, I attached the symbol_test.doc document to the previous email, but I think I cannot attach documents to the email. Is there any way I can share the document?

Thanks
T.

On 06/10/2019 16:29, Dominik Stadler wrote:
Hi,

can you share an example document which shows the behavior?

Thanks... Dominik.

On Sun, Oct 6, 2019 at 6:48 AM Teresa Kim <[email protected]> wrote:

Hi

I have documents (either 'doc' or 'docx') that contain a special character for 'greater than or equal to'. When I convert them with 'WordToHtmlConverter', those characters come out as '('. I tried this with the latest Apache POI release, 4.1.0. My Java code is:

    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.nio.charset.StandardCharsets;

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    import org.apache.poi.hwpf.HWPFDocumentCore;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.converter.WordToHtmlUtils;
    import org.w3c.dom.Document;

    public class TestWordtoHtmlConverter {
        public static void main(String[] args) {
            try {
                // Load the Word document named on the command line
                HWPFDocumentCore wordDocument =
                        WordToHtmlUtils.loadDoc(new FileInputStream(args[0]));

                // Convert it into an HTML DOM
                WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                        DocumentBuilderFactory.newInstance().newDocumentBuilder()
                                .newDocument());
                wordToHtmlConverter.processDocument(wordDocument);
                Document htmlDocument = wordToHtmlConverter.getDocument();

                // Serialize the DOM as UTF-8 encoded HTML
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                DOMSource domSource = new DOMSource(htmlDocument);
                StreamResult streamResult = new StreamResult(out);
                TransformerFactory tf = TransformerFactory.newInstance();
                Transformer serializer = tf.newTransformer();
                serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                serializer.setOutputProperty(OutputKeys.INDENT, "yes");
                serializer.setOutputProperty(OutputKeys.METHOD, "html");
                serializer.transform(domSource, streamResult);
                out.close();

                // Decode with the same charset the serializer wrote
                String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
                System.out.println(result);
            } catch (Exception e) {
                // Report failures instead of silently swallowing them
                e.printStackTrace();
            }
        }
    }

Is there any way I can correctly identify these symbols? In the sample document, I am interested in getting 'bad one'.

Thanks
T.
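One possible direction for the question above, offered as an untested sketch rather than a confirmed diagnosis. It assumes the '(' is the placeholder that the binary .doc format stores for a character inserted via Insert > Symbol, with the real character code and font name (for example "Symbol") kept separately. Nobody in this thread has confirmed that this is what happens in symbol_test.doc. If it is, HWPF's CharacterRun has isSymbol(), getSymbolCharacter() and getSymbolFont() accessors that look relevant; please check them against the 4.1.0 Javadoc. The code below shows only the last step, mapping a Symbol-font code to Unicode. The class name SymbolFontMapper and its deliberately partial table are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps a few Symbol-font codes to Unicode characters. */
public class SymbolFontMapper {

    // Partial table of Symbol-font codes; extend as needed.
    private static final Map<Integer, Character> SYMBOL_TO_UNICODE = new HashMap<>();
    static {
        SYMBOL_TO_UNICODE.put(0xA3, '\u2264'); // less than or equal to
        SYMBOL_TO_UNICODE.put(0xB1, '\u00B1'); // plus-minus
        SYMBOL_TO_UNICODE.put(0xB3, '\u2265'); // greater than or equal to
        SYMBOL_TO_UNICODE.put(0xB9, '\u2260'); // not equal to
    }

    /**
     * Accepts either a raw Symbol-font code (0x20..0xFF) or the private-use
     * form U+F020..U+F0FF that Word commonly uses for symbol fonts.
     * Returns the input unchanged when the code is not in the table.
     */
    public static char toUnicode(char symbolChar) {
        int code = symbolChar;
        if (code >= 0xF000 && code <= 0xF0FF) {
            code -= 0xF000; // strip the private-use offset
        }
        Character mapped = SYMBOL_TO_UNICODE.get(code);
        return mapped != null ? mapped : symbolChar;
    }

    public static void main(String[] args) {
        char in = '\uF0B3';
        // Print code points rather than glyphs to avoid console encoding issues.
        System.out.printf("U+%04X -> U+%04X%n", (int) in, (int) toUnicode(in));
    }
}
```

With a mapping like this, a symbol run whose font is "Symbol" could be replaced by its Unicode equivalent before or after conversion, instead of relying on the '(' placeholder text.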
--------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
