https://issues.apache.org/bugzilla/show_bug.cgi?id=47731
Summary: Word Extractor considers text copied from some website as an embedded object
Product: POI
Version: 3.2-FINAL
Platform: PC
OS/Version: Windows Server 2003
Status: NEW
Severity: major
Priority: P2
Component: HWPF
AssignedTo: dev@poi.apache.org
ReportedBy: gi.bijl...@sap.com

--- Comment #0 from Gitu <gi.bijl...@sap.com> 2009-08-24 22:50:21 PDT ---
Hi,

I copied some text from a web page and pasted it into a Word document. When I use WordExtractor to extract the content of that document, the complete content is extracted, but the summary information appears multiple times.

After investigating, I found that each part of the document is treated as an embedded object, so the summary is extracted once per embedded object, i.e. the same value appears that many times.

I would also like to know whether treating HTML content as an embedded object is valid behaviour.

I have attached a document that reproduces the scenario.

Many thanks in advance,
Gitu