On Fri, 15 Apr 2016, Thaddaeus Fillmore - US wrote:
Gah, I'm back. OK, so now I'm trying to extract the text from a Word document being uploaded to the server. (This is all in ColdFusion.) I first write a temp copy of the file to disk. I have verified that the file writes successfully and can be opened in Word. Then I try to use POI to read the file, but I'm getting an exception when I call ExtractorFactory.createExtractor on the file.

I wouldn't recommend ExtractorFactory for new installations or new uses. You'd be much better off using Apache Tika for text extraction instead. Apache Tika builds on top of POI, amongst many other libraries, and is where the bulk of the text extraction work happens these days. Tika can give you plain text, metadata, or HTML, and generally does a lot more than the (now rather old) POI simple text extractors offer.

You can also use the Tika Server or the Tika CLI, which may be simpler to integrate than trying to get the right jars and the right class invocations working in a framework like ColdFusion.
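
As a rough illustration of the Tika Server route, here is a minimal sketch that uses only the JDK's built-in HTTP client (Java 11+), so no POI or Tika jars are needed on the calling side. It assumes a Tika Server is already running locally on its default port 9998 and PUTs the document to the /tika endpoint, asking for plain text back. The class and method names are made up for this example, and the endpoint address is an assumption you would adjust for your own setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper class, named for this example only.
class TikaServerTextExtractor {

    // Assumption: Tika Server is running locally on its default port (9998).
    static final URI DEFAULT_ENDPOINT = URI.create("http://localhost:9998/tika");

    // Build a PUT request whose body is the raw document, asking for plain text.
    static HttpRequest buildRequest(URI endpoint, Path document) throws IOException {
        return HttpRequest.newBuilder(endpoint)
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofFile(document))
                .build();
    }

    // Send the document to the server and return the extracted text.
    static String extractText(URI endpoint, Path document)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(endpoint, document),
                      HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Tika Server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TikaServerTextExtractor.java /path/to/upload.doc
        System.out.println(extractText(DEFAULT_ENDPOINT, Path.of(args[0])));
    }
}
```

From ColdFusion the same request could presumably be made with whatever HTTP call you already use, pointing it at the temp file you wrote to disk. The point is only that the server takes the file bytes and hands back text.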

Nick

