Thanks for the reply! I actually got it working with ExtractorFactory, though (I had a typo in the path to the jar files). Is Tika just for Office documents, or can it also read other formats? Ideally I'd like something that could process plain text, Word documents, PDFs, and images. Right now I can handle all of those formats, but only by using a different tool for each.
Thanks,
Thaddaeus Fillmore
CACI International Inc.
Systems Analyst
703-460-1425

-----Original Message-----
From: Nick Burch [mailto:[email protected]]
Sent: Friday, April 15, 2016 11:03 AM
To: POI Users List
Subject: Re: New POI problem

On Fri, 15 Apr 2016, Thaddaeus Fillmore - US wrote:
> Gah, I'm back. Ok, so now I'm trying to extract the text from a word
> document being uploaded to the server. (This is all in coldfusion).
> I first write a temp copy of the file to the disk. I have verified
> the file writes successfully and can be opened in word. Then, I try
> to use POI to read the file, but now I'm getting an exception when I
> try to use ExtractorFactory to createExtractor for the file.

I wouldn't recommend ExtractorFactory for new installations / new uses. You'd be much better off using Apache Tika instead for text extraction. Apache Tika builds on top of POI, amongst many others, and is where the bulk of the text extraction work happens these days.

Tika can give you plain text, or metadata, or html, and generally does a lot more than the (now rather old) POI simple text extractors offer.

You can also use the Tika Server or Tika CLI, which may be simpler for integration than trying to get the right jars and right class invocations in a framework like ColdFusion.

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
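
For reference, the Tika Server route Nick mentions might look roughly like this from Java. This is only a sketch under a few assumptions: a Tika Server is already running at http://localhost:9998 (its default port), and the class and method names (TikaServerClient, buildExtractRequest, extractText) are made up for illustration. It uses only the JDK's built-in HTTP client (Java 11+), so no POI or Tika jars need to be on the classpath of the calling application.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika or POI.
final class TikaServerClient {

    // Builds a PUT request that uploads the file to the server's /tika
    // endpoint and asks for the extracted content as plain text.
    static HttpRequest buildExtractRequest(URI serverBase, Path file) throws IOException {
        return HttpRequest.newBuilder(serverBase.resolve("/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    // Sends the request and returns the response body (the extracted text).
    // Any non-200 status is reported as an IOException.
    static String extractText(URI serverBase, Path file)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildExtractRequest(serverBase, file),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Tika Server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A call would then be something like TikaServerClient.extractText(URI.create("http://localhost:9998"), Path.of("upload.doc")), where upload.doc stands in for the temp copy of the uploaded file. Since the server is a separate process, the same request could presumably be made from ColdFusion's own HTTP facilities without loading any jars.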
