Thanks for the reply! I actually got it to work using ExtractorFactory, though
(I had a typo in the path to the jar files). Is Tika just for Office
documents, or can it also read other formats? Ideally I'd like something that
could process plain text, Word documents, PDFs, and images, but right now
I'm able to handle all of those formats using a variety of means.

Thanks,

Thaddaeus Fillmore
CACI International Inc.
Systems Analyst
703-460-1425

-----Original Message-----
From: Nick Burch [mailto:[email protected]] 
Sent: Friday, April 15, 2016 11:03 AM
To: POI Users List
Subject: Re: New POI problem

On Fri, 15 Apr 2016, Thaddaeus Fillmore - US wrote:
> Gah, I'm back. OK, so now I'm trying to extract the text from a Word
> document being uploaded to the server (this is all in ColdFusion).
> I first write a temp copy of the file to disk. I have verified
> the file writes successfully and can be opened in Word. Then I try
> to use POI to read the file, but now I'm getting an exception when I
> call ExtractorFactory.createExtractor for the file.

I wouldn't recommend ExtractorFactory for new installations or new uses.
You'd be much better off using Apache Tika for text extraction instead.
Apache Tika builds on top of POI, amongst many others, and is where the bulk of
the text extraction work happens these days. Tika can give you plain text,
metadata, or HTML, and generally does a lot more than the (now rather old) POI
simple text extractors offer.

You can also use the Tika Server or Tika CLI, which may be simpler to
integrate than getting the right jars and the right class invocations in a
framework like ColdFusion.
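To illustrate the Tika Server route, here is a minimal sketch (not from the thread) of calling it over plain HTTP, so no POI or Tika jars are needed on the caller's side. It assumes a Tika Server has already been started separately and is listening on its default port 9998, and it uses the server's `PUT /tika` endpoint with an `Accept: text/plain` header to request plain text. The helper names `build_tika_request` and `extract_text` are hypothetical, and decoding the response as UTF-8 is an assumption.

```python
import urllib.request

# Assumption: a Tika Server is already running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request that asks Tika Server to return plain text."""
    return urllib.request.Request(
        url,
        data=data,                        # raw file bytes; Tika detects the format
        method="PUT",
        headers={"Accept": "text/plain"}, # ask for plain text rather than HTML/XML
    )


def extract_text(path: str, url: str = TIKA_URL) -> str:
    """Send a file's bytes to Tika Server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        # Assumption: the server replies in UTF-8.
        return response.read().decode("utf-8")
```

With the server running, `extract_text("upload.docx")` would return the document's text. Because it is just an HTTP PUT, the same call could in principle be made from ColdFusion (for example with `cfhttp`) instead of loading extractor classes directly.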

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected] For additional 
commands, e-mail: [email protected]

