RE: Using Tika that comes with Solr 5.2

Uwe Schindler Wed, 03 Feb 2016 05:54:29 -0800

Hi,


This behaviour comes from Tika itself: if a parser cannot load because classes 
it refers to are missing, it is automatically disabled. Because the actual 
PDF/PowerPoint/… classes were missing from your classpath, this is what 
happened for all those parsers.
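
One rough way to see whether such classes are visible to your application is to probe the classpath directly. The sketch below is not how Tika itself detects missing dependencies; it only tries to locate a class by name. The two class names are assumptions, picked as typical entry points of PDFBox and Apache POI, and the class name ParserDependencyCheck is made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParserDependencyCheck {

    // Returns true if the named class can be located on the current classpath.
    public static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ParserDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError when a transitive class is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed probe classes; adjust to the libraries your parsers actually need.
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("PDF (PDFBox)", "org.apache.pdfbox.pdmodel.PDDocument");
        probes.put("PowerPoint .pptx (POI)", "org.apache.poi.xslf.usermodel.XMLSlideShow");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isAvailable(probe.getValue()) ? "found" : "MISSING";
            System.out.println(probe.getKey() + ": " + status);
        }
    }
}
```

Run it with the same classpath as your application. Any line reporting MISSING points at a library that still needs to be added.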

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: [email protected]

 

From: Steven White [mailto:[email protected]] 
Sent: Wednesday, February 03, 2016 2:48 PM
To: [email protected]
Subject: Re: Using Tika that comes with Solr 5.2

 

Thanks everyone.  After posting about this, I found the problem: I was 
missing a whole set of Tika JARs that ship with Solr under 
\solr\contrib\extraction\lib\
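
To compare what is in that directory with what is on an application's classpath, a small listing helper can be enough. This is only a sketch: the directory is passed as an argument, and the class name ListExtractionJars is made up for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListExtractionJars {

    // Returns the sorted file names of all *.jar entries directly inside dir.
    public static List<String> jarNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                names.add(jar.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ListExtractionJars <directory>");
            return;
        }
        for (String name : jarNames(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

Pass it the extraction lib directory of your Solr install and check that every JAR it prints is also available to your own code.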

 

Steve

 

On Wed, Feb 3, 2016 at 8:29 AM, Nick Burch <[email protected]> wrote:

On Tue, 2 Feb 2016, Steven White wrote:

What I'm finding is that Tika will not extract the raw text from PDF,
PowerPoint, etc. files, but it will from raw text files.


I'd suggest you try some of the steps in the troubleshooting page:
  http://wiki.apache.org/tika/Troubleshooting%20Tika
Probably start at the "No Content Extracted" section, and follow the links to 
the possible problems and ways to check for them.

Solr 5.2 comes with the following Tika JARs, all of which I have included:
tika-core-1.7.jar, tika-java7-1.7.jar, tika-parsers-1.7.jar,
tika-xmp-1.7.jar, vorbis-java-tika-0.6.jar,
kite-morphlines-tika-core-0.12.1.jar and
kite-morphlines-tika-decompress-0.12.1.jar


You seem to be missing quite a few of the Tika dependencies, which may well be 
the cause. Follow the troubleshooting guide to check!

Nick
