RE: Using Tika that comes with Solr 5.2

Uwe Schindler Wed, 03 Feb 2016 09:28:48 -0800

Hi,


The Morphlines JARs are not needed at all. They belong to the MapReduce/Hadoop 
integration of Solr (see documentation), which is mostly command line tools 
around Solr and Hadoop.

 

FYI: In Solr we don’t show the warnings, because otherwise the user would get a 
lot of useless warnings. We may fix this when TIKA 2.0 comes with the 
tika-parsers module split into multiple parts. In Solr we have currently removed 
all TIKA dependencies that conflict with the rest of Solr/Lucene or are not 
useful for fulltext indexing. We only kept those that are useful for “fulltext” 
extraction (e.g. office document formats). But we have no parsers for CLASS 
files (they break because of an ASM conflict) or NetCDF files (license issues in 
older versions of TIKA).

 

I see no reason to show these warnings, because if you use Solr as documented, 
it should work correctly. We no longer support running Solr inside foreign 
application servers. So everything should work out of the box.

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: [email protected]

 

From: Steven White [mailto:[email protected]] 
Sent: Wednesday, February 03, 2016 5:44 PM
To: [email protected]
Subject: Re: Using Tika that comes with Solr 5.2

 

Nick, that would be a good thing to do: changing Ignore to Warn, otherwise 
newcomers will have no clue why this isn't working.

 

Another question to the team regarding this topic.

 

I see JARs under solr\contrib\morphlines-cell\lib\ and 
solr\contrib\morphlines-core\lib\. Under "morphlines-cell" there are 2 files 
with "tika" in their name. My question is, do I need those for general Tika 
usage? The README.txt clearly states "*Experimental*" but doesn't say if I need 
them to use Tika.

 

Steve

 

 

On Wed, Feb 3, 2016 at 9:16 AM, Nick Burch <[email protected]> wrote:

On Wed, 3 Feb 2016, Uwe Schindler wrote:

The reason for this behaviour is part of TIKA: If a parser cannot load because 
classes it refers to are missing, it is automatically disabled. Because you 
missed the actual PDF/Powerpoint/… classes, this is what happens for all those 
parsers.


I wonder if it might be worth SOLR changing their default Tika config from 
Ignore to Warn, so that SOLR users (who probably aren't as clued up on how it 
all works as the average Tika user) will get to find out more quickly that 
they've missed something?

Nick
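
For readers following the thread, the behaviour discussed above (a parser is 
silently dropped when classes it refers to are missing, and "Ignore" versus 
"Warn" decides whether anyone hears about it) can be sketched in plain Java. 
This is only an illustration under stated assumptions, not Tika's or Solr's 
actual code: the class ParserLoadingSketch, the enum LoadErrorPolicy, the 
method loadAvailable and the class name "no.such.Parser" are all hypothetical 
names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: NOT Tika's or Solr's actual implementation.
class ParserLoadingSketch {

    // Hypothetical policy names, mirroring the "Ignore" and "Warn" settings from the thread.
    enum LoadErrorPolicy { IGNORE, WARN }

    // Tries to load each class by name. A class that cannot be loaded is skipped
    // (the "parser" is disabled); under WARN a message is also recorded.
    static List<String> loadAvailable(List<String> classNames,
                                      LoadErrorPolicy policy,
                                      List<String> warnings) {
        List<String> loaded = new ArrayList<>();
        ClassLoader loader = ParserLoadingSketch.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be found and loaded.
                Class.forName(name, false, loader);
                loaded.add(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // The class itself or one of its dependencies is not on the classpath.
                if (policy == LoadErrorPolicy.WARN) {
                    warnings.add("Skipping " + name + ": " + e);
                }
                // Under IGNORE nothing is reported, so the user never learns about it.
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // "no.such.Parser" is a made-up name standing in for a parser with missing JARs.
        List<String> candidates = Arrays.asList("java.lang.String", "no.such.Parser");
        List<String> warnings = new ArrayList<>();
        List<String> loaded = loadAvailable(candidates, LoadErrorPolicy.WARN, warnings);
        System.out.println("Loaded:   " + loaded);
        System.out.println("Warnings: " + warnings);
    }
}
```

With IGNORE the list of loaded classes is the same, but the warnings list stays 
empty, which is why a missing PDF or PowerPoint JAR goes unnoticed.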
