This may have been an issue with Solr's wrapper of Tika.  See:

-----Original Message-----
From: 步青云 [] 
Sent: Wednesday, June 17, 2015 10:17 PM
To: solr-user
Subject: About indexing embed file with solr

      Can anyone receive my email? I'm new to Solr and have some questions. 
Could anyone help me with some answers?
      I index files directly by extracting their content with the Tika 
embedded in Solr. Normal files work without problems. But when I index a Word 
document that has another file embedded in it, I can't get the content of the 
embedded file. For example, if I have a Word document (.doc) with a PDF 
embedded in it, the content of the PDF is not indexed. When I use the same 
Tika jar directly to extract the content, I do get the content of the 
embedded file.
      I know Tika has been able to extract embedded files since version 1.3. My 
Solr version is 4.9.1, and the Tika used in this version of Solr is 1.5. I 
don't know why I can't get the content of the embedded file.
      Could anyone help me? Thank you very much.
           Ping Liu
         18 June. 2015
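
One way to narrow this down might be to ask Solr to run its extraction without indexing anything, then compare the output with what the standalone Tika jar returns for the same file. The sketch below only builds the request URL for Solr's extracting request handler. The host, the core name `collection1`, and the helper name `build_extract_url` are assumptions for illustration; `/update/extract` is the handler path used in the example solrconfig.xml and may differ in your setup.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, core, extract_only=True):
    """Build a URL for Solr's extracting request handler (Solr Cell).

    solr_base and core are placeholders; adjust them to your installation.
    """
    params = {
        # extractOnly=true returns the extracted text instead of indexing it
        "extractOnly": "true" if extract_only else "false",
        # ask for plain text rather than XHTML in the response
        "extractFormat": "text",
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    print(build_extract_url("http://localhost:8983/solr", "collection1"))
```

Posting the Word document to this URL as a multipart upload (for example with curl and `-F "myfile=@yourfile.doc"`) and checking whether the PDF text appears in the response should show whether the embedded content is lost during extraction or later, in field mapping.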
