This may have been an issue with Solr's wrapper of Tika. See: https://issues.apache.org/jira/browse/SOLR-7189
-----Original Message-----
From: 步青云 [mailto:mailliup...@qq.com]
Sent: Wednesday, June 17, 2015 10:17 PM
To: solr-user
Subject: About indexing embed file with solr

Hello,

Can anyone receive my email? I'm new to Solr and have a question that I hope someone can help me with.

I index files directly by extracting their content with the Tika embedded in Solr. This works fine for normal files. But when I index a document that has another file embedded in it, such as a PDF embedded in a Word document, I can't get the content of the embedded file. For example, I have a Word (.doc) file with a PDF embedded in it, and the content of the PDF is not indexed.

When I use the same Tika jar directly to extract the content, I do get the content of the embedded file. I know Tika has been able to extract embedded files since version 1.3. My Solr version is 4.9.1, and the Tika used in that version of Solr is 1.5. I don't know why I can't get the content of the embedded file.

Could anyone help me? Thank you very much.

Ping Liu
18 June 2015