Re: Solr search – Tika extracted text from PDF not return highlighting snippet

tuxdna Mon, 18 Feb 2013 10:51:23 -0800

I am replying to this post because I am also facing a "very similar" issue.


I am indexing the documents stored in a blob field of a MySQL database. I
have described the whole setup in the following blog post:

http://tuxdna.wordpress.com/2013/02/04/indexing-the-documents-stored-in-a-database-using-apache-solr-and-apache-tika/


Basically, the blob content is fetched from the database, parsed by Tika, and
converted into text. All the fields in the database table get indexed
properly except the blob field (the one processed by Tika). It doesn't show
up in the Solr schema browser: there are no terms for the text field.
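
For readers who don't follow the link, a db-data-config.xml for this kind of
setup might look roughly like the sketch below. It is not the configuration
from the blog post. The connection details, the table "documents", its
columns and the entity names are hypothetical placeholders. It assumes the
DataImportHandler with the TikaEntityProcessor (from the
dataimporthandler-extras contrib), and it uses FieldStreamDataSource to hand
the blob column of the parent row to Tika.

```xml
<dataConfig>
  <!-- JDBC source for the MySQL table; URL, user and password are placeholders -->
  <dataSource name="db" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb" user="solr" password="secret"/>
  <!-- Streams the content of a column from the parent entity's row -->
  <dataSource name="blob" type="FieldStreamDataSource"/>
  <document>
    <!-- Hypothetical table with an id, a title and a blob column "content" -->
    <entity name="doc" dataSource="db" query="SELECT id, title, content FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <!-- Tika parses the blob referenced by dataField; extracted text comes out in column "text" -->
      <entity name="tika" dataSource="blob" processor="TikaEntityProcessor"
              dataField="doc.content" format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```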

I tried some permutations and combinations of the fields (in
db-data-config.xml and schema.xml) and got it working. I now have two fields,
"text" and "text2", where "text" is indexed + stored and "text2" is neither.
However, if I remove "text2" from the configuration, I am back to the same
problem, i.e. the field doesn't get indexed.
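
In schema.xml terms, the two field declarations described above would look
roughly like this. This is a sketch only: the field type name "text_general"
is an assumption, and the real definitions are in the blog post.

```xml
<!-- Extracted text: indexed (searchable) and stored (returnable, usable for highlighting) -->
<field name="text" type="text_general" indexed="true" stored="true"/>
<!-- Workaround field: declared, but neither indexed nor stored -->
<field name="text2" type="text_general" indexed="false" stored="false"/>
```

Since highlighting (the topic of the original thread) works from stored
field values, only "text" could return snippets in this arrangement.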

I don't understand why the above workaround works. Can anyone give me
pointers on where to look to understand this behaviour? Is it solvable using
copyField?

NOTE: I have described the configuration files and setup in the link above.

Thanks in advance! :)

/tuxdna




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-search-Tika-extracted-text-from-PDF-not-return-highlighting-snippet-tp3999647p4041180.html
Sent from the Solr - User mailing list archive at Nabble.com.
