Re: Require searching only for file content and not metadata

Shawn Heisey Wed, 28 Aug 2019 01:47:43 -0700

On 8/27/2019 7:18 AM, Khare, Kushal (MIND) wrote:

Basically, the problem I am facing is this: I am getting the textual content 
plus other metadata in my _text_ field, but I want only the textual content 
written inside the document.
I tried various Request Handler Update Extract configurations, but none of them 
worked for me.
Please help me resolve this, as I am badly stuck on it.

Controlling exactly what gets indexed in which fields is likely going to require that you write the indexing software yourself -- a program that extracts the data you want and sends it to Solr for indexing.

We do not recommend running the Extracting Request Handler in production -- Tika is known to crash when given some documents (usually PDF files are the problematic ones, but other formats can cause it too), and if it crashes while running inside Solr, it will take Solr down with it.

Here is an example program that uses Tika for rich document parsing. It also talks to a database, but that part could be easily removed or modified:


https://lucidworks.com/post/indexing-with-solrj/
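
For a rough idea of the shape such a program takes, here is a minimal sketch in Python. It is not taken from the linked article. It assumes the body text and metadata have already been extracted outside of Solr (for example by Tika running in its own process), and the field names "content" and "meta_*", as well as the helper names, are hypothetical and would need to match your own schema. The point is only that the client decides which values go into which fields before anything is sent to Solr's JSON update endpoint.

```python
import json
import urllib.request


def build_solr_doc(doc_id, body_text, metadata=None, keep_meta=()):
    """Build one Solr document holding the body text plus selected metadata.

    "content" and the "meta_" prefix are hypothetical field names; only the
    metadata keys listed in keep_meta are copied, everything else is dropped.
    """
    doc = {"id": doc_id, "content": body_text.strip()}
    for key in keep_meta:
        if metadata and key in metadata:
            doc["meta_" + key] = metadata[key]
    return doc


def post_to_solr(docs, solr_url):
    """POST a list of documents to <solr_url>/update as JSON and commit.

    solr_url is assumed to point at a collection or core, for example
    http://localhost:8983/solr/mycollection (a placeholder, not a real host).
    """
    payload = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        solr_url.rstrip("/") + "/update?commit=true",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because extraction happens in the client, a parser crash affects only the indexing program and not the Solr server.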

Thanks,
Shawn
