subject:"Issue with Solr Cell mixing metadata and content together"

Re: Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Phillip Rhodes

Fair enough. I'm actually using ManifoldCF to manage the indexing, and I see that they have a Tika Content Extraction transformer available, so I'll look into wiring that into my pipeline and see if that gets me the results I'm looking for. Thanks, Phil This message optimized for indexing by

Re: Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Erick Erickson

bq: Is there any way to get reasonable behavior using the ExtractingRequestHandler, or should I just dump that approach and plan to run Tika outside of Solr, and then send Solr the exact content I want? Actually, this is recommended for a bunch of reasons, so I'd just go there straightaway. Tika ha
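For anyone landing on this thread, here is a rough sketch of the "send Solr the exact content I want" half of that advice. It assumes the body text and the metadata have already been extracted separately (by Tika running outside Solr, or by a pipeline such as ManifoldCF). The field names (`id`, `content`), the `meta_` prefix, the helper names, and the update URL passed to `post_to_solr` are all illustrative assumptions, not anything taken from the original posts; adapt them to your own schema.

```python
import json
import urllib.request


def build_solr_doc(doc_id, content, metadata, meta_prefix="meta_"):
    """Build one Solr document with body text and metadata kept apart.

    Hypothetical helper: 'content' holds only the extracted body text,
    and every metadata entry goes into its own prefixed field.
    """
    doc = {"id": doc_id, "content": content}
    for key, value in metadata.items():
        # Turn keys such as "dc:title" or "Content-Type" into safe field names.
        safe = "".join(c.lower() if c.isalnum() else "_" for c in key)
        doc[meta_prefix + safe] = value
    return doc


def post_to_solr(docs, update_url):
    """POST a list of documents as JSON to a Solr update endpoint.

    update_url is an assumption, e.g. http://localhost:8983/solr/<core>/update?commit=true
    """
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example input standing in for whatever the external extraction step returned.
doc = build_solr_doc(
    "report.pdf",
    "Body text only.",
    {"dc:title": "Annual Report", "Content-Type": "application/pdf"},
)
print(json.dumps(doc, indent=2))
```

Because the document is assembled on the client side, nothing ends up in the "content" field unless you put it there, which is the behaviour the original question was after. Calling `post_to_solr([doc], update_url)` would then index it, assuming the target schema has matching fields.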

Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Phillip Rhodes

Hi all, I have been having an issue with Solr, using the ExtractingRequestHandler. Basically, when indexing a PDF (for example) I get all the metadata mixed into the "content" field along with the content. See:


3 matches
