Re: Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Phillip Rhodes
Fair enough. I'm actually using ManifoldCF to manage the indexing, and I see that they have a Tika Content Extraction transformer available, so I'll look into wiring that into my pipeline and see if that gets me the results I'm looking for. Thanks, Phil This message optimized for indexing by

Re: Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Erick Erickson
bq: Is there any way to get reasonable behavior using the ExtractingRequestHandler, or should I just dump that approach and plan to run Tika outside of Solr, and then send Solr the exact content I want? Actually, this is recommended for a bunch of reasons, so I'd just go there straightaway. Tika ha
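
For anyone following the thread, here is a rough sketch of the "run Tika outside of Solr, then send Solr the exact content" approach. It is not taken from the messages above. It assumes the body text and a metadata dictionary have already been extracted by Tika in a separate step, and it only shows how the two can be kept apart before posting to Solr's JSON update endpoint. The field names (`content`, `title`, `author`), the core name `mycore`, and both helper functions are hypothetical and would need to match the actual schema.

```python
import json
import urllib.request


def build_solr_doc(doc_id, body_text, metadata, keep=("title", "author")):
    """Build a Solr document with body text and metadata in separate fields.

    `body_text` and `metadata` are assumed to come from a Tika run that
    happened outside of Solr. Only metadata keys listed in `keep` are
    copied, so nothing else leaks into the index or the content field.
    The field names are assumptions, not something the thread specifies.
    """
    doc = {"id": doc_id, "content": body_text.strip()}
    for name in keep:
        if name in metadata:
            doc[name] = metadata[name]
    return doc


def build_update_request(core_url, docs):
    """Prepare (but do not send) a POST to the core's JSON update handler."""
    payload = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values standing in for real Tika output.
    extracted_text = "  Actual text of the PDF body.  "
    extracted_meta = {"title": "Quarterly report", "producer": "SomePDFLib"}

    doc = build_solr_doc("doc-1", extracted_text, extracted_meta)
    req = build_update_request("http://localhost:8983/solr/mycore", [doc])
    print(req.full_url)
    print(json.dumps(doc, indent=2))
    # urllib.request.urlopen(req)  # uncomment to actually send to Solr
```

The whitelist in `keep` is the design choice that matters here: the `content` field receives only the body text, and any metadata that should be searchable is named explicitly instead of being concatenated into the content.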

Issue with Solr Cell mixing metadata and content together

2017-12-21 Thread Phillip Rhodes
Hi all, I have been having an issue with Solr, using the ExtractingRequestHandler. Basically, when indexing a PDF (for example) I get all the metadata mixed into the "content" field along with the content. See: