Thank you, Karl. I will investigate using Solr/DPP for this.
I will update this issue once I have worked out what was implemented
and how.
--mike
> each piece of content to be stored as its own field BEFORE indexing in Solr.
>
> My guess is that I should use a document processing pipeline in Solr,
> such as UIMA or something similar.
>
> However, to limit the load on Solr, I was wondering if there was a
> way to "hook" into the Solr
I am crawling a bunch of HTML pages within a site that will be sent to
Solr for indexing. I want to extract some content out of the pages, with
each piece of content stored as its own field BEFORE indexing in Solr.
My guess is that I should use a document processing pipeline in
Solr
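
To make the goal concrete, here is a rough sketch of the kind of extraction I have in mind, done outside Solr so that Solr only receives ready-made fields. It is not tied to any particular crawler or pipeline: the class `FieldExtractor`, the function `html_to_solr_doc`, and the field names `title`, `description` and `headings` are all hypothetical, and the field names would have to match whatever the Solr schema actually defines. It uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the <title> text, the meta description, and all <h1> headings."""

    def __init__(self):
        super().__init__()
        # Hypothetical field names; adjust to the real Solr schema.
        self.fields = {"title": "", "description": "", "headings": []}
        self._current = None  # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
            self._buffer = []
        elif tag == "meta":
            attr_map = dict(attrs)
            # Attribute values may be None when the attribute has no value.
            if (attr_map.get("name") or "").lower() == "description":
                self.fields["description"] = attr_map.get("content") or ""

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.fields["title"] = text
            else:
                self.fields["headings"].append(text)
            self._current = None


def html_to_solr_doc(doc_id, html):
    """Return a dict with one key per extracted field, plus the document id."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {"id": doc_id, **parser.fields}
```

The resulting dict could then be serialized to JSON and sent to Solr by whatever component does the indexing, so the extraction cost stays on the crawler side rather than on Solr.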