Re: Document Processing

2011-12-05 Thread Michael Kelleher
Thank you, Karl. I will investigate using Solr/DPP for this, and I will update this issue once I have settled what was implemented and how. --mike

Re: Document Processing

2011-12-05 Thread Karl Wright
> ed as its own field BEFORE indexing in Solr.
>
> My guess would be that I should use a Document processing pipeline in Solr
> like UIMA, or something of the like.
>
> However, to limit the amount of load on Solr, I was wondering if there was a
> way to "hook" into the Solr

Document Processing

2011-12-05 Thread Michael Kelleher
I am crawling a bunch of HTML pages within a site that will be sent to Solr for indexing. I want to extract some content from the pages, with each piece of content stored as its own field BEFORE indexing in Solr. My guess would be that I should use a Document processing pipeline in Solr
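
For illustration only, here is a minimal sketch of the "extract pieces of a page into their own fields before indexing" idea from the question. It is not taken from the thread and does not use any Solr, UIMA, or crawler API: it relies only on the Python standard library, and the helper `to_solr_doc` and the field names `page_title` and `page_heading` are hypothetical. It assumes the pieces of interest are the `<title>` and the first `<h1>`; a real site would need its own extraction rules, and the resulting dictionary would still have to be posted to Solr by whatever client is in use.

```python
import json
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the text of <title> and the first <h1> into separate fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        # Only capture the first occurrence of each tag of interest.
        if tag in ("title", "h1") and tag not in self.fields:
            self._current = tag
            self.fields[tag] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        if tag == self._current:
            self.fields[tag] = self.fields[tag].strip()
            self._current = None


def to_solr_doc(url, html):
    """Build a field-per-piece document (hypothetical field names)."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {
        "id": url,
        "page_title": parser.fields.get("title", ""),
        "page_heading": parser.fields.get("h1", ""),
    }


page = (
    "<html><head><title> Widgets </title></head>"
    "<body><h1>Blue widget</h1><p>text</p></body></html>"
)
doc = to_solr_doc("http://example.com/widgets", page)
print(json.dumps(doc, sort_keys=True))
```

Doing the extraction on the crawler side like this keeps the work off Solr, which was the concern raised above; the trade-off is that the extraction rules then live outside Solr's own configuration.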