Hi Tarjei,

:) Yeah, that is the solution we are going with, actually.
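
For anyone following along, here is a minimal sketch of that single-index idea in Python. It is only an illustration under some assumptions: the field names `text` (indexed, not stored) and `sentences` (multi-valued, stored) are hypothetical, the regex sentence splitter is deliberately naive, and the code only builds the document, leaving the actual call to Solr out.

```python
import random
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_doc(doc_id, text, rng=None):
    """Build one document for a single index.

    Hypothetical schema (an assumption, not taken from this thread):
      text      -> indexed, NOT stored (full article, searchable only)
      sentences -> multi-valued, indexed and stored, in shuffled order
    """
    rng = rng or random.Random()
    sentences = split_sentences(text)
    # Shuffle so the stored values do not preserve the article's sentence order.
    rng.shuffle(sentences)
    return {"id": doc_id, "text": text, "sentences": sentences}


if __name__ == "__main__":
    article = "Foo is mentioned here. Bar shows up later! Is baz anywhere?"
    print(build_doc("article-1", article))
```

Queries would then run against `text`, while snippets and highlighting would come from the stored `sentences` values of the same document, so there is no second index to look up.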

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message ----
> From: Tarjei Huse <tar...@scanmine.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, January 18, 2011 1:33:44 AM
> Subject: Re: Not storing, but highlighting from document sentences
>
> On 01/12/2011 12:02 PM, Otis Gospodnetic wrote:
> > Hello,
> >
> > I'm indexing some content (articles) whose text I cannot store in its original
> > form for copyright reason. So I can index the content, but cannot store it.
> > However, I need snippets and search term highlighting.
> >
> > Any way to accomplish this elegantly? Or even not so elegantly?
> >
> > Here is one idea:
> >
> > * Create 2 indices: main index for indexing (but not storing) the original
> > content, the secondary index for storing individual sentences from the original
> > article.
> How about storing the sentences in the same index in a separate field
> but with random ordering, would that be ok?
>
> Tarjei
> > * That is, before indexing an article, split it into sentences. Then index the
> > article in the main index, and index+store each sentence in the secondary
> > index. So for each doc in the main index there will be multiple docs in the
> > secondary index with individual sentences. Each sentence doc includes an ID of
> > the "parent" document.
> >
> > * Then run queries against the main index, and pull individual sentences from
> > the secondary index for snippet+highlight purposes.
> >
> >
> > The problem I see with this approach (and there may be other ones that I am not
> > seeing yet) is with queries like foo AND bar. In this case "foo" may be a match
> > from sentence #1, and "bar" may be a match from sentence #7. Or maybe "foo" is
> > a match in sentence #1, and "bar" is a match in multiple sentences: #7 and #10
> > and #23.
> >
> > Regardless, when a query is run against the main index, you don't know where the
> > match was, so you don't know which sentences to go get from the secondary index.
> >
> > Does anyone have any suggestions for how to handle this?
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
>
>
> --
> Regards / Med vennlig hilsen
> Tarjei Huse
> Mobil: 920 63 413
>