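Hi,

A simple solution to this could be: for all such searches (foo AND bar), run the query as-is against the first (primary) index, and when sending the same query to the secondary index, replace AND with OR. The primary index already guarantees that the document as a whole matches foo AND bar, so the secondary index only needs to return the sentences that contain any of the terms.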
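A minimal sketch of that rewrite, assuming a hypothetical secondary index whose sentence documents carry a `parent_id` field and a stored `sentence` field. `q`, `fq`, `hl` and `hl.fl` are standard Solr request parameters, but the function name and both field names are made up for illustration, and the `fq` restriction to the parent document is an assumption based on the parent ID in Otis's proposal. Only the uppercase `AND` keyword outside quoted phrases is rewritten; `+term`/`&&` syntax and the phrase/proximity caveat are not handled.

```python
import re


def to_sentence_params(user_query, parent_id):
    """Build Solr request params for a (hypothetical) sentence index.

    The primary index has already matched user_query with full AND
    semantics; here each AND operator becomes OR, so every sentence of that
    one document that contains any of the terms comes back for highlighting.
    """
    # Split out quoted phrases (kept in the result thanks to the capture
    # group) so that an AND inside a phrase is left untouched.
    parts = re.split(r'("[^"]*")', user_query)
    # Case-sensitive whole-word match: lowercase "and" and words such as
    # "ANDROID" are not rewritten.
    rewritten = "".join(
        part if part.startswith('"') else re.sub(r"\bAND\b", "OR", part)
        for part in parts
    )
    return {
        "q": rewritten,
        "fq": f"parent_id:{parent_id}",  # hypothetical field; assumes IDs need no escaping
        "hl": "true",
        "hl.fl": "sentence",             # hypothetical stored sentence field
    }


print(to_sentence_params("foo AND bar", "doc42"))
# {'q': 'foo OR bar', 'fq': 'parent_id:doc42', 'hl': 'true', 'hl.fl': 'sentence'}
```

Putting the parent restriction in `fq` rather than in `q` keeps it out of relevance scoring, which matters little here because the sentences are only fetched for snippets.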
But in this particular scenario, you could also run into problems with proximity and phrase queries, which are much harder to tackle.

Regards,
Ahsan

________________________________
From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Tue, January 18, 2011 12:25:12 PM
Subject: Re: Not storing, but highlighting from document sentences

Hi Tarjei,

:) Yeah, that is the solution we are going with, actually.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message ----
> From: Tarjei Huse <tar...@scanmine.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, January 18, 2011 1:33:44 AM
> Subject: Re: Not storing, but highlighting from document sentences
>
> On 01/12/2011 12:02 PM, Otis Gospodnetic wrote:
> > Hello,
> >
> > I'm indexing some content (articles) whose text I cannot store in its original
> > form for copyright reasons. So I can index the content, but cannot store it.
> > However, I need snippets and search term highlighting.
> >
> > Any way to accomplish this elegantly? Or even not so elegantly?
> >
> > Here is one idea:
> >
> > * Create 2 indices: a main index for indexing (but not storing) the original
> > content, and a secondary index for storing individual sentences from the
> > original article.
> How about storing the sentences in the same index, in a separate field but
> in random order? Would that be OK?
>
> Tarjei
> > * That is, before indexing an article, split it into sentences. Then index the
> > article in the main index, and index+store each sentence in the secondary
> > index. So for each doc in the main index there will be multiple docs in the
> > secondary index with individual sentences. Each sentence doc includes the ID
> > of the "parent" document.
> >
> > * Then run queries against the main index, and pull individual sentences from
> > the secondary index for snippet+highlight purposes.
> >
> > The problem I see with this approach (and there may be other ones that I am not
> > seeing yet) is with queries like foo AND bar. In this case "foo" may be a match
> > from sentence #1, and "bar" may be a match from sentence #7. Or maybe "foo" is
> > a match in sentence #1, and "bar" is a match in multiple sentences: #7, #10
> > and #23.
> >
> > Regardless, when a query is run against the main index, you don't know where the
> > match was, so you don't know which sentences to fetch from the secondary index.
> >
> > Does anyone have any suggestions for how to handle this?
> >
> > Thanks,
> > Otis
>
> --
> Regards / Kind regards
> Tarjei Huse
> Mobile: 920 63 413
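To make the two-index proposal and the foo AND bar problem concrete, here is a toy in-memory model in Python. It is not Solr or Lucene code: every class and function name is hypothetical, the sentence splitter and tokenizer are naive stand-ins for real analyzers, and the optional `shuffle_seed` only loosely mimics Tarjei's random-ordering idea (the sentences still live in a separate structure, not a separate field of the same document). The main index keeps postings only; the secondary store keeps sentences tagged with their parent ID. Search uses AND on the main index and OR over one parent's sentences, as Ahsan suggests, so "foo" in one sentence and "bar" in another are both highlighted.

```python
import random
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    # Lowercased word tokens; a stand-in for a real analyzer chain.
    return re.findall(r"\w+", text.lower())


def highlight(sentence, terms):
    # Wrap every word whose lowercase form is a query term in <em> tags.
    return re.sub(
        r"\w+",
        lambda m: f"<em>{m.group(0)}</em>" if m.group(0).lower() in terms else m.group(0),
        sentence,
    )


class TwoIndexStore:
    """Postings-only main index plus a stored-sentence secondary index."""

    def __init__(self, shuffle_seed=None):
        self.main = defaultdict(set)      # term -> doc IDs; article text is never kept
        self.sentences = []               # secondary index: {"parent_id", "text"} docs
        self.shuffle_seed = shuffle_seed  # if set, store sentences in random order

    def add_article(self, doc_id, text):
        sents = split_sentences(text)
        for term in tokenize(text):
            self.main[term].add(doc_id)   # indexed, not stored
        if self.shuffle_seed is not None:
            random.Random(self.shuffle_seed).shuffle(sents)
        for s in sents:
            self.sentences.append({"parent_id": doc_id, "text": s})

    def search_and(self, terms):
        # Main index: full AND semantics, only documents containing every term.
        postings = [self.main.get(t.lower(), set()) for t in terms]
        return set.intersection(*postings) if postings else set()

    def snippets(self, doc_id, terms):
        # Secondary index: OR semantics restricted to one parent document.
        wanted = {t.lower() for t in terms}
        return [
            highlight(s["text"], wanted)
            for s in self.sentences
            if s["parent_id"] == doc_id and wanted & set(tokenize(s["text"]))
        ]


store = TwoIndexStore()
store.add_article("a1", "Foo is mentioned here. Nothing relevant. Bar shows up later.")
store.add_article("a2", "Only foo here.")
for doc_id in store.search_and(["foo", "bar"]):
    print(doc_id, store.snippets(doc_id, ["foo", "bar"]))
# a1 ['<em>Foo</em> is mentioned here.', '<em>Bar</em> shows up later.']
```

The linear scan over `self.sentences` stands in for a filtered query on the parent ID; it does nothing about the phrase and proximity queries Ahsan mentions, since a phrase may span a sentence boundary.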