RE: Not storing, but highlighting from document sentences

Steven A Rowe Wed, 12 Jan 2011 05:49:14 -0800

Hi Otis,

I think you can get what you want by doing the first-stage retrieval and then,
in the second stage, adding required constraint(s) to the query for the matching
docid(s) and changing the AND operators in the original query to OR.
Coordination will cause the best snippet(s) to rise to the top, no?
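The second stage described above can be sketched roughly as follows. This is a toy illustration, not Solr's API: it assumes an in-memory list of sentence docs standing in for the secondary index, each with hypothetical `parent_id` and `text` fields, and it ranks purely by the fraction of query terms a sentence contains, as a crude stand-in for the coordination factor (real Lucene scoring also weighs tf-idf and norms).

```python
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def best_snippets(sentences, parent_id, query_terms, k=3):
    """Second-stage lookup for one first-stage hit.

    Requires parent_id to match (the added required docid constraint),
    treats the query terms as OR'd, and ranks sentences by how many
    distinct query terms they contain (a crude coordination score).
    """
    terms = {t.lower() for t in query_terms}
    scored = []
    for pos, sent in enumerate(sentences):
        if sent["parent_id"] != parent_id:
            continue  # required clause: sentence must belong to this hit
        matched = terms & tokenize(sent["text"])
        if matched:  # OR semantics: at least one query term must match
            coord = len(matched) / len(terms)
            scored.append((-coord, pos, sent["text"]))
    scored.sort()  # highest coord first; index position breaks ties
    return [text for _, _, text in scored[:k]]


sentences = [
    {"parent_id": 1, "text": "Foo appears here."},
    {"parent_id": 1, "text": "Nothing relevant."},
    {"parent_id": 1, "text": "Both foo and bar appear here."},
    {"parent_id": 2, "text": "Foo and bar in another article."},
]
print(best_snippets(sentences, 1, ["foo", "bar"]))
# ['Both foo and bar appear here.', 'Foo appears here.']
```

The sentence matching both terms outranks the one matching only "foo", which is the "best snippet rises to the top" effect; sentences from other parents never appear because of the required constraint.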


Hmm, you'll want to run the second stage once for each hit from the first
stage, though, unless you can afford to collect *all* hits and pull out each
first-stage hit's sentences from the intermixed second-stage results...
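The batched alternative (one second-stage pass covering every first-stage hit, then splitting the intermixed results back out per parent) might look like this sketch. It assumes the same hypothetical `parent_id`/`text` sentence docs held in memory and the same crude term-overlap score; it collects all matching sentences before grouping, which is the "afford to collect *all* hits" cost.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def snippets_by_parent(sentences, parent_ids, query_terms, k=2):
    """One batched second-stage pass over all first-stage hits.

    Keeps sentences whose parent is any first-stage hit and that match
    at least one query term, groups them per parent, and keeps the k
    best per parent by number of distinct query terms matched.
    """
    wanted = set(parent_ids)
    terms = {t.lower() for t in query_terms}
    groups = defaultdict(list)
    for pos, sent in enumerate(sentences):
        if sent["parent_id"] not in wanted:
            continue
        overlap = len(terms & tokenize(sent["text"]))
        if overlap:
            groups[sent["parent_id"]].append((-overlap, pos, sent["text"]))
    # Emit parents in first-stage rank order, each with its top-k sentences.
    return {
        pid: [text for _, _, text in sorted(groups[pid])[:k]]
        for pid in parent_ids
        if pid in groups
    }


sentences = [
    {"parent_id": 7, "text": "foo only"},
    {"parent_id": 3, "text": "bar and foo together"},
    {"parent_id": 7, "text": "bar and foo again"},
    {"parent_id": 9, "text": "foo bar but not a hit"},
]
print(snippets_by_parent(sentences, [7, 3], ["foo", "bar"]))
# {7: ['bar and foo again', 'foo only'], 3: ['bar and foo together']}
```

Running one query per hit avoids holding every intermixed sentence hit in memory; the batched form trades that memory for fewer round trips.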

Steve

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Wednesday, January 12, 2011 7:29 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Not storing, but highlighting from document sentences
> 
> Hi Stefan,
> 
> Yes, splitting into separate sentences (and storing them) is OK, because
> with a bunch of sentences you can't really reconstruct the original
> article unless you know which order to put them in.
> 
> Searching against the sentences won't work for queries like foo AND bar,
> because those should match the original articles even if foo and bar are
> in different sentences.
> 
> Otis
> 
> 
> 
> ----- Original Message ----
> > From: Stefan Matheis <matheis.ste...@googlemail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Wed, January 12, 2011 7:02:46 AM
> > Subject: Re: Not storing, but highlighting from document sentences
> >
> > Otis,
> >
> > Just curious: storing the full text is not allowed, but splitting it up
> > into separate sentences is okay?
> >
> > While you're thinking about using the sentences only as a
> > secondary/additional source, maybe it would help to search in the
> > sentences themselves, or would that give misleading results in your case?
> >
> > Stefan
> >
> > On Wed, Jan 12, 2011 at 12:02 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> >
> > > Hello,
> > >
> > > I'm indexing some content (articles) whose text I cannot store in its
> > > original form for copyright reasons. So I can index the content, but
> > > cannot store it. However, I need snippets and search term highlighting.
> > >
> > >
> > > Any way to accomplish this elegantly? Or even not so elegantly?
> > >
> > > Here is one idea:
> > >
> > > * Create 2 indices: a main index for indexing (but not storing) the
> > > original content, and a secondary index for storing individual
> > > sentences from the original article.
> > >
> > > * That is, before indexing an article, split it into sentences. Then
> > > index the article in the main index, and index+store each sentence in
> > > the secondary index. So for each doc in the main index there will be
> > > multiple docs in the secondary index with individual sentences. Each
> > > sentence doc includes the ID of the "parent" document.
> > >
> > > * Then run queries against the main index, and pull individual
> > > sentences from the secondary index for snippet+highlight purposes.
> > >
> > >
> > > The problem I see with this approach (and there may be other ones that
> > > I am not seeing yet) is with queries like foo AND bar. In this case
> > > "foo" may be a match from sentence #1, and "bar" may be a match from
> > > sentence #7. Or maybe "foo" is a match in sentence #1, and "bar" is a
> > > match in multiple sentences: #7, #10, and #23.
> > >
> > > Regardless, when a query is run against the main index, you don't know
> > > where the match was, so you don't know which sentences to fetch from the
> > > secondary index.
> > >
> > > Does anyone have any suggestions for how to handle this?
> > >
> > > Thanks,
> > > Otis
> > > ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene ecosystem search :: http://search-lucene.com/
> > >
> > >
> >
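Otis's indexing step (split each article into sentences, index the whole article in the main index without storing it, and index+store each sentence in a secondary index along with the parent's ID) could be sketched like this. The field names (`id`, `parent_id`, `text`), the hash-based sentence IDs, and the regex splitter are all assumptions for illustration; a real setup would use a proper sentence detector and send these docs to two Solr indices. The sketch deliberately records no sentence position and sorts sentence docs by their hash ID, since Otis's copyright argument relies on the original order not being recoverable.

```python
import hashlib
import re

# Naive splitter: break after '.', '!' or '?' followed by whitespace.
# A stand-in for a real sentence detector.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


def build_docs(article_id, text):
    """Return (main_doc, sentence_docs) for the two-index scheme.

    main_doc carries the full text, intended for a field that is indexed
    but not stored. Each sentence doc is small, stored, and points back to
    its parent. No position is kept, and sentence docs are ordered by a
    hash-derived ID rather than by their order in the article.
    """
    main_doc = {"id": article_id, "text": text}
    sentence_docs = {}
    for sent in split_sentences(text):
        key = f"{article_id}:{sent}".encode("utf-8")
        sid = hashlib.sha1(key).hexdigest()[:16]
        # Identical sentences in one article collapse into a single doc.
        sentence_docs[sid] = {"id": sid, "parent_id": article_id, "text": sent}
    return main_doc, [sentence_docs[k] for k in sorted(sentence_docs)]


main, sents = build_docs("a1", "Foo is here. Something else! Bar shows up? Foo again.")
print(len(sents))  # 4
```

Note that this only covers indexing; it does not address the foo AND bar problem by itself, which is what the two-stage query approach is for.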
