Hi Steven,

If I understand correctly, you are suggesting query execution in two phases. First, execute the query on the whole-article index core (where whole articles are indexed but not stored) to get the IDs of the articles that match the original query. Then, for each match in the article core, change the AND operators in the original query to OR, add an articleID condition/filter, and execute that query on the sentence-based index (assuming each sentence-based doc has its articleID set).
Is this correct, and is this what "you'll want to run the second stage once for each hit from the first stage, though" refers to? As an example of this scenario, for the original query "q=apples and oranges": execute "q=apples and oranges" with fl=articleId on the article core, and for each articleIdX result execute "q=(apples OR oranges) AND articleId:articleIdX" on the sentence-based core.

The same thing (with the same results) should be doable with only a single query in the second phase. For the previous example, that single second-phase query for all of articleId1,...,articleIdN would be something like:

q=((apples OR oranges) AND articleId:articleId1) OR ((apples OR oranges) AND articleId:articleId2) OR ... OR ((apples OR oranges) AND articleId:articleIdN)

But in this second case the results are ordered by sentence score instead of by article, so they would need to be re-ordered. Is this what "unless you can afford to collect *all* hits and pull out each first stage's hit from the intermixed second stage results" refers to?

My actual question after this really long intro is: couldn't this be done with the single second-level query approach, but on each top-N start/rows chunk as the user pages through the first-level results? For example, the user executes "q=apples and oranges" and this yields 1000 results, but the first page displays only, say, 20 results, which means the proposed solution would:

1. phase: execute "q=apples and oranges" with fl=articleId on the article core, but with start=0&rows=20
2. phase: q=((apples OR oranges) AND articleId:articleId1) OR ((apples OR oranges) AND articleId:articleId2) OR ... OR ((apples OR oranges) AND articleId:articleId20)
3. Reorder the sentence results to match the order defined by the article matching scores and return them to the user

Only, the results here would need to be collapsed on unique articleID, so that only 20 results are provided in the result set (because multiple sentence-based docs can be returned for a single articleID).

Would this work?
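
To make phases 2 and 3 concrete, here is a minimal Python sketch for one page of results. It makes no Solr client calls; it only builds the second-phase query string and post-processes hits. The function names (and_to_or, build_sentence_query, collapse_and_reorder) are hypothetical, the AND-to-OR rewrite is a naive regex that ignores quoted phrases and nesting, and the sentence hits are assumed to arrive as dicts sorted by descending score, each carrying an articleId field.

```python
import re


def and_to_or(query):
    # Naive rewrite (assumption): treats any standalone "and"/"AND" as the
    # boolean operator, as in the example query; ignores quoted phrases.
    return re.sub(r"\s+AND\s+", " OR ", query, flags=re.IGNORECASE)


def build_sentence_query(original_query, article_ids, id_field="articleId"):
    # One clause per first-phase hit on the current page, OR-ed together.
    relaxed = and_to_or(original_query)
    clauses = [f"(({relaxed}) AND {id_field}:{aid})" for aid in article_ids]
    return " OR ".join(clauses)


def collapse_and_reorder(article_ids, sentence_hits, id_field="articleId"):
    # sentence_hits is assumed to be sorted by descending sentence score,
    # so the first hit seen for an article is its best-scoring sentence.
    best = {}
    for hit in sentence_hits:
        best.setdefault(hit[id_field], hit)
    # Emit at most one sentence per article, in first-phase article order.
    return [best[aid] for aid in article_ids if aid in best]


# Usage with made-up IDs: the page of article hits, in article-score order.
page_ids = ["a2", "a1", "a3"]
query = build_sentence_query("apples and oranges", page_ids)
hits = [
    {"articleId": "a1", "sentence": "s1"},
    {"articleId": "a2", "sentence": "s2"},
    {"articleId": "a1", "sentence": "s3"},
]
page = collapse_and_reorder(page_ids, hits)
```

In this sketch an article with no matching sentence (a3 above) simply drops out of the page, so the caller would still have to decide what to show for it.
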
Thanks,
Tomislav

2011/1/12 Steven A Rowe <sar...@syr.edu>:
> Hi Otis,
>
> I think you can get what you want by doing the first stage retrieval, and
> then in the second stage, add required constraint(s) to the query for the
> matching docid(s), and change the AND operators in the original query to OR.
> Coordination will cause the best snippet(s) to rise to the top, no?
>
> Hmm, you'll want to run the second stage once for each hit from the first
> stage, though, unless you can afford to collect *all* hits and pull out each
> first stage's hit from the intermixed second stage results...
>
> Steve
>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Wednesday, January 12, 2011 7:29 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Not storing, but highlighting from document sentences
>>
>> Hi Stefan,
>>
>> Yes, splitting in separate sentences (and storing them) is OK because with a
>> bunch of sentences you can't really reconstruct the original article unless
>> you know which order to put them in.
>>
>> Searching against the sentence won't work for queries like foo AND bar
>> because this should match original articles even if foo and bar are in
>> different sentences.
>>
>> Otis
>>
>>
>> ----- Original Message ----
>> > From: Stefan Matheis <matheis.ste...@googlemail.com>
>> > To: solr-user@lucene.apache.org
>> > Sent: Wed, January 12, 2011 7:02:46 AM
>> > Subject: Re: Not storing, but highlighting from document sentences
>> >
>> > Otis,
>> >
>> > just interested in .. storing the full text is not allowed, but splitting
>> > up in separate sentences is okay?
>> >
>> > while you think about using the sentences only as secondary/additional
>> > source, maybe it would help to search in the sentences itself, or would
>> > that give misleading results in your case?
>> >
>> > Stefan
>> >
>> > On Wed, Jan 12, 2011 at 12:02 PM, Otis Gospodnetic <
>> > otis_gospodne...@yahoo.com> wrote:
>> >
>> > > Hello,
>> > >
>> > > I'm indexing some content (articles) whose text I cannot store in its
>> > > original form for copyright reason. So I can index the content, but
>> > > cannot store it.
>> > > However, I need snippets and search term highlighting.
>> > >
>> > > Any way to accomplish this elegantly? Or even not so elegantly?
>> > >
>> > > Here is one idea:
>> > >
>> > > * Create 2 indices: main index for indexing (but not storing) the
>> > > original content, the secondary index for storing individual sentences
>> > > from the original article.
>> > >
>> > > * That is, before indexing an article, split it into sentences. Then
>> > > index the article in the main index, and index+store each sentence in
>> > > the secondary index. So for each doc in the main index there will be
>> > > multiple docs in the secondary index with individual sentences. Each
>> > > sentence doc includes an ID of the "parent" document.
>> > >
>> > > * Then run queries against the main index, and pull individual
>> > > sentences from the secondary index for snippet+highlight purposes.
>> > >
>> > > The problem I see with this approach (and there may be other ones that
>> > > I am not seeing yet) is with queries like foo AND bar. In this case
>> > > "foo" may be a match from sentence #1, and "bar" may be a match from
>> > > sentence #7. Or maybe "foo" is a match in sentence #1, and "bar" is a
>> > > match in multiple sentences: #7 and #10 and #23.
>> > >
>> > > Regardless, when a query is run against the main index, you don't know
>> > > where the match was, so you don't know which sentences to go get from
>> > > the secondary index.
>> > >
>> > > Does anyone have any suggestions for how to handle this?
>> > >
>> > > Thanks,
>> > > Otis
>> > > ----
>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> > > Lucene ecosystem search :: http://search-lucene.com/