Something like this.
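
A minimal sketch of steps 1 and 2 from the plan quoted below: a collector that wraps another collector, reads a per-document span count from the scorer it was handed, and forwards the document downstream. The types here (SpanStatsSource, DocCollector, SpanStatsCollector) are hypothetical stand-ins written for illustration, not the real Lucene/Solr classes. In Solr the wrapper would presumably extend DelegatingCollector and read from a modified SpanScorer, which this sketch only imitates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a scorer that exposes span stats for the current doc. */
interface SpanStatsSource {
    int matchedSpanCount();
}

/** Hypothetical stand-in for the collector contract: scorer first, then one call per hit. */
interface DocCollector {
    void setScorer(SpanStatsSource scorer);
    void collect(int doc);
}

/** Records the span count of every collected doc, then delegates to the wrapped collector. */
final class SpanStatsCollector implements DocCollector {
    private final DocCollector delegate;
    private final Map<Integer, Integer> spansPerDoc = new LinkedHashMap<>();
    private SpanStatsSource scorer;

    SpanStatsCollector(DocCollector delegate) {
        this.delegate = delegate;
    }

    @Override
    public void setScorer(SpanStatsSource scorer) {
        this.scorer = scorer;          // keep a handle so collect() can read the stats
        delegate.setScorer(scorer);    // the wrapped collector still needs the scorer
    }

    @Override
    public void collect(int doc) {
        // The scorer is positioned on `doc` while collect() runs, so its stats describe this doc.
        spansPerDoc.put(doc, scorer.matchedSpanCount());
        delegate.collect(doc);
    }

    /** Getter of the kind step 3 asks for: doc id -> number of matched spans. */
    Map<Integer, Integer> spansPerDoc() {
        return spansPerDoc;
    }
}
```

The design choice here is to keep the statistics in the collector rather than in the searcher, so the downstream collector sees exactly the same documents as before.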

On Tue, Mar 5, 2013 at 6:16 PM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Hello,
>
> I spent some more time on this and used Mikhail's suggestions of which
> classes would need to be implemented.
>
> 1. Since we use the SpanQuery family, we would need to modify the SpanScorer to
> collect some stats over matched spans.
> 2. DelegatingCollector takes a Scorer via its setScorer() method. The collector
> will have access to the statistics collected in the SpanScorer class.
> 3. This DelegatingCollector class should then be referenced in the
> SolrIndexSearcher class. Some getter methods will be needed for accessing
> the above statistics.
> 4. Make use of this modified SolrIndexSearcher in the SimpleFacets class.
> 5. Access the statistics visible in the SimpleFacets class in the
> FacetComponent, in the process() method.
>
> Does this sound like an accurate list of classes to modify? Am I missing
> anything? Are there any roadblocks?
>
> Dmitry
>
> On Wed, Jan 23, 2013 at 12:47 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
>
> > Thanks Alexandre for correcting the link and Mikhail for sharing the ideas!
> >
> > Mikhail,
> >
> > I will need to look more closely at your customization of
> > SpansFacetComponent in the blog post.
> > Is it the case that this component accesses and counts the matched spans?
> >
> > Thanks,
> >
> > Dmitry
> >
> >
> > On Tue, Jan 22, 2013 at 9:17 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> >> Dmitry,
> >>
> >> Solr faceting is really fast because it works in memory (with a few
> >> notable exceptions), so a span-based approach should be slower. Reading
> >> term positions/payloads always adds a noticeable cost. You can estimate it
> >> by comparing the time for a phrase query "foo bar" with that of a plain
> >> conjunction +foo +bar.
> >> It is worth mentioning that our SpansFacetComponent performed well enough,
> >> even for a public site. You can find my comment about the performance numbers:
> >> "64K docs with 5-20 span positions per each. Search result length 100-2000
> >> docs with 3-5 facet fields. It shows 100 q/sec on an average datacenter
> >> box."
> >>
> >>
> >> On Mon, Jan 21, 2013 at 5:23 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >>
> >> > Mikhail,
> >> >
> >> > Thanks for the guidance! This indeed sounds challenging, esp. given the
> >> > bonus of fighting with Solr 3.x over disjunction queries. That said,
> >> > moving to Solr 4.0 should be OK if it makes life easier.
> >> >
> >> > But even before getting one's hands dirty, it would be good to know
> >> > whether this is going to fly performance-wise. Has your span-based
> >> > implementation been fast enough? Did it come close to Solr's native
> >> > faceting in terms of performance?
> >> >
> >> > On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev <
> >> > mkhlud...@griddynamics.com> wrote:
> >> >
> >> > > Dmitry,
> >> > >
> >> > > First of all, FacetComponent is Solr's out-of-the-box functionality.
> >> > > It runs after the search is done and accesses the bit set of the found
> >> > > documents, i.e. there are no spans (matched term positions) there at all.
> >> > >
> >> > > StandardFacetsAccumulator sounds like the "brand new" Lucene faceting
> >> > > library; see http://shaierera.blogspot.com/. I don't think spans are
> >> > > accessible there either, but I don't know for sure.
> >> > >
> >> > > Some time ago my team successfully prototyped a facet component backed
> >> > > by spans
> >> > > (blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html),
> >> > > but I don't suggest you go this way.
> >> > > I suggest you start from the following:
> >> > > - supply a PostFilter/DelegatingCollector
> >> > > http://yonik.com/posts/advanced-filter-caching-in-solr/
> >> > > - the DelegatingCollector will accept the scorer instance
> >> > > - if this scorer is BooleanScorer2 (but not BooleanScorer!), you can
> >> > > access the SpanQueryScorer in one of the legs and try to access the
> >> > > matched spans
> >> > > - if you are on 3.x, you'll have a problem with disjunction queries.
> >> > >
> >> > > It seems challenging, doesn't it?
> >> > >
> >> > > On 18.01.2013 at 17:40, "Dmitry Kan" <solrexp...@gmail.com> wrote:
> >> > >
> >> > > > Mikhail,
> >> > > >
> >> > > > Are you saying that it is not possible to access the matched term
> >> > > > positions in the FacetComponent? If that were possible (somewhere in
> >> > > > the StandardFacetsAccumulator class, where doc ids are available),
> >> > > > then knowing the matched term positions I could do some simple school
> >> > > > math to calculate the sentence counts per doc id.
> >> > > >
> >> > > > Dmitry
> >> > > >
> >> > > > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev <
> >> > > > mkhlud...@griddynamics.com> wrote:
> >> > > >
> >> > > > > Dmitry,
> >> > > > >
> >> > > > > It definitely seems like post-processing the highlighter's output.
> >> > > > > Another approach is:
> >> > > > > - limit the number of occurrences of a word in a sentence to 1
> >> > > > > - play with the facet-by-function patch
> >> > > > > https://issues.apache.org/jira/browse/SOLR-1581 together with the
> >> > > > > tf() function.
> >> > > > >
> >> > > > > It doesn't seem like much help.
> >> > > > >
> >> > > > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >> > > > >
> >> > > > > > that we actually require the count of the sentences inside
> >> > > > > > each document where the hits were found.
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > Sincerely yours
> >> > > > > Mikhail Khludnev
> >> > > > > Principal Engineer,
> >> > > > > Grid Dynamics
> >> > > > >
> >> > > > > <http://www.griddynamics.com>
> >> > > > >  <mkhlud...@griddynamics.com>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
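
For the per-document sentence count asked about earlier in the thread, here is a rough sketch of the "simple math" once the matched term positions are available. It assumes, purely for illustration, that the token position at which each sentence starts is known and sorted in ascending order; nothing in the thread says how sentence boundaries are stored, and SentenceHitCounter is a hypothetical helper, not a Solr class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: counts sentences of one document that contain at least one hit. */
final class SentenceHitCounter {
    private SentenceHitCounter() {}

    /**
     * Returns the index of the sentence containing `position`, or -1 if the
     * position lies before the first sentence start.
     * `sentenceStarts` must be sorted in ascending order.
     */
    static int sentenceIndex(int[] sentenceStarts, int position) {
        int idx = Arrays.binarySearch(sentenceStarts, position);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // sentence is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Number of distinct sentences hit by the matched term positions. */
    static int countSentencesWithHits(int[] sentenceStarts, int[] matchedPositions) {
        Set<Integer> hitSentences = new HashSet<>();
        for (int position : matchedPositions) {
            int sentence = sentenceIndex(sentenceStarts, position);
            if (sentence >= 0) {
                hitSentences.add(sentence);
            }
        }
        return hitSentences.size();
    }
}
```

With sentence starts {0, 10, 25}, hits at positions 3 and 7 fall into the same sentence and count once, which is the difference between counting sentences and counting matched spans.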



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
