Thanks Mikhail.
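
To make steps 1 and 2 of the plan quoted below more concrete, here is a
minimal sketch of the delegating-collector idea in plain Java. It deliberately
does not use the real Lucene/Solr classes: ScorerLike, SpanStats,
CollectorLike, SpanStatsCollector and FakeSpanScorer are all hypothetical
stand-ins invented for illustration, and the real Scorer/DelegatingCollector
signatures differ between versions. Please read it as the shape of the idea
rather than as working Solr code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SpanStatsSketch {

    /** Hypothetical stand-in for a scorer; NOT Lucene's Scorer class. */
    interface ScorerLike {
        int docId();
    }

    /** Hypothetical extra contract that a modified span scorer could expose. */
    interface SpanStats {
        int matchedSpanCount(); // spans matched in the current document
    }

    /** Hypothetical stand-in for a collector; NOT Lucene's Collector class. */
    interface CollectorLike {
        void setScorer(ScorerLike scorer);
        void collect(int doc);
    }

    /** Records span counts per document, then forwards every call to the wrapped collector. */
    static final class SpanStatsCollector implements CollectorLike {
        private final CollectorLike delegate;
        private final Map<Integer, Integer> spansPerDoc = new LinkedHashMap<>();
        private SpanStats stats; // stays null when the scorer exposes no span statistics

        SpanStatsCollector(CollectorLike delegate) {
            this.delegate = delegate;
        }

        @Override
        public void setScorer(ScorerLike scorer) {
            stats = (scorer instanceof SpanStats) ? (SpanStats) scorer : null;
            delegate.setScorer(scorer);
        }

        @Override
        public void collect(int doc) {
            if (stats != null) {
                spansPerDoc.put(doc, stats.matchedSpanCount());
            }
            delegate.collect(doc);
        }

        Map<Integer, Integer> getSpansPerDoc() {
            return spansPerDoc;
        }
    }

    /** Fake scorer that replays fixed (doc id, span count) pairs; illustration only. */
    static final class FakeSpanScorer implements ScorerLike, SpanStats {
        private final int[] docs;
        private final int[] spanCounts;
        private int pos = -1;

        FakeSpanScorer(int[] docs, int[] spanCounts) {
            this.docs = docs;
            this.spanCounts = spanCounts;
        }

        boolean next() {
            return ++pos < docs.length;
        }

        @Override
        public int docId() {
            return docs[pos];
        }

        @Override
        public int matchedSpanCount() {
            return spanCounts[pos];
        }
    }

    /** Drives the fake scorer through the collector and returns the recorded counts. */
    static Map<Integer, Integer> demo() {
        List<Integer> hits = new ArrayList<>();
        CollectorLike sink = new CollectorLike() {
            @Override
            public void setScorer(ScorerLike scorer) { /* nothing to keep */ }

            @Override
            public void collect(int doc) {
                hits.add(doc);
            }
        };
        SpanStatsCollector collector = new SpanStatsCollector(sink);
        FakeSpanScorer scorer = new FakeSpanScorer(new int[] {1, 4, 7}, new int[] {2, 1, 5});
        collector.setScorer(scorer);
        while (scorer.next()) {
            collector.collect(scorer.docId());
        }
        return collector.getSpansPerDoc();
    }
}
```

Calling SpanStatsSketch.demo() returns the per-document span counts that were
recorded while the hits were still being forwarded to the wrapped collector.
In Solr the same map would have to be exposed through getters, as in step 3 of
the plan.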

On Tue, Mar 5, 2013 at 8:23 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:

> Something like this.
>
>
> On Tue, Mar 5, 2013 at 6:16 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
>
> > Hello,
> >
> > I spent some more time on this and used Mikhail's suggestions of which
> > classes would need to be implemented.
> >
> > 1. Since we use the SpanQuery family, we would need to modify the SpanScorer
> > to collect some stats over matched spans.
> > 2. DelegatingCollector takes the Scorer via the setScorer() method. The
> > class will have access to the statistics collected in the SpanScorer
> > class.
> > 3. This DelegatingCollector class should then be referenced in the
> > SolrIndexSearcher class. Some getter methods will be needed for
> > accessing the above statistics.
> > 4. Make use of this modified SolrIndexSearcher in the SimpleFacets class.
> > 5. Access the statistics visible in the SimpleFacets class from the
> > process() method of FacetComponent.
> >
> > Does this sound like an accurate list of classes to modify? Am I missing
> > something? Are there any roadblocks?
> >
> > Dmitry
> >
> > On Wed, Jan 23, 2013 at 12:47 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >
> > > Thanks Alexandre for correcting the link and Mikhail for sharing the
> > > ideas!
> > >
> > > Mikhail,
> > >
> > > I will need to look closer at your customization of SpansFacetComponent
> > > in the blog post.
> > > Is it the case that in this component you are accessing and counting the
> > > matched spans?
> > >
> > > Thanks,
> > >
> > > Dmitry
> > >
> > >
> > > On Tue, Jan 22, 2013 at 9:17 PM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > >> Dmitry,
> > >>
> > >> Solr faceting is really fast because it uses an in-memory approach (with
> > >> a few notable exceptions), hence spans should be slower. Reading term
> > >> positions/payloads always has a noticeable cost. You can estimate it by
> > >> comparing the time for a phrase query "foo bar" with the time for a plain
> > >> conjunction +foo +bar.
> > >> It is worth mentioning that our SpansFacetComponent performed well enough,
> > >> even for a public site. You can find my comment about performance numbers:
> > >> "64K docs with 5-20 span positions per each. Search result length 100-2000
> > >> docs with 3-5 facet fields. It shows 100 q/sec on an average datacenter
> > >> box."
> > >>
> > >>
> > >> On Mon, Jan 21, 2013 at 5:23 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> > >>
> > >> > Mikhail,
> > >> >
> > >> > Thanks for the guidance! This indeed sounds challenging, esp. given the
> > >> > bonus of fighting with Solr 3.x in light of disjunction queries. That
> > >> > said, moving to Solr 4.0 should be OK if it makes life easier.
> > >> >
> > >> > But even before getting one's hands dirty, it would be good to know if
> > >> > this is going to fly performance-wise. Has your span-based implementation
> > >> > been fast enough? Did it come close to Solr's native faceting in terms
> > >> > of performance?
> > >> >
> > >> > On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev <
> > >> > mkhlud...@griddynamics.com> wrote:
> > >> >
> > >> > > Dmitry,
> > >> > >
> > >> > > First of all, FacetComponent is Solr's out-of-the-box functionality. It
> > >> > > runs after the search is done and accesses the bitSet of the found
> > >> > > documents, i.e. there are no spans (matched term positions) there at all.
> > >> > >
> > >> > > StandardFacetsAccumulator sounds like the "brand new" Lucene faceting
> > >> > > library, see http://shaierera.blogspot.com/. I don't think spans are
> > >> > > accessible there either, but I don't know for sure.
> > >> > >
> > >> > > Some time ago my team successfully prototyped a facet component backed
> > >> > > by spans:
> > >> > > blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html
> > >> > > but I don't suggest you go this way.
> > >> > > I can suggest you start from the following:
> > >> > > - supply a PostFilter/DelegatingCollector
> > >> > > http://yonik.com/posts/advanced-filter-caching-in-solr/
> > >> > > - the DelegatingCollector will accept the scorer instance
> > >> > > - if this scorer is BooleanScorer2 (but not BooleanScorer!), you can
> > >> > > access the SpanQueryScorer in one of the legs and try to access the
> > >> > > matched spans
> > >> > > - if you are on 3.x you'll have a problem with disjunction queries.
> > >> > >
> > >> > > it seems challenging, doesn't it?
> > >> > >
> > >> > > On 18.01.2013 17:40, "Dmitry Kan" <solrexp...@gmail.com> wrote:
> > >> > >
> > >> > > > Mikhail,
> > >> > > >
> > >> > > > Are you saying that it is not possible to access the matched term
> > >> > > > positions in the FacetComponent? If that were possible (somewhere in the
> > >> > > > StandardFacetsAccumulator class, where doc ids are available), then by
> > >> > > > knowing the matched term positions I could do some simple school math
> > >> > > > to calculate the sentence counts per doc id.
> > >> > > >
> > >> > > > Dmitry
> > >> > > >
> > >> > > > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev <
> > >> > > > mkhlud...@griddynamics.com> wrote:
> > >> > > >
> > >> > > > > Dmitry,
> > >> > > > >
> > >> > > > > It definitely seems like postprocessing the highlighter's output.
> > >> > > > > Another approach is:
> > >> > > > > - limit the number of occurrences of a word in a sentence to 1
> > >> > > > > - play with the facet-by-function patch
> > >> > > > > https://issues.apache.org/jira/browse/SOLR-1581 combined with the tf()
> > >> > > > > function.
> > >> > > > >
> > >> > > > > It doesn't seem like much help.
> > >> > > > >
> > >> > > > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > that we actually require the count of the sentences inside
> > >> > > > > > each document where the hits were found.
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > Sincerely yours
> > >> > > > > Mikhail Khludnev
> > >> > > > > Principal Engineer,
> > >> > > > > Grid Dynamics
> > >> > > > >
> > >> > > > > <http://www.griddynamics.com>
> > >> > > > >  <mkhlud...@griddynamics.com>
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> >
>
>
>
>
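
The "simple school math" mentioned in the thread (going from matched term
positions to a per-document sentence count) could look roughly like the sketch
below. It assumes that the token position at which each sentence starts is
known for a document, which the thread does not say how to obtain.
SentenceCounter, sentenceStarts and matchPositions are hypothetical names, and
nothing here is Solr or Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SentenceCounter {

    /**
     * Returns the index of the sentence containing the given token position,
     * or -1 if the position lies before the first sentence start.
     * sentenceStarts must be sorted in ascending order.
     */
    static int sentenceIndex(int[] sentenceStarts, int position) {
        int idx = Arrays.binarySearch(sentenceStarts, position);
        // Exact hit: the position is the first token of sentence idx.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the containing
        // sentence is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Counts the distinct sentences that contain at least one matched position. */
    static int countMatchedSentences(int[] sentenceStarts, int[] matchPositions) {
        Set<Integer> seen = new HashSet<>();
        for (int position : matchPositions) {
            int sentence = sentenceIndex(sentenceStarts, position);
            if (sentence >= 0) {
                seen.add(sentence);
            }
        }
        return seen.size();
    }
}
```

For example, with sentences starting at token positions 0, 10 and 25, matches
at positions 3 and 7 fall into the same sentence and count once, while matches
at 3, 7, 12 and 30 touch three different sentences.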
