Thanks Alexandre for correcting the link and Mikhail for sharing the ideas!

Mikhail,

I will need to look more closely at your customization of
SpansFacetComponent in the blog post.
Is it the case that this component accesses and counts the matched spans?
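
To make the goal concrete, here is a minimal sketch of the counting I have in
mind, written in plain Python rather than against Solr. It assumes that the
matched term positions of a document are already available and that sentence
boundaries are known as a sorted list of start positions. Both inputs and the
function name count_matched_sentences are hypothetical; none of this is part
of SpansFacetComponent or any Solr/Lucene API.

```python
from bisect import bisect_right


def count_matched_sentences(sentence_starts, matched_positions):
    """Count distinct sentences containing at least one matched position.

    sentence_starts: sorted token positions at which each sentence begins,
        e.g. [0, 7, 15] for three sentences (assumed input).
    matched_positions: token positions of the matched terms in one document
        (assumed input).
    """
    hit_sentences = set()
    for pos in matched_positions:
        # Index of the last sentence that starts at or before pos.
        idx = bisect_right(sentence_starts, pos) - 1
        if idx >= 0:  # skip positions that precede the first sentence start
            hit_sentences.add(idx)
    return len(hit_sentences)


# Hypothetical document: sentences start at token positions 0, 7 and 15.
starts = [0, 7, 15]
# Matches at positions 2 and 5 fall in sentence 0, position 16 in sentence 2.
print(count_matched_sentences(starts, [2, 5, 16]))  # prints 2
```

Several hits inside the same sentence are counted once, which is the
per-document sentence count we are after.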

Thanks,

Dmitry

On Tue, Jan 22, 2013 at 9:17 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Dmitry,
>
> Solr faceting is really fast because it works in memory (with a few
> notable exceptions), hence spans should be slower. Reading term
> positions/payloads always carries a noticeable cost. You can estimate it by
> comparing the time for a phrase query "foo bar" with that of a plain
> conjunction +foo +bar.
> It is worth mentioning that our SpansFacetComponent performed well enough,
> even for a public site. You can find my comment about performance numbers:
> "64K docs with 5-20 span positions each. Search result length 100-2000
> docs with 3-5 facet fields. It shows 100 q/sec on an average datacenter
> box."
>
>
> On Mon, Jan 21, 2013 at 5:23 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
>
> > Mikhail,
> >
> > Thanks for the guidance! This indeed sounds challenging, esp. given the
> > bonus of fighting with Solr 3.x over disjunction queries. Although,
> > moving to Solr 4.0 should be OK if this makes life easier.
> >
> > But even before getting one's hands dirty, it would be good to know
> > whether this is going to fly performance-wise. Has your span-based
> > implementation been fast enough? Did it come close to Solr's native
> > faceting in terms of performance?
> >
> > On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > Dmitry,
> > >
> > > First of all, FacetComponent is Solr's out-of-the-box functionality. It
> > > runs after the search is done and accesses the bitSet of the found
> > > documents, i.e. there are no spans (matched term positions) there at all.
> > >
> > > StandardFacetsAccumulator sounds like the "brand new" Lucene faceting
> > > library, see http://shaierera.blogspot.com/. I don't think spans are
> > > accessible there either, but I don't know for sure.
> > >
> > > Some time ago my team successfully prototyped a facet component backed
> > > by spans:
> > > blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html
> > > but I don't suggest you go this way.
> > > I suggest you start from the following:
> > > - supply a PostFilter/DelegatingCollector, see
> > > http://yonik.com/posts/advanced-filter-caching-in-solr/
> > > - the DelegatingCollector will accept the scorer instance
> > > - if this scorer is BooleanScorer2 (but not BooleanScorer!), you can
> > > access the SpanQueryScorer in one of the legs and try to access the
> > > matched spans
> > > - if you are on 3.x, you'll have a problem with disjunction queries.
> > >
> > > It seems challenging, doesn't it?
> > >
> > > On 18.01.2013 at 17:40, "Dmitry Kan" <solrexp...@gmail.com> wrote:
> > >
> > > > Mikhail,
> > > >
> > > > Are you saying that it is not possible to access the matched term
> > > > positions in the FacetComponent? If that were possible (somewhere in
> > > > the StandardFacetsAccumulator class, where docids are available), then
> > > > by knowing the matched term positions I could do some simple school
> > > > math to calculate the sentence counts per doc id.
> > > >
> > > > Dmitry
> > > >
> > > > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev <
> > > > mkhlud...@griddynamics.com> wrote:
> > > >
> > > > > Dmitry,
> > > > >
> > > > > It definitely seems like postprocessing the highlighter's output.
> > > > > Another approach is:
> > > > > - limit the number of occurrences of a word in a sentence to 1
> > > > > - play with the facet-by-function patch
> > > > > https://issues.apache.org/jira/browse/SOLR-1581 accompanied by the
> > > > > tf() function.
> > > > >
> > > > > It doesn't seem like much help.
> > > > >
> > > > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan <solrexp...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > that we actually require the count of the sentences inside
> > > > > > each document where the hits were found.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours
> > > > > Mikhail Khludnev
> > > > > Principal Engineer,
> > > > > Grid Dynamics
> > > > >
> > > > > <http://www.griddynamics.com>
> > > > >  <mkhlud...@griddynamics.com>
> > > > >
> > > >
> > >
> >
>
