Thanks Mikhail.

On Tue, Mar 5, 2013 at 8:23 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> Something like this.
>
> On Tue, Mar 5, 2013 at 6:16 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
>
> > Hello,
> >
> > I spent some more time on this and used Mikhail's suggestions of which
> > classes would need to be implemented.
> >
> > 1. Since we use the SpanQuery family, we would need to modify the
> > SpanScorer to collect some stats over matched spans.
> > 2. DelegatingCollector takes the Scorer class via the setScorer() method.
> > The class will have access to the statistics collected in the SpanScorer
> > class.
> > 3. This DelegatingCollector class should then be referenced in the
> > SolrIndexSearcher class. There will be a need to implement some getter
> > methods for accessing the above statistics.
> > 4. Make use of this modified SolrIndexSearcher in the SimpleFacets class.
> > 5. Access the statistics that are visible in the SimpleFacets class in the
> > FacetComponent, in the method process().
> >
> > Does this sound like an accurate list of classes to modify? Am I missing
> > something, any road blocks?
> >
> > Dmitry
> >
> > On Wed, Jan 23, 2013 at 12:47 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >
> > > Thanks Alexandre for correcting the link and Mikhail for sharing the
> > > ideas!
> > >
> > > Mikhail,
> > >
> > > I will need to look closer at your customization of SpansFacetComponent
> > > on the blogpost.
> > > Is it so, that in this component, you are accessing and counting the
> > > matched spans?
> > >
> > > Thanks,
> > >
> > > Dmitry
> > >
> > > On Tue, Jan 22, 2013 at 9:17 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> > >
> > >> Dmitry,
> > >>
> > >> Solr faceting is really fast due to its in-memory approach (keeping a
> > >> few noticeable exceptions in mind), hence spans should be slower. Reading
> > >> term positions/payloads always adds noticeable overhead. You can estimate
> > >> it if you compare the time for a phrase query "foo bar" with a plain
> > >> conjunction +foo +bar one.
> > >> It is worth mentioning that our SpansFacetComponent performed well
> > >> enough, even for a public site. You can find my comment about performance
> > >> numbers: "64K docs with 5-20 span positions per each. Search result
> > >> length 100-2000 docs with 3-5 facet fields. It shows 100 q/sec on an
> > >> average datacenter box."
> > >>
> > >> On Mon, Jan 21, 2013 at 5:23 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> > >>
> > >> > Mikhail,
> > >> >
> > >> > Thanks for the guidance! This indeed sounds challenging, esp. given the
> > >> > bonus of fighting with solr 3.x in light of disjunction queries.
> > >> > Although, moving to solr 4.0 if this makes life easier should be ok.
> > >> >
> > >> > But even before getting one's hands dirty, it would be good to know if
> > >> > this is going to fly performance wise. Has your span based
> > >> > implementation been fast enough? Did it stand close to the native
> > >> > solr's faceting in terms of performance?
> > >> >
> > >> > On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> > >> >
> > >> > > Dmitry,
> > >> > >
> > >> > > First of all, FacetComponent is Solr's out-of-the-box functionality.
> > >> > > It runs after the search is done and accesses the bitSet of the found
> > >> > > documents, i.e. there are no spans (matched term positions) there at
> > >> > > all.
> > >> > >
> > >> > > StandardFacetsAccumulator sounds like the "brand new" lucene faceting
> > >> > > library, see http://shaierera.blogspot.com/. I don't think spans are
> > >> > > accessible there either, but I don't know for sure.
> > >> > >
> > >> > > Some time ago my team successfully prototyped a facet component backed
> > >> > > by spans:
> > >> > > blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html
> > >> > > but I don't suggest you go this way.
> > >> > > I can suggest you start from the following:
> > >> > > - supply a PostFilter/DelegatingCollector
> > >> > > http://yonik.com/posts/advanced-filter-caching-in-solr/
> > >> > > - the DelegatingCollector will accept the scorer instance
> > >> > > - if this scorer is BooleanScorer2 (but not BooleanScorer!), you can
> > >> > > access the SpanQueryScorer in one of the legs and try to access the
> > >> > > matched spans
> > >> > > - if you are on 3.x you'll have a problem with disjunction queries.
> > >> > >
> > >> > > It seems challenging, doesn't it?
> > >> > >
> > >> > > On 18.01.2013 17:40, "Dmitry Kan" <solrexp...@gmail.com> wrote:
> > >> > >
> > >> > > > Mikhail,
> > >> > > >
> > >> > > > Are you saying that it is not possible to access the matched term
> > >> > > > positions in the FacetComponent? If that were possible (somewhere in
> > >> > > > the StandardFacetsAccumulator class, where docids are available),
> > >> > > > then by knowing the matched term positions I can do some simple
> > >> > > > school math to calculate the sentence counts per doc id.
> > >> > > >
> > >> > > > Dmitry
> > >> > > >
> > >> > > > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> > >> > > >
> > >> > > > > Dmitry,
> > >> > > > >
> > >> > > > > It definitely seems like postprocessing the highlighter's output.
> > >> > > > > Another approach is:
> > >> > > > > - limit the number of occurrences of a word in a sentence to 1
> > >> > > > > - play with the facet by function patch
> > >> > > > > https://issues.apache.org/jira/browse/SOLR-1581 combined with the
> > >> > > > > tf() function.
> > >> > > > >
> > >> > > > > It doesn't seem like much help.
> > >> > > > >
> > >> > > > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > that we actually require the count of the sentences inside
> > >> > > > > > each document where the hits were found.
> > >> > > > >
> > >> > > > > --
> > >> > > > > Sincerely yours
> > >> > > > > Mikhail Khludnev
> > >> > > > > Principal Engineer,
> > >> > > > > Grid Dynamics
> > >> > > > >
> > >> > > > > <http://www.griddynamics.com>
> > >> > > > > <mkhlud...@griddynamics.com>
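
For reference, the "simple school math" mentioned earlier in the thread (turning matched term positions into a per-document count of sentences with hits) might look roughly like the sketch below. It deliberately uses no Lucene or Solr API. It assumes that, for each document, the token position at which every sentence starts is available as a sorted array beginning with 0, and that the matched positions have already been collected from the spans. The class name, method names and the sentenceStarts layout are all hypothetical and not something the thread specifies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: counts the distinct sentences of one document
// that contain at least one matched term position.
class SentenceHitCounter {

    // sentenceStarts is assumed sorted ascending, starting with 0;
    // position is assumed to be a non-negative token position.
    static int sentenceIndex(int[] sentenceStarts, int position) {
        int idx = Arrays.binarySearch(sentenceStarts, position);
        // Exact hit: the position is the first token of sentence idx.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the
        // containing sentence is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    static int countSentencesWithHits(int[] sentenceStarts, int[] matchedPositions) {
        Set<Integer> sentences = new HashSet<>();
        for (int pos : matchedPositions) {
            sentences.add(sentenceIndex(sentenceStarts, pos));
        }
        return sentences.size();
    }

    public static void main(String[] args) {
        int[] sentenceStarts = {0, 7, 15, 22};   // four sentences in the document
        int[] matchedPositions = {2, 5, 16, 30}; // hits fall in sentences 0, 0, 2 and 3
        System.out.println(countSentencesWithHits(sentenceStarts, matchedPositions)); // prints 3
    }
}
```

How the sentence boundaries get stored (payloads, a separate field, or something else) and how the matched positions are pulled out of the scorer are left open here; those are the hard parts discussed above.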