Something like this.
On Tue, Mar 5, 2013 at 6:16 PM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Hello,
>
> I spent some more time on this and used Mikhail's suggestions about which
> classes would need to be implemented.
>
> 1. Since we use the SpanQuery family, we would need to modify the
> SpanScorer to collect some statistics over the matched spans.
> 2. DelegatingCollector receives the Scorer via its setScorer() method, so
> it will have access to the statistics collected in the SpanScorer class.
> 3. This DelegatingCollector class should then be referenced in the
> SolrIndexSearcher class. Some getter methods will be needed for accessing
> the above statistics.
> 4. Make use of this modified SolrIndexSearcher in the SimpleFacets class.
> 5. Access the statistics visible in the SimpleFacets class from the
> process() method of FacetComponent.
>
> Does this sound like an accurate list of classes to modify? Am I missing
> something? Any roadblocks?
>
> Dmitry
>
> On Wed, Jan 23, 2013 at 12:47 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
>
> > Thanks, Alexandre, for correcting the link, and Mikhail, for sharing the
> > ideas!
> >
> > Mikhail,
> >
> > I will need to look closer at your customization of SpansFacetComponent
> > in the blog post.
> > Is it the case that in this component you are accessing and counting the
> > matched spans?
> >
> > Thanks,
> >
> > Dmitry
> >
> > On Tue, Jan 22, 2013 at 9:17 PM, Mikhail Khludnev
> > <mkhlud...@griddynamics.com> wrote:
> >
> >> Dmitry,
> >>
> >> Solr faceting is really fast because it uses an in-memory approach (with
> >> a few notable exceptions), hence spans should be slower. Reading term
> >> positions/payloads always adds a noticeable cost. You can estimate it by
> >> comparing the time for a phrase query "foo bar" with that of a plain
> >> conjunction +foo +bar.
> >> It is worth mentioning that our SpansFacetComponent performed well enough,
> >> even for a public site.
> >> You can find my comment about performance numbers:
> >> "64K docs with 5-20 span positions per each. Search result length 100-2000
> >> docs with 3-5 facet fields. It shows 100 q/sec on an average datacenter
> >> box."
> >>
> >> On Mon, Jan 21, 2013 at 5:23 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >>
> >> > Mikhail,
> >> >
> >> > Thanks for the guidance! This indeed sounds challenging, esp. given the
> >> > bonus of fighting with Solr 3.x over disjunction queries. That said,
> >> > moving to Solr 4.0 should be OK if it makes life easier.
> >> >
> >> > But even before getting one's hands dirty, it would be good to know
> >> > whether this is going to fly performance-wise. Has your span-based
> >> > implementation been fast enough? Did it come close to Solr's native
> >> > faceting in terms of performance?
> >> >
> >> > On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev
> >> > <mkhlud...@griddynamics.com> wrote:
> >> >
> >> > > Dmitry,
> >> > >
> >> > > First of all, FacetComponent is Solr's out-of-the-box functionality.
> >> > > It runs after the search is done and accesses the bitset of the found
> >> > > documents, i.e. there are no spans (matched term positions) there at
> >> > > all.
> >> > >
> >> > > StandardFacetsAccumulator sounds like the "brand new" Lucene faceting
> >> > > library; see http://shaierera.blogspot.com/. I don't think the matched
> >> > > positions are accessible there either, but I don't know for certain.
> >> > >
> >> > > Some time ago my team successfully prototyped a facet component backed
> >> > > by spans:
> >> > > blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html
> >> > > but I don't suggest you go this way.
> >> > > I can suggest you start from the following:
> >> > > - supply a PostFilter/DelegatingCollector
> >> > >   (http://yonik.com/posts/advanced-filter-caching-in-solr/)
> >> > > - the DelegatingCollector will accept the scorer instance
> >> > > - if this scorer is BooleanScorer2 (but not BooleanScorer!), you can
> >> > >   access the SpanQueryScorer in one of the legs and try to access the
> >> > >   matched spans
> >> > > - if you are on 3.x, you'll have a problem with disjunction queries.
> >> > >
> >> > > It seems challenging, doesn't it?
> >> > >
> >> > > On 18.01.2013 17:40, "Dmitry Kan" <solrexp...@gmail.com> wrote:
> >> > >
> >> > > > Mikhail,
> >> > > >
> >> > > > Are you saying that it is not possible to access the matched term
> >> > > > positions in the FacetComponent? If that were possible (somewhere in
> >> > > > the StandardFacetsAccumulator class, where doc ids are available),
> >> > > > then, knowing the matched term positions, I could do some simple
> >> > > > school math to calculate the sentence counts per doc id.
> >> > > >
> >> > > > Dmitry
> >> > > >
> >> > > > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev
> >> > > > <mkhlud...@griddynamics.com> wrote:
> >> > > >
> >> > > > > Dmitry,
> >> > > > >
> >> > > > > It definitely seems like postprocessing the highlighter's output.
> >> > > > > Another approach is:
> >> > > > > - limit the number of occurrences of a word in a sentence to 1
> >> > > > > - play with the facet-by-function patch
> >> > > > >   https://issues.apache.org/jira/browse/SOLR-1581, combined with
> >> > > > >   the tf() function.
> >> > > > >
> >> > > > > It doesn't seem like much help.
> >> > > > >
> >> > > > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan <solrexp...@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > that we actually require the count of the sentences inside
> >> > > > > > each document where the hits were found.
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
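
For readers following the thread: the plan Dmitry lists (a scorer that gathers span statistics, a delegating collector that receives it via setScorer(), and getters the facet code can read), together with his "simple school math" for sentence counts, could look roughly like the sketch below. It deliberately does not use the real Lucene/Solr classes. SpanStatsScorer, HitCollector, SentenceCountingCollector and the sentenceStartsByDoc map are all hypothetical stand-ins, and the idea that sorted sentence start positions are available per document is an assumption, not something stated in the thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a modified SpanScorer that exposes the start
// positions of the spans matched in the document it is currently on.
interface SpanStatsScorer {
    int[] matchedSpanStarts();
}

// Hypothetical stand-in for the collector contract (setScorer + collect).
interface HitCollector {
    void setScorer(SpanStatsScorer scorer);
    void collect(int doc);
}

// Wraps another collector, records a per-document sentence count, then delegates.
class SentenceCountingCollector implements HitCollector {
    private final HitCollector delegate;
    // Assumed input: sorted token positions at which each sentence starts, per doc id.
    private final Map<Integer, int[]> sentenceStartsByDoc;
    private final Map<Integer, Integer> sentenceCounts = new HashMap<>();
    private SpanStatsScorer scorer;

    SentenceCountingCollector(HitCollector delegate, Map<Integer, int[]> sentenceStartsByDoc) {
        this.delegate = delegate;
        this.sentenceStartsByDoc = sentenceStartsByDoc;
    }

    @Override
    public void setScorer(SpanStatsScorer scorer) {
        this.scorer = scorer;
        delegate.setScorer(scorer);
    }

    @Override
    public void collect(int doc) {
        int[] sentenceStarts = sentenceStartsByDoc.getOrDefault(doc, new int[0]);
        sentenceCounts.put(doc, countSentences(scorer.matchedSpanStarts(), sentenceStarts));
        delegate.collect(doc);
    }

    // Getter that a facet component could read after the search has finished.
    Map<Integer, Integer> sentenceCounts() {
        return sentenceCounts;
    }

    // Counts the distinct sentences that contain at least one matched span.
    static int countSentences(int[] spanStarts, int[] sentenceStarts) {
        Set<Integer> hitSentences = new HashSet<>();
        for (int position : spanStarts) {
            int index = Arrays.binarySearch(sentenceStarts, position);
            // A negative result encodes the insertion point; step back one sentence.
            int sentence = index >= 0 ? index : -index - 2;
            if (sentence >= 0) {
                hitSentences.add(sentence);
            }
        }
        return hitSentences.size();
    }
}
```

With sentence starts {0, 10, 25} and matched span starts {3, 7, 12}, the hits fall into the first two sentences, so the count recorded for that document is 2. A real implementation would still have to solve the harder parts discussed in the thread, such as reaching the span scorer inside a boolean query.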