I saw some previous threads related to this subject, but for a slightly 
different use case, so I'm starting a new thread...
 
For reference, a related thread topic can be found here: 
http://www.lucidimagination.com/search/document/2025d6670004838b/date_faceting_and_double_counting#2025d6670004838b
 
This has to do with date facets double counting documents across adjacent date 
facets when a document's time is 'on the cusp' (i.e. exactly on a facet boundary).
 
In fact, I found this problem because I was testing date facets where the gap 
is +1SECOND. In this case many/most/all document counts can be duplicated, 
because in my case milliseconds are, as a general rule, set to 0, and there is 
'No logic for milliseconds' in the DateMathParser. This behaviour can be 
observed in date faceting generally, but in the +1SECOND scenario it is much 
more likely to occur (because these values are more likely to be quantized).
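
To see why +1SECOND is the worst case, here is a minimal sketch in plain Java 
(not Solr code; the class and method names are hypothetical). It assumes 
document times are epoch milliseconds with the millisecond part already zeroed, 
and that facet buckets start on a whole-second boundary. Under those 
assumptions every document time sits exactly on the edge between two buckets, 
so a range that includes both of its ends will match it twice.

```java
import java.time.Instant;

/** Hypothetical illustration only; not part of Solr. */
public class BoundaryCheck {

    /** A +1SECOND gap expressed in milliseconds. */
    static final long GAP_MS = 1000L;

    /**
     * True if the timestamp lies exactly on a bucket edge, assuming buckets
     * are aligned to the epoch and advance by gapMs.
     */
    static boolean onBoundary(long epochMillis, long gapMs) {
        return Math.floorMod(epochMillis, gapMs) == 0;
    }

    public static void main(String[] args) {
        // Document time with milliseconds set to 0.
        long t = Instant.parse("2010-02-01T12:30:45Z").toEpochMilli();
        // Document time with a non-zero millisecond component.
        long u = Instant.parse("2010-02-01T12:30:45.250Z").toEpochMilli();

        System.out.println(onBoundary(t, GAP_MS));      // true
        System.out.println(onBoundary(u, GAP_MS));      // false
        // With a one-hour gap the same time is not on an edge.
        System.out.println(onBoundary(t, 3_600_000L));  // false
    }
}
```

With coarser gaps only documents at exact hour/day boundaries collide, which is 
why the problem is rarer (but still possible) in general date faceting.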
 
I had a look at the date math with regard to this (in SimpleFacets.java: 
getFacetDateCounts()), and I noticed the following line of code (~line 622):

    resInner.add(label, rangeCount(sf,low,high,true,true));
 
The two 'true' booleans mean: 'include at start of range' *AND* 'include at end 
of range'. Any documents that live on the border will match in date.facet[n] 
and date.facet[n+1], because of the 'double-sided' inclusive range search.
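
The effect can be reproduced outside Solr with a small simulation. This is a 
sketch under stated assumptions, not Solr's implementation: countInRange is a 
hypothetical stand-in for rangeCount, taking the same two flags (include lower 
bound, include upper bound), and times are plain longs in milliseconds.

```java
import java.util.List;

/** Hypothetical simulation of date-facet buckets; not Solr code. */
public class DoubleCount {

    /** Counts values between low and high, honouring the two inclusivity flags. */
    static int countInRange(List<Long> values, long low, long high,
                            boolean includeLow, boolean includeHigh) {
        int count = 0;
        for (long v : values) {
            boolean aboveLow = includeLow ? v >= low : v > low;
            boolean belowHigh = includeHigh ? v <= high : v < high;
            if (aboveLow && belowHigh) {
                count++;
            }
        }
        return count;
    }

    /** Sums the counts of consecutive buckets starting at start, each gap wide. */
    static int totalAcrossBuckets(List<Long> values, long start, long gap, int buckets,
                                  boolean includeLow, boolean includeHigh) {
        int total = 0;
        for (int i = 0; i < buckets; i++) {
            long low = start + i * gap;
            long high = low + gap;
            total += countInRange(values, low, high, includeLow, includeHigh);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three documents whose times fall exactly on whole seconds.
        List<Long> docs = List.of(1000L, 2000L, 3000L);
        // Four one-second buckets over 0..4000 ms, both ends inclusive,
        // mirroring rangeCount(sf, low, high, true, true).
        System.out.println(totalAcrossBuckets(docs, 0L, 1000L, 4, true, true)); // 6
    }
}
```

Three documents produce a facet total of six, because each one is both the 
upper edge of one bucket and the lower edge of the next.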
 
By convention, a time value of '0' (00:00) belongs to the next period, rather 
than the previous, so I changed the *first* boolean to false, and voila! no 
more duplications! I believe this will be the case for other gap values, not 
just +1SECOND. 
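
Making either bound exclusive removes the duplication; the choice only decides 
which bucket a boundary value lands in. The sketch below (plain Java, 
hypothetical names, not Solr code) assumes the first flag means 'include lower 
bound' and the second 'include upper bound', as described above. Note that with 
the first flag false the range is (low, high], so a value at exactly 00:00 is 
counted in the period that ends there; to put it in the next period, the 
second flag would be the one to set false, giving [low, high).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of bucket assignment at an edge; not Solr code. */
public class BucketAssignment {

    /**
     * Returns the index of every bucket [start + i*gap, start + (i+1)*gap]
     * that matches t, honouring the two inclusivity flags.
     */
    static List<Integer> matchingBuckets(long t, long start, long gap, int buckets,
                                         boolean includeLow, boolean includeHigh) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            long low = start + i * gap;
            long high = low + gap;
            boolean aboveLow = includeLow ? t >= low : t > low;
            boolean belowHigh = includeHigh ? t <= high : t < high;
            if (aboveLow && belowHigh) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long t = 1000L; // exactly on the edge between bucket 0 and bucket 1
        System.out.println(matchingBuckets(t, 0L, 1000L, 4, true, true));  // [0, 1]
        System.out.println(matchingBuckets(t, 0L, 1000L, 4, false, true)); // [0]
        System.out.println(matchingBuckets(t, 0L, 1000L, 4, true, false)); // [1]
    }
}
```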
 
Since date faceting doesn't parse (or need) the '[' or '{' range delimiters, 
there's nothing else to change, so the patch couldn't be simpler.
 
My question to the experts of this code is:
Was this done for a reason? Are there any implications elsewhere of using a 
double-sided inclusive Lucene range search?
I can't think of any reason, but perhaps someone knows differently?
 
If interested parties are in agreement, I can create an issue for it and the 
associated fix.
 
Many thanks,
Peter
 