Hi Stephen,

When I added numerical faceting to my checkout of Solr (SOLR-1240), I basically copied date faceting and modified it to work with numbers instead of dates. With numbers I got a lot of double-counted values as well. To fix that, I added an extra parameter to number faceting that lets you specify whether either end of each range should be inclusive or exclusive. I just ported it back to date faceting (disclaimer: completely untested), and the patch should be attached to this post.

The patch adds the following parameter: facet.date.exclusive
Valid values for the parameter are: start, end, both and neither

To maintain compatibility with Solr without the patch, the default is neither. I hope the meaning of the values is self-explanatory.
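To illustrate what the four values do to the counts, here is a minimal standalone sketch. It does not call Solr; the class and method names (DateFacetBoundsSketch, facetCounts, rangeCount) are made up for this example, the gap is fixed at one day, and it only mimics how the patch turns the parameter value into the two inclusive flags.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Solr or of the attached patch.
public class DateFacetBoundsSketch {

  // Counts the values between low and high, honouring the two inclusive flags.
  static int rangeCount(List<Instant> docs, Instant low, Instant high,
                        boolean lowInclusive, boolean highInclusive) {
    int n = 0;
    for (Instant d : docs) {
      int cmpLow = d.compareTo(low);
      int cmpHigh = d.compareTo(high);
      boolean aboveLow = lowInclusive ? cmpLow >= 0 : cmpLow > 0;
      boolean belowHigh = highInclusive ? cmpHigh <= 0 : cmpHigh < 0;
      if (aboveLow && belowHigh) {
        n++;
      }
    }
    return n;
  }

  // Returns one count per one-day range between start and end.
  // "exclusive" is assumed to be one of: start, end, both, neither.
  static List<Integer> facetCounts(List<Instant> docs, Instant start,
                                   Instant end, String exclusive) {
    boolean startInclusive =
        !(exclusive.equals("start") || exclusive.equals("both"));
    boolean endInclusive =
        !(exclusive.equals("end") || exclusive.equals("both"));

    List<Integer> counts = new ArrayList<>();
    Instant low = start;
    while (low.isBefore(end)) {
      Instant high = low.plus(1, ChronoUnit.DAYS);
      counts.add(rangeCount(docs, low, high, startInclusive, endInclusive));
      low = high;
    }
    return counts;
  }

  public static void main(String[] args) {
    // Two documents indexed at exactly midnight, as in Stephen's case.
    List<Instant> docs = List.of(
        Instant.parse("2009-01-01T00:00:00Z"),
        Instant.parse("2009-01-02T00:00:00Z"));
    Instant start = Instant.parse("2009-01-01T00:00:00Z");
    Instant end = Instant.parse("2009-01-03T00:00:00Z");

    for (String mode : List.of("neither", "end", "start", "both")) {
      System.out.println(mode + " -> " + facetCounts(docs, start, end, mode));
    }
  }
}
```

With these two sample documents, neither gives the counts [2, 1] (the 2009-01-02 document is counted in both ranges), while end gives [1, 1], so for data indexed at day level the value end is probably the one you want.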

Regards,

gwk

Stephen Duncan Jr wrote:
If we do date faceting and start at 2009-01-01T00:00:00Z, end at
2009-01-03T00:00:00Z, with a gap of +1DAY, then documents that occur at
exactly 2009-01-02T00:00:00Z will be included in both the returned counts
(2009-01-01T00:00:00Z and 2009-01-02T00:00:00Z).  At the moment, this is
quite bad for us, as we only index the day-level, so all of our documents
are exactly on the line between each facet-range.

Because we know our data is indexed as being exactly at midnight each day, I
think we can simply always start from 1 second prior and get the results we
want (start=2008-12-31T23:59:59Z, end=2009-01-02T23:59:59Z), but I think
this problem would affect everyone, even if usually more subtly (instead of
all documents being counted twice, only a few on the fencepost between
ranges).

Is this a known behavior people are happy with, or should I file an issue
asking for ranges in date-facets to be constructed to subtract one second
from the end of each range (so that the effective range queries for my case
would be: [2009-01-01T00:00:00Z TO 2009-01-01T23:59:59Z] &
[2009-01-02T00:00:00Z TO 2009-01-02T23:59:59Z])?

Alternatively, is there some other suggested way of using the date faceting
to avoid this problem?


Index: src/java/org/apache/solr/request/SimpleFacets.java
===================================================================
--- src/java/org/apache/solr/request/SimpleFacets.java  (revision 809880)
+++ src/java/org/apache/solr/request/SimpleFacets.java  (working copy)
@@ -29,6 +29,7 @@
 import org.apache.solr.common.params.SolrParams;
 import org.apache.solr.common.params.CommonParams;
 import org.apache.solr.common.params.FacetParams.FacetDateOther;
+import org.apache.solr.common.params.FacetParams.FacetDateExclusive;
 import org.apache.solr.common.util.NamedList;
 import org.apache.solr.common.util.SimpleOrderedMap;
 import org.apache.solr.common.util.StrUtils;
@@ -586,6 +587,32 @@
            "date facet 'end' comes before 'start': "+endS+" < "+startS);
       }
 
+      boolean startInclusive = true;
+      boolean endInclusive = true;
+      final String[] exclusiveP =
+        params.getFieldParams(f,FacetParams.FACET_DATE_EXCLUSIVE);
+      if (null != exclusiveP && 0 < exclusiveP.length) {
+        Set<FacetDateExclusive> exclusives
+                = EnumSet.noneOf(FacetDateExclusive.class);
+        
+        for (final String e : exclusiveP) {
+          exclusives.add(FacetDateExclusive.get(e));
+        }
+        
+        if(! exclusives.contains(FacetDateExclusive.NEITHER) ) {
+          boolean both = exclusives.contains(FacetDateExclusive.BOTH);
+          
+          if(both || exclusives.contains(FacetDateExclusive.START)) {
+            startInclusive = false;
+          }
+          
+          if(both || exclusives.contains(FacetDateExclusive.END)) {
+            endInclusive = false;
+          }
+        }
+      }
+      
+      
       final String gap = required.getFieldParam(f,FacetParams.FACET_DATE_GAP);
       final DateMathParser dmp = new DateMathParser(ft.UTC, Locale.US);
       dmp.setNow(NOW);
@@ -610,7 +637,7 @@
               (SolrException.ErrorCode.BAD_REQUEST,
                "date facet infinite loop (is gap negative?)");
           }
-          resInner.add(label, rangeCount(sf,low,high,true,true));
+          resInner.add(label, rangeCount(sf,low,high,startInclusive,endInclusive));
           low = high;
         }
       } catch (java.text.ParseException e) {
@@ -639,15 +666,15 @@
         
           if (all || others.contains(FacetDateOther.BEFORE)) {
             resInner.add(FacetDateOther.BEFORE.toString(),
-                         rangeCount(sf,null,start,false,false));
+                         rangeCount(sf,null,start,false,!startInclusive));
           }
           if (all || others.contains(FacetDateOther.AFTER)) {
             resInner.add(FacetDateOther.AFTER.toString(),
-                         rangeCount(sf,end,null,false,false));
+                         rangeCount(sf,end,null,!endInclusive,false));
           }
           if (all || others.contains(FacetDateOther.BETWEEN)) {
             resInner.add(FacetDateOther.BETWEEN.toString(),
-                         rangeCount(sf,start,end,true,true));
+                         rangeCount(sf,start,end,startInclusive,endInclusive));
           }
         }
       }
Index: src/common/org/apache/solr/common/params/FacetParams.java
===================================================================
--- src/common/org/apache/solr/common/params/FacetParams.java   (revision 809880)
+++ src/common/org/apache/solr/common/params/FacetParams.java   (working copy)
@@ -150,6 +150,14 @@
    * @see FacetDateOther
    */
   public static final String FACET_DATE_OTHER = FACET_DATE + ".other";
+  /**
+   * String indicating whether ranges for date range faceting 
+   * should be exclusive or inclusive. By default both the start and
+   * end point are inclusive.
+   * Can be overridden on a per field basis.
+   * @see FacetDateExclusive
+   */
+  public static final String FACET_DATE_EXCLUSIVE = FACET_DATE + ".exclusive";
 
     /**
    * An enumeration of the legal values for FACET_DATE_OTHER...
@@ -176,6 +184,34 @@
     }
   }
   
+  /**
+   * An enumeration of the legal values for FACET_DATE_EXCLUSIVE...
+   * <ul>
+   * <li>start = the start point for each range is exclusive
+   * (i.e. (start,end] ) </li>
+   * <li>end = the end point for each range is exclusive
+   * (i.e. [start,end) )</li>
+   * <li>both = both points are exclusive, this means fields which
+   * match exactly to one of the intermediate and start/end points are
+   * not counted</li>
+   * <li>neither = neither the lower nor the upper point are
+   * exclusive, this is the default</li>
+   * </ul>
+   * @see FacetParams#FACET_DATE_EXCLUSIVE
+   */
+  public enum FacetDateExclusive {
+    START, END, BOTH, NEITHER;
+    public String toString() { return super.toString().toLowerCase(); }
+    public static FacetDateExclusive get(String label) {
+      try {
+        return valueOf(label.toUpperCase());
+      } catch (IllegalArgumentException e) {
+        throw new SolrException
+          (SolrException.ErrorCode.BAD_REQUEST,
+           label+" is not a valid type of 'exclusive' date facet information",e);
+      }
+    }
+  }
 
 }
 
