Hi Stephen,
When I added numerical faceting to my checkout of Solr (SOLR-1240), I
basically copied date faceting and modified it to work with numbers
instead of dates. With numbers I got a lot of double-counted values as
well. To fix that, I added an extra parameter to number faceting
that lets you specify whether each end of a range is inclusive or
exclusive. I have just ported it back to date faceting (disclaimer:
completely untested), and the patch should be attached to this post.
The patch adds the following parameter: facet.date.exclusive
Valid values for the parameter are: start, end, both and neither.
To maintain compatibility with Solr without the patch, the default is
neither. I hope the meaning of the values is self-explanatory.
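To illustrate what the four values mean, here is a small stand-alone
sketch. It is plain Java rather than Solr code, and the names
ExclusiveSketch, inclusivity and count are made up for the example; only
the way the values are combined is meant to mirror the attached patch.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Stand-alone illustration; these names are hypothetical and are not Solr classes.
class ExclusiveSketch {
    enum Exclusive { START, END, BOTH, NEITHER }

    // Returns {startInclusive, endInclusive} for the given parameter values.
    static boolean[] inclusivity(String... values) {
        Set<Exclusive> set = EnumSet.noneOf(Exclusive.class);
        for (String v : values) {
            set.add(Exclusive.valueOf(v.toUpperCase(Locale.ROOT)));
        }
        boolean startInclusive = true;
        boolean endInclusive = true;
        // As in the patch, "neither" wins over any other value that is also present.
        if (!set.contains(Exclusive.NEITHER)) {
            boolean both = set.contains(Exclusive.BOTH);
            if (both || set.contains(Exclusive.START)) startInclusive = false;
            if (both || set.contains(Exclusive.END)) endInclusive = false;
        }
        return new boolean[] { startInclusive, endInclusive };
    }

    // Counts the values that fall between low and high with the given end-point handling.
    static int count(long[] docs, long low, long high, boolean[] incl) {
        int n = 0;
        for (long d : docs) {
            boolean aboveLow = incl[0] ? d >= low : d > low;
            boolean belowHigh = incl[1] ? d <= high : d < high;
            if (aboveLow && belowHigh) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long[] docs = { 10 };   // one document sitting exactly on a range boundary
        for (String p : new String[] { "neither", "start", "end", "both" }) {
            boolean[] incl = inclusivity(p);
            // Two adjacent ranges that share the boundary value 10.
            int total = count(docs, 0, 10, incl) + count(docs, 10, 20, incl);
            System.out.println(p + " -> counted " + total + " time(s)");
        }
    }
}
```

With a single document exactly on the shared boundary, "neither" counts it
twice, "start" and "end" count it once, and "both" does not count it at all.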
Regards,
gwk
Stephen Duncan Jr wrote:
If we do date faceting and start at 2009-01-01T00:00:00Z, end at
2009-01-03T00:00:00Z, with a gap of +1DAY, then documents that occur at
exactly 2009-01-02T00:00:00Z will be included in both the returned counts
(2009-01-01T00:00:00Z and 2009-01-02T00:00:00Z). At the moment, this is
quite bad for us, as we only index the day-level, so all of our documents
are exactly on the line between each facet-range.
Because we know our data is indexed as being exactly at midnight each day, I
think we can simply always start from 1 second prior and get the results we
want (start=2008-12-31T23:59:59Z, end=2009-01-02T23:59:59Z), but I think
this problem would affect everyone, even if usually more subtly (instead of
all documents being counted twice, only a few on the fencepost between
ranges).
Is this a known behavior people are happy with, or should I file an issue
asking for ranges in date-facets to be constructed to subtract one second
from the end of each range (so that the effective range queries for my case
would be: [2009-01-01T00:00:00Z TO 2009-01-01T23:59:59Z] &
[2009-01-02T00:00:00Z TO 2009-01-02T23:59:59Z])?
Alternatively, is there some other suggested way of using the date faceting
to avoid this problem?
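The fencepost effect described above can be reproduced outside Solr. The
following stand-alone sketch is plain Java, not Solr code; FencepostSketch,
countClosed and sumOfDayCounts are hypothetical names, and it assumes that
every range is closed on both ends, as described in this message.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.List;

// Stand-alone illustration of the double counting; not Solr code.
class FencepostSketch {
    // Counts timestamps in the closed range [low, high].
    static long countClosed(List<Instant> docs, Instant low, Instant high) {
        return docs.stream()
                   .filter(d -> !d.isBefore(low) && !d.isAfter(high))
                   .count();
    }

    // Sums the per-range counts for `days` consecutive one-day ranges from `start`.
    static long sumOfDayCounts(List<Instant> docs, Instant start, int days) {
        long total = 0;
        Instant low = start;
        for (int i = 0; i < days; i++) {
            Instant high = low.plus(1, ChronoUnit.DAYS);
            total += countClosed(docs, low, high);
            low = high;   // the next range starts where this one ended
        }
        return total;
    }

    public static void main(String[] args) {
        // A single document indexed at exactly midnight.
        List<Instant> docs =
            Collections.singletonList(Instant.parse("2009-01-02T00:00:00Z"));

        // Ranges starting at midnight: the document is on a shared boundary.
        System.out.println(
            sumOfDayCounts(docs, Instant.parse("2009-01-01T00:00:00Z"), 2));

        // Ranges shifted back by one second: the boundary no longer hits it.
        System.out.println(
            sumOfDayCounts(docs, Instant.parse("2008-12-31T23:59:59Z"), 2));
    }
}
```

The first call reports the single document twice across the two ranges, and
the shifted call reports it once, which is the workaround mentioned above.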
Index: src/java/org/apache/solr/request/SimpleFacets.java
===================================================================
--- src/java/org/apache/solr/request/SimpleFacets.java (revision 809880)
+++ src/java/org/apache/solr/request/SimpleFacets.java (working copy)
@@ -29,6 +29,7 @@
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.FacetParams.FacetDateOther;
+import org.apache.solr.common.params.FacetParams.FacetDateExclusive;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.common.util.StrUtils;
@@ -586,6 +587,32 @@
"date facet 'end' comes before 'start': "+endS+" < "+startS);
}
+ boolean startInclusive = true;
+ boolean endInclusive = true;
+ final String[] exclusiveP =
+ params.getFieldParams(f,FacetParams.FACET_DATE_EXCLUSIVE);
+ if (null != exclusiveP && 0 < exclusiveP.length) {
+ Set<FacetDateExclusive> exclusives
+ = EnumSet.noneOf(FacetDateExclusive.class);
+
+ for (final String e : exclusiveP) {
+ exclusives.add(FacetDateExclusive.get(e));
+ }
+
+ if(! exclusives.contains(FacetDateExclusive.NEITHER) ) {
+ boolean both = exclusives.contains(FacetDateExclusive.BOTH);
+
+ if(both || exclusives.contains(FacetDateExclusive.START)) {
+ startInclusive = false;
+ }
+
+ if(both || exclusives.contains(FacetDateExclusive.END)) {
+ endInclusive = false;
+ }
+ }
+ }
+
+
final String gap = required.getFieldParam(f,FacetParams.FACET_DATE_GAP);
final DateMathParser dmp = new DateMathParser(ft.UTC, Locale.US);
dmp.setNow(NOW);
@@ -610,7 +637,7 @@
(SolrException.ErrorCode.BAD_REQUEST,
"date facet infinite loop (is gap negative?)");
}
- resInner.add(label, rangeCount(sf,low,high,true,true));
+ resInner.add(label, rangeCount(sf,low,high,startInclusive,endInclusive));
low = high;
}
} catch (java.text.ParseException e) {
@@ -639,15 +666,15 @@
if (all || others.contains(FacetDateOther.BEFORE)) {
resInner.add(FacetDateOther.BEFORE.toString(),
- rangeCount(sf,null,start,false,false));
+ rangeCount(sf,null,start,false,!startInclusive));
}
if (all || others.contains(FacetDateOther.AFTER)) {
resInner.add(FacetDateOther.AFTER.toString(),
- rangeCount(sf,end,null,false,false));
+ rangeCount(sf,end,null,!endInclusive,false));
}
if (all || others.contains(FacetDateOther.BETWEEN)) {
resInner.add(FacetDateOther.BETWEEN.toString(),
- rangeCount(sf,start,end,true,true));
+ rangeCount(sf,start,end,startInclusive,endInclusive));
}
}
}
Index: src/common/org/apache/solr/common/params/FacetParams.java
===================================================================
--- src/common/org/apache/solr/common/params/FacetParams.java (revision 809880)
+++ src/common/org/apache/solr/common/params/FacetParams.java (working copy)
@@ -150,6 +150,14 @@
* @see FacetDateOther
*/
public static final String FACET_DATE_OTHER = FACET_DATE + ".other";
+ /**
+ * String indicating whether ranges for date range faceting
+ * should be exclusive or inclusive. By default both the start and
+ * end point are inclusive.
+ * Can be overridden on a per field basis.
+ * @see FacetDateExclusive
+ */
+ public static final String FACET_DATE_EXCLUSIVE = FACET_DATE + ".exclusive";
/**
* An enumeration of the legal values for FACET_DATE_OTHER...
@@ -176,6 +184,34 @@
}
}
+ /**
+ * An enumeration of the legal values for FACET_DATE_EXCLUSIVE...
+ * <ul>
+ * <li>start = the start point for each range is exclusive
+ * (i.e. {start,end] ) </li>
+ * <li>end = the end point for each range is exclusive
+ * (i.e. [start,end} )</li>
+ * <li>both = both points are exclusive, this means fields which
+ * exactly match one of the intermediate or start/end points are
+ * not counted</li>
+ * <li>neither = neither the lower nor the upper point are
+ * exclusive, this is the default</li>
+ * </ul>
+ * @see FacetParams#FACET_DATE_EXCLUSIVE
+ */
+ public enum FacetDateExclusive {
+ START, END, BOTH, NEITHER;
+ public String toString() { return super.toString().toLowerCase(); }
+ public static FacetDateExclusive get(String label) {
+ try {
+ return valueOf(label.toUpperCase());
+ } catch (IllegalArgumentException e) {
+ throw new SolrException
+ (SolrException.ErrorCode.BAD_REQUEST,
+ label+" is not a valid type of 'exclusive' date facet information",e);
+ }
+ }
+ }
}