Hi

I want to do a fairly complex grouping request against Solr. Let's say that I have fields "field1" and "timestamp" for all my documents.

In the request I want to provide a set of time-intervals, and for each distinct value of "field1" I want a count of how many of those time-intervals contain at least one document with that value of "field1". It smells like grouping, but with more advanced counting.

Example
Documents in Solr
field1 | timestamp
a      | 1
a      | 2
b      | 1
a      | 3
c      | 5
a      | 10
b      | 12
b      | 11
a      | 13
d      | 14

Doing a query with the following time-intervals (both ends included)
time-interval#1: 1 to 2
time-interval#2: 3 to 5
time-interval#3: 6 to 12

I would like to get the following result
field1-value | count
a            | 3
b            | 2
c            | 1
Reasons
* field1-value a: Count=3, because there is a document with field1=a and a timestamp between 1 and 2 (actually there are 2 such documents, but we only count how many time-intervals a is present in, not how many times a is present in each interval), AND because there is a document with field1=a and a timestamp between 3 and 5, AND because there is a document with field1=a and a timestamp between 6 and 12
* field1-value b: Count=2, because there is at least one document with field1=b in time-interval#1 AND in time-interval#3 (there is no document with field1=b in time-interval#2)
* field1-value c: Count=1, because there is at least one document with field1=c in time-interval#2 (there is no document with field1=c in either time-interval#1 or time-interval#3)
* No field1-value=d in the result-set, because d is not present in any of the time-intervals.
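
To pin down the counting rule, here is a minimal client-side sketch in plain Python. It is not Solr code; the function name count_intervals_per_value and the in-memory data layout are made up purely for illustration. It counts, for each field1 value, the number of time-intervals that contain at least one matching document:

```python
from collections import defaultdict

# Sample documents from the example above: (field1, timestamp)
DOCS = [
    ("a", 1), ("a", 2), ("b", 1), ("a", 3), ("c", 5),
    ("a", 10), ("b", 12), ("b", 11), ("a", 13), ("d", 14),
]

# Time-intervals from the example, both ends included
INTERVALS = [(1, 2), (3, 5), (6, 12)]


def count_intervals_per_value(docs, intervals):
    """For each field1 value, count the intervals holding at least one document."""
    hits = defaultdict(set)  # field1 value -> indexes of the intervals it appears in
    for value, ts in docs:
        for i, (lo, hi) in enumerate(intervals):
            if lo <= ts <= hi:
                hits[value].add(i)  # a set, so repeated hits in one interval count once
    return {value: len(indexes) for value, indexes in hits.items()}


if __name__ == "__main__":
    for value, count in sorted(count_intervals_per_value(DOCS, INTERVALS).items()):
        print(f"{value} | {count}")
```

Values that fall in none of the intervals (d in the example) never get an entry, so they are absent from the result.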

The query part of the request probably needs to be
* q=timestamp:([1 TO 2]) OR timestamp:([3 TO 5]) OR timestamp:([6 TO 12])
but if I just add the following to the request
* group=true
* group.field=field1
* group.limit=1 (strange that you cannot set this to 0 BTW - I am not interested in getting any of the documents back)
I will get the following result
field1/group-value | count
a                  | 4 (because there is a total of 4 documents with field1=a in those time-intervals)
b                  | 3
c                  | 1
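
For reference, the plain grouping request described above could be assembled roughly like this. This is only a sketch: the helper name build_group_params is made up, and the host, port and collection in the URL are placeholders, not taken from a real setup. The parameters themselves (q, group, group.field, group.limit) are the ones listed above:

```python
from urllib.parse import urlencode

# Time-intervals from the example, both ends included
INTERVALS = [(1, 2), (3, 5), (6, 12)]


def build_group_params(intervals, group_field="field1", ts_field="timestamp"):
    """Build the parameters for the plain grouping request (hypothetical helper)."""
    # One range clause per interval, OR'ed together as in the q shown above
    q = " OR ".join(f"{ts_field}:([{lo} TO {hi}])" for lo, hi in intervals)
    return {
        "q": q,
        "group": "true",
        "group.field": group_field,
        "group.limit": 1,
    }


params = build_group_params(INTERVALS)
query_string = urlencode(params)  # percent-encodes the brackets, colons and spaces

# Placeholder base URL (assumption): adjust host, port and collection as needed
url = "http://localhost:8983/solr/collection1/select?" + query_string

if __name__ == "__main__":
    print(url)
```

This only reproduces the request that yields per-group document counts (4, 3, 1); it does not produce the per-interval counts asked for.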

1) Is it possible for me to create a request that will produce the result I want?
2) If yes to 1), how? What will the request look like?
3) If yes to 1), will it work in a distributed SolrCloud setup?
4) If yes to 1), will it perform?
5) If no to 1), is there a fairly simple change to the Solr code that I could make in order to make it possible? You do not have to hand me the solution, but a few comments on how easy/hard it would be, and ideas on how to attack the challenge, would be nice.

Thanks!

Regards, Per Steffensen
