Hi All, I spent some time today playing around with the subfacets and facet functions now available in Heliosearch 0.05. They look very promising, but I have some concerns.
I indexed 10,000 documents and built some queries to exercise each feature, and found some weird behaviour that I could not explain.

The first query finds all documents with the word "java" in their title and then computes a facet on the field position_id with stats on the field job_id. Basically, I want the number of unique job_ids for each position_id across all matching documents:

http://localhost:8983/solr/current/select?q=title:java&facet=on&facet.field=position_id&facet.stat=unique(job_id)&rows=1&facet.limit=10&facet.mincount=1&wt=json&indent=on&fl=job_id,position_id,super_alias_id

The response looks good except for one thing: facet.mincount is not respected whenever I specify the facet.stat parameter. Removing facet.stat causes mincount to be respected, but I need that parameter.

Without facet.stat, the facet looks like this:

  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "position_id":[
        "265151",5,
        "927284",1,
        "1662380",1,
        "2625553",1,
        "2862455",1,
        "4128904",1,
        "4253203",1]},    <=== accounts for all 11 documents

And with facet.stat added:

  "facets":{
    "position_id":{
      "stats":{
        "unique(job_id)":11,    <== again, 11 documents, which is good
        "count":11},
      "buckets":[{
          "val":265151,
          "unique(job_id)":5,
          "count":5},
        {
          "val":927284,
          "unique(job_id)":1,
          "count":1},
        {
          "val":1662380,
          "unique(job_id)":1,
          "count":1},
        {
          "val":2625553,
          "unique(job_id)":1,
          "count":1},
        {
          "val":2862455,
          "unique(job_id)":1,
          "count":1},
        {
          "val":4128904,
          "unique(job_id)":1,
          "count":1},
        {
          "val":4253203,
          "unique(job_id)":1,
          "count":1},
        {
          "val":1133,
          "unique(job_id)":0,    <== what is this?
          "count":0},
        .... Many zero entries following...

I was wondering where the extra entries come from. The position_id = 1133 above is not even a match for my query (its title is "Audit Consultant").

I've also noticed similar behaviour when using subfacets. It looks like the number of items returned always matches the facet.limit parameter.
If there are not enough matching values for a given entry, the bucket list is padded with values from documents that do not match the original query. Am I doing something wrong?
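
In the meantime, the zero-count buckets could be dropped on the client side. The following is only a minimal sketch of that workaround, assuming the response has the "facets" / "buckets" shape shown above. The function name filter_buckets and the trimmed sample data are made up for illustration and are not part of Heliosearch.

```python
def filter_buckets(response, field, mincount=1):
    """Return only the buckets of `field` whose count is at least `mincount`.

    Assumes the response shape from the post: response["facets"][field]["buckets"]
    is a list of dicts that each carry a "count" key.
    """
    facet = response.get("facets", {}).get(field, {})
    return [b for b in facet.get("buckets", []) if b.get("count", 0) >= mincount]


# Trimmed sample modelled on the response above (hypothetical, not a full response).
sample = {
    "facets": {
        "position_id": {
            "stats": {"unique(job_id)": 11, "count": 11},
            "buckets": [
                {"val": 265151, "unique(job_id)": 5, "count": 5},
                {"val": 927284, "unique(job_id)": 1, "count": 1},
                {"val": 1133, "unique(job_id)": 0, "count": 0},
            ],
        }
    }
}

kept = filter_buckets(sample, "position_id")
print([b["val"] for b in kept])
```

This only hides the symptom: the server still computes and returns the padding buckets, so facet.limit would have to be set high enough to be sure no real bucket is cut off.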