Would grouping solve this? I'd rather not move to a pre-release solr ... To clarify the problem:
The data are fine and not duplicated. However, I want to analyze the data and summarize one field (kind of like faceting) to understand what the largest value is. For example:

Document 1: label=1A1A1; body="adfasdfadsfasf"
Document 2: label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3: label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4: label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5: label=7A1A1; body="azxzxcvdfaasdfasdfsdasdaaaaafadsfasf"
Document 6: label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE copy of the largest "label" value? In other words, what query will return the 7A1A1 label just once?

If I search for q=* and sort the results, it works, except that I get back multiple hits for each label. If I do a facet, I can only sort in increasing order, when what I want is decreasing order.


-Peter

On Apr 7, 2011, at 10:02 AM, Erick Erickson wrote:

> What version of Solr are you using? And, assuming the version that
> has it in, have you seen grouping?
>
> Which is another way of asking why you want to do this, perhaps it's an
> XY problem....
>
> Best
> Erick
>
> On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam <ps...@mac.com> wrote:
>
>> Hi,
>>
>> I have documents with a field that has "1A2B3C" alphanumeric characters. I
>> can query for * and sort results based on this field, however I'd like to
>> "uniq" these results (remove duplicates) so that I can get the 5 largest
>> unique values. I can't use the StatsComponent because my values have
>> letters in them too.
>>
>> Faceting (and ignoring the counts) gets me half of the way there, but I can
>> only sort ascending. If I could also sort facet results descending, I'd be
>> done. I'd rather not return all documents and just parse the last few
>> results to work around this.
>>
>> Any ideas?
>>
>>
>> -Pete
>>
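
To make the desired result concrete, here is a minimal client-side sketch in Python. It is not a Solr query, only an illustration of the output being asked for. The `labels` list restates the six example documents from this message, the helper name `largest_unique_labels` is made up for illustration, and plain string comparison is assumed to match the intended ordering of the labels.

```python
# Labels of the six example documents, in document order.
labels = ["1A1A1", "5A1B1", "1A1A1", "7A1A1", "7A1A1", "5A1B1"]


def largest_unique_labels(values, n):
    """Return the n largest distinct values, largest first.

    Hypothetical helper: de-duplicates with a set, then sorts
    descending using ordinary string comparison (an assumption
    about how the labels should be ordered).
    """
    return sorted(set(values), reverse=True)[:n]


# The single largest label, returned once even though two documents carry it.
print(largest_unique_labels(labels, 1))

# The "5 largest unique values" case; only three distinct labels exist here.
print(largest_unique_labels(labels, 5))
```

This is exactly the "return everything and post-process" workaround the message hopes to avoid, so it only pins down the expected answer: a server-side equivalent would need to de-duplicate on the label field and sort descending.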