Hi Toke,

Thanks for your input. I guess you mean take the 1k or so values and build a boolean query from them? If that's not what you mean, my apologies.

I'd thought of doing that. The trouble I had was that the number of unique values could be 20k, or 15,167, or any arbitrary and potentially high-ish number; it isn't really known and can/will change over time. I believe a boolean query with more than 1024 clauses can blow up the query, so scalability is a concern.

The other issue is how this would yield the unique facet values, e.g. dest=8.8.8.8 (17) [i.e. 8.8.8.8 is in the 'addr' list and occurs 17 times in entries with a 'dest' field]. In fact, I need the unique value(s) ('8.8.8.8') more than I need the count ('17').
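
To make the clause-limit concern concrete, here is a minimal Python sketch that splits an arbitrary number of values into several OR queries, each staying within the limit. It assumes the values have already been fetched client-side; the function names and the 1024 default are illustrative assumptions, not part of any Solr client API, and merging the results of the separate queries is left out.

```python
from typing import Iterable, Iterator, List

# Assumption: 1024 is the limit mentioned above; check the
# maxBooleanClauses setting of the actual installation.
MAX_CLAUSES = 1024


def escape_term(value: str) -> str:
    """Wrap a value in double quotes so it is matched as a literal phrase."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def batched_or_queries(field: str, values: Iterable[str],
                       max_clauses: int = MAX_CLAUSES) -> Iterator[str]:
    """Yield queries of the form field:(v1 OR v2 ...), each within the limit."""
    batch: List[str] = []
    for value in values:
        batch.append(escape_term(value))
        if len(batch) == max_clauses:
            yield f"{field}:({' OR '.join(batch)})"
            batch = []
    if batch:  # emit the final, partially filled batch
        yield f"{field}:({' OR '.join(batch)})"
```

With 20k values this produces 20 queries rather than one, which is part of why the approach scales poorly as the value count grows.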
I could get the facet list of 'dest' values, then trawl through each one, but this would be a complicated and time-consuming client-side operation.

I'm also looking at creating a custom QueryParser that would build the relevant DocLists, then intersect them and return the values, but I'd rather not reinvent the wheel if possible. Given that facets already build unique term lists, it seems so close. I guess it's like taking two facet lists (1 for addr, 1 for dest), intersecting them and returning the result:

List 1: a b c d e f
List 2: a a g z c c c e

Resultant intersection:
a (2)
c (3)
e (1)

Thanks,
Peter

On Wed, Nov 19, 2014 at 7:16 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> Peter Sturge [peter.stu...@gmail.com] wrote:
> > [addr 7M unique, dest 1K unique]
> >
> > What is the best/only/most efficient way to construct a search whereby I
> > get back an (ideally faceted) list of values for 'dest' that occur in
> > 'addr'?
>
> I assume the actual values are defined by a query? As the number of
> possible values in dest is not that large, extracting those first and then
> using them as a filter when searching for addr seems like a fairly
> efficient way of solving the problem.
>
> - Toke Eskildsen
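
For reference, the List 1 / List 2 intersection from the message above can be written as a short client-side Python sketch. It assumes both lists fit in memory, with List 1 standing in for the unique 'addr' values and List 2 for the 'dest' occurrences; the function name is hypothetical and this is not how Solr computes facets internally.

```python
from collections import Counter
from typing import Dict, Iterable


def intersect_with_counts(list1: Iterable[str],
                          list2: Iterable[str]) -> Dict[str, int]:
    """Return the values of list2 that also occur in list1, with list2 counts."""
    allowed = set(list1)  # unique values from the first list
    counts = Counter(v for v in list2 if v in allowed)
    return dict(counts)


list1 = ["a", "b", "c", "d", "e", "f"]
list2 = ["a", "a", "g", "z", "c", "c", "c", "e"]

result = intersect_with_counts(list1, list2)
for value, count in sorted(result.items()):
    print(f"{value} ({count})")
# prints:
# a (2)
# c (3)
# e (1)
```

With around 7M unique 'addr' values, holding the set on the client is the expensive part, which is the motivation for wanting the intersection done server-side.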