If you're willing to write some Java, you can do something more efficient by intersecting two terms enumerations. This works with constant memory for any number of values in the two fields: it's basically like intersecting any two sorted lists, where you leapfrog between them. I have an example if you're interested (I was finding compounds by indexing shingles and intersecting them with regular word terms), but there isn't any support for using it in a query or as part of Solr; it's just an offline kind of thing you can run against your index.
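
A minimal sketch of the leapfrog idea in plain Java. It runs over two sorted, duplicate-free in-memory term arrays that stand in for the two terms enumerations, so it is not Lucene code, and SortedTermCursor and LeapfrogIntersect are hypothetical names. Each side seeks forward to the first term >= the other side's current term, so apart from the output only two cursor positions are kept. Lucene's TermsEnum offers next() and seekCeil for the same two moves, but it orders terms by their UTF-8 bytes, which can differ from String.compareTo for some characters outside the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Cursor over a sorted, duplicate-free term array (stand-in for a terms enumeration). */
final class SortedTermCursor {
    private final String[] terms;
    private int pos = -1; // -1 means "not positioned yet"

    SortedTermCursor(String[] sortedTerms) {
        this.terms = sortedTerms;
    }

    /** Advances to the next term, or returns null once exhausted. */
    String next() {
        if (pos < terms.length) {
            pos++;
        }
        return pos < terms.length ? terms[pos] : null;
    }

    /** Moves forward (never back) to the smallest term >= target; null if there is none. */
    String seekCeil(String target) {
        int from = Math.max(pos, 0);
        int i = Arrays.binarySearch(terms, from, terms.length, target);
        if (i < 0) {
            i = -i - 1; // insertion point: first term greater than target
        }
        pos = i;
        return i < terms.length ? terms[i] : null;
    }
}

final class LeapfrogIntersect {
    /** Returns the terms present on both sides, in sorted order. */
    static List<String> intersect(SortedTermCursor a, SortedTermCursor b) {
        List<String> common = new ArrayList<>();
        String x = a.next();
        while (x != null) {
            String y = b.seekCeil(x); // leap b up to x
            if (y == null) {
                break; // b is exhausted, nothing more can match
            }
            if (y.equals(x)) {
                common.add(x);
                x = a.next();
            } else {
                x = a.seekCeil(y); // leap a up to y
            }
        }
        return common;
    }

    public static void main(String[] args) {
        SortedTermCursor left = new SortedTermCursor(new String[] {"a", "c", "e", "g"});
        SortedTermCursor right = new SortedTermCursor(new String[] {"b", "c", "d", "g", "h"});
        System.out.println(intersect(left, right)); // prints [c, g]
    }
}
```

Seeking instead of stepping one term at a time is what lets one side skip long runs of terms that cannot match on the other side.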

-Mike


On 11/19/2014 5:53 PM, Peter Sturge wrote:
Hi Toke,
Yes, the 'lots-of-booleans' thing is a bit prohibitive as it won't
realistically scale to large value sets.

I've been wrestling with joins this evening and have managed to get them
working, and it works very nicely, even across cores (although not across
shards yet, AFAIK)!

For anyone looking to do this sort of facet intersecting, here's my query:
127.0.0.1:8983/solr/net/select?q=*:*&fl=dest&fl=src&facet=true&fq={!join from=addr to=dest fromIndex=targets}*&facet.field=src&facet.field=dest&facet.mincount=1&facet.limit=-1&facet.sort=count&rows=0
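
Because the fq value contains spaces, braces and '!', it has to be URL-encoded when the request is sent from code. Here is a minimal Java sketch (Java 10+ for the URLEncoder.encode(String, Charset) overload) that rebuilds the parameters of the query above. JoinFacetRequest is a hypothetical name, and the http:// scheme on the base URL is an assumption. Repeated keys such as fl and facet.field are simply emitted twice.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

final class JoinFacetRequest {

    /** Form-encodes one key=value pair: spaces become '+', braces, '!', ':' and '=' are %-escaped. */
    private static void add(StringJoiner qs, String key, String value) {
        qs.add(URLEncoder.encode(key, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8));
    }

    /** Builds the select URL for the join + facet query against baseUrl (assumed). */
    static String build(String baseUrl) {
        StringJoiner qs = new StringJoiner("&");
        add(qs, "q", "*:*");
        add(qs, "fl", "dest");
        add(qs, "fl", "src");
        add(qs, "facet", "true");
        // Keep docs whose dest equals an addr value from docs in the "targets" core
        // that match the inner query (*).
        add(qs, "fq", "{!join from=addr to=dest fromIndex=targets}*");
        add(qs, "facet.field", "src");
        add(qs, "facet.field", "dest");
        add(qs, "facet.mincount", "1");
        add(qs, "facet.limit", "-1");
        add(qs, "facet.sort", "count");
        add(qs, "rows", "0");
        return baseUrl + "?" + qs;
    }

    public static void main(String[] args) {
        System.out.println(build("http://127.0.0.1:8983/solr/net/select"));
    }
}
```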

Thanks,
Peter


On Wed, Nov 19, 2014 at 9:23 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

Peter Sturge [peter.stu...@gmail.com] wrote:
> I guess you mean take the 1k or so values and build a boolean query from
> them?
Not really. Let me try again:

1) Perform a facet call with facet.limit=-1 on dest to get the relevant
dest values.
The result will always be 1000 values or fewer. Take those values and
construct a filter query "a OR b OR c".

2) Perform a facet call on addr with the original query + the newly
constructed filter query.
The facet response should now contain the intersection.

1000 is a bit close to the default limit on boolean clauses
(maxBooleanClauses in solrconfig.xml, 1024 by default), so you might want
to raise that.
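
The filter-building part of step 1 can be sketched in Java, assuming the step-2 filter goes on the addr field and that addr is a non-tokenized string field (OrFilterQuery is a hypothetical name). Each facet value is quoted so that spaces or query syntax inside a value are matched literally, and the builder refuses lists that would exceed the clause limit, since every value becomes one clause.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.StringJoiner;

final class OrFilterQuery {

    /** Solr's stock maxBooleanClauses value in solrconfig.xml. */
    static final int DEFAULT_MAX_BOOLEAN_CLAUSES = 1024;

    /** Wraps a value in double quotes, escaping backslashes and quotes inside it. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds field:("a" OR "b" OR ...) from the step-1 facet values. */
    static String build(String field, Collection<String> values, int maxClauses) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no facet values to filter on");
        }
        if (values.size() > maxClauses) {
            // One clause per value; beyond the limit the query would likely fail
            // with a too-many-clauses error, so reject it up front.
            throw new IllegalArgumentException(
                    values.size() + " values exceed maxBooleanClauses=" + maxClauses);
        }
        StringJoiner or = new StringJoiner(" OR ", field + ":(", ")");
        for (String v : values) {
            or.add(quote(v));
        }
        return or.toString();
    }

    public static void main(String[] args) {
        // prints addr:("a" OR "b" OR "c")
        System.out.println(build("addr", Arrays.asList("a", "b", "c"), DEFAULT_MAX_BOOLEAN_CLAUSES));
    }
}
```

The resulting string is what would be sent as the extra fq in step 2, next to the original query and facet.field=addr.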

> I'm also looking at creating a custom QueryParser that would build the
> relevant DocLists, then intersect them and return the values, [...]
You are describing a Join in Solr, and that would likely solve your
problem, but it does not work across cores. Is it possible to have both the
addr and dest data in the same core?
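
To make the comparison with a join concrete, here is a hypothetical in-memory sketch of what a filter like {!join from=addr to=dest} computes. Plain maps stand in for indexed documents, JoinSketch is a made-up name, and the sample addresses are invented; Solr's actual join works on the index, not like this. It gathers the from-field values of documents matching the inner query, then keeps the to-side documents that share at least one of them. With from=addr and to=dest, that is the addr/dest intersection being discussed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class JoinSketch {

    /** Docs are modelled as field name -> values (hypothetical stand-in for an index). */
    static List<Map<String, List<String>>> join(
            List<Map<String, List<String>>> fromDocs,
            Predicate<Map<String, List<String>>> innerQuery,
            String fromField,
            List<Map<String, List<String>>> toDocs,
            String toField) {
        // 1) Gather join keys from "from" docs matching the inner query (e.g. addr in targets).
        Set<String> keys = new HashSet<>();
        for (Map<String, List<String>> doc : fromDocs) {
            if (innerQuery.test(doc)) {
                keys.addAll(doc.getOrDefault(fromField, List.of()));
            }
        }
        // 2) Keep "to" docs with at least one toField value among those keys.
        List<Map<String, List<String>>> hits = new ArrayList<>();
        for (Map<String, List<String>> doc : toDocs) {
            for (String value : doc.getOrDefault(toField, List.of())) {
                if (keys.contains(value)) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> targets = List.of(
                Map.of("addr", List.of("10.0.0.1")),
                Map.of("addr", List.of("10.0.0.2")));
        List<Map<String, List<String>>> net = List.of(
                Map.of("src", List.of("10.0.0.9"), "dest", List.of("10.0.0.1")),
                Map.of("src", List.of("10.0.0.8"), "dest", List.of("10.0.0.7")));
        // Only the first net doc has a dest that is also a target addr.
        System.out.println(join(targets, doc -> true, "addr", net, "dest").size());
    }
}
```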

- Toke Eskildsen

