There is no great way.

One approach would be to 'de-normalize' at index time, to actually have a field that looks like this:

institution_year: 2008.OHIO_ST  ;   2010.YALE

Then, with some client-side code, you could more easily facet and search the way you want. It still doesn't (I don't think) make range queries easy, or maybe even possible. And it can get unmanageable if you have more than two dimensions.
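A minimal sketch of that de-normalized field, using the sample data from the question below. It assumes a multi-valued, non-tokenized string field called institution_year (a hypothetical name) and builds the documents in plain Python. matches() only imitates an exact-term query such as institution_year:"2010.OHIO_ST"; it does not call Solr.

```python
# Hypothetical sketch: one document per student, with the correlated pair
# (year, institution) stored together as a single "YEAR.INSTITUTION" token.
# Field names are illustrative; nothing here talks to a real Solr server.
from collections import defaultdict

STUDENTS = {100: ("John", "Doe"), 200: ("Rasheed", "Jones"), 300: ("Mary", "Hampton")}

# (education_id, degree_code, degree_yr, institution, student_id)
EDUCATION = [
    (1, "MD", 2008, "OHIO_ST", 100),
    (2, "PHD", 2010, "YALE", 100),
    (3, "MS", 2007, "OHIO_ST", 200),
    (4, "MD", 2010, "YALE", 300),
]


def build_docs():
    """Return one dict per student; each education becomes one combined token."""
    pairs = defaultdict(list)
    for _edu_id, _code, year, institution, student_id in EDUCATION:
        pairs[student_id].append(f"{year}.{institution}")
    return [
        {"id": sid, "fname": first, "lname": last, "institution_year": pairs[sid]}
        for sid, (first, last) in STUDENTS.items()
    ]


def matches(doc, year, institution):
    """Exact-token check, like a term query on a string field."""
    return f"{year}.{institution}" in doc["institution_year"]


docs = build_docs()
print([d["lname"] for d in docs if matches(d, 2010, "OHIO_ST")])  # []
print([d["lname"] for d in docs if matches(d, 2010, "YALE")])     # ['Doe', 'Hampton']
```

Because the year and the institution travel together in one token, the false hit on John Doe goes away.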

Another solution, as you say, is to run multiple queries against multiple document sets, but that gets tricky too.
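A rough sketch of the two-query approach, assuming each EDUCATION row is indexed as its own document alongside the student documents, told apart by a hypothetical doc_type field (all field names here are assumptions). The query strings are what you would send to Solr; run_step1() is a local stand-in for executing the first one.

```python
# Hypothetical two-step lookup: query the education documents first, then
# fetch the matching students by id. Field names are assumptions.

EDUCATION_DOCS = [
    {"doc_type": "education", "id": "edu-1", "degree_code": "MD",
     "degree_yr": 2008, "institution": "OHIO_ST", "student_id": 100},
    {"doc_type": "education", "id": "edu-2", "degree_code": "PHD",
     "degree_yr": 2010, "institution": "YALE", "student_id": 100},
    {"doc_type": "education", "id": "edu-3", "degree_code": "MS",
     "degree_yr": 2007, "institution": "OHIO_ST", "student_id": 200},
    {"doc_type": "education", "id": "edu-4", "degree_code": "MD",
     "degree_yr": 2010, "institution": "YALE", "student_id": 300},
]


def education_query(institution, year):
    """Step 1: both conditions must hold on ONE education document."""
    return f"doc_type:education AND institution:{institution} AND degree_yr:{year}"


def run_step1(institution, year):
    """Local stand-in for running step 1 and collecting student_id values."""
    return {
        d["student_id"]
        for d in EDUCATION_DOCS
        if d["institution"] == institution and d["degree_yr"] == year
    }


def student_query(student_ids):
    """Step 2: fetch the students found in step 1 (None if there are none)."""
    if not student_ids:
        return None  # nothing matched; skip the second round trip
    ids = " OR ".join(str(i) for i in sorted(student_ids))
    return f"doc_type:student AND id:({ids})"


print(education_query("YALE", 2010))
print(student_query(run_step1("YALE", 2010)))     # doc_type:student AND id:(100 OR 300)
print(student_query(run_step1("OHIO_ST", 2010)))  # None
```

One of the tricky parts shows up in step 2: the OR'd id list grows with the number of matching students, and a long enough list runs into Solr's maxBooleanClauses limit. Paging and sorting across the two steps also have to be handled on the client.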

There is also a "join" feature patch that is not in any released version of Solr, but it was just committed to master, so there is no way to know for sure when, or if, it will end up in a release. I hesitate to link to the JIRA because there's some ugly politics in the comments, but I think it _might_ be useful in this use case. I haven't completely thought it through, but it is useful in many of these sorts of multi-valued, multi-dimensional "join" type cases. https://issues.apache.org/jira/browse/SOLR-2272
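In the Solr releases that later shipped it, that join is invoked through local-params syntax of the form {!join from=... to=...}. A rough sketch, assuming one document per EDUCATION row and one per STUDENT in the same core, a hypothetical doc_type field to tell them apart, and hypothetical field names student_id (on education documents) and id (on student documents), with compatible types. It only builds the request parameters; it does not contact Solr.

```python
from urllib.parse import urlencode


def join_params(institution, year):
    """Build q/fq for a join from education docs to student docs (hypothetical fields)."""
    # The inner query runs against education documents, so the AND applies
    # within a single education record.
    inner = f"doc_type:education AND institution:{institution} AND degree_yr:{year}"
    return {
        # Collect student_id from the matching education docs and return
        # documents whose id field holds one of those values.
        "q": "{!join from=student_id to=id}" + inner,
        # Keep only student documents in the final result.
        "fq": "doc_type:student",
    }


params = join_params("OHIO_ST", 2010)
print(params["q"])
print(urlencode(params))  # query string to append to a /select request
```

With documents shaped this way, John Doe (OHIO_ST in 2008, YALE in 2010) would not come back for OHIO_ST in 2010, because no single education document satisfies both conditions.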

But in general, yes, this is something that's hard to do in Solr/Lucene.

Jonathan

On 4/27/2011 1:30 PM, ronotica wrote:
The nature of my project is such that search is needed and specifically
search across related entities. We want to perform several queries involving
a correlation between two or more properties of a given entity in a
collection.

To put things in context, here is a snippet of the domain:

Student { firstname, lastname }
Education { degreeCode, degreeYear, institution }

The database tables look like so:

STUDENT
----------
STUDENT_ID   FNAME     LNAME
100          John      Doe
200          Rasheed   Jones
300          Mary      Hampton

EDUCATION
-------------
EDUCATION_ID   DEGREE_CODE   DEGREE_YR   INSTITUTION   STUDENT_ID
1              MD            2008        OHIO_ST       100
2              PHD           2010        YALE          100
3              MS            2007        OHIO_ST       200
4              MD            2010        YALE          300

A student can have many educations. Currently, our documents look like this
in Solr:

DOC_ID   STUDENT_ID   FNAME     LNAME     DEGREE_CODE   DEGREE_YR   INSTITUTION
100      100          John      Doe       MD PHD        2008 2010   OHIO_ST YALE
101      200          Rasheed   Jones     MS            2007        OHIO_ST
102      300          Mary      Hampton   MD            2010        YALE

Searching for all students who graduated from OHIO_ST in 2010 currently
gives a hit (John Doe) when it shouldn't.
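A small illustration of why the flattened document produces that false hit, using John Doe's document from the table above. flat_match() approximates what institution:OHIO_ST AND degree_yr:2010 checks on multi-valued fields: only that each value appears somewhere in the document. correlated_match() is what the question is after; it can pair values up only because this local sketch keeps the lists in parallel order.

```python
# John Doe's flattened document: each multi-valued field is an independent
# list, so nothing in the query ties a year to a particular institution.
john = {
    "student_id": 100,
    "degree_code": ["MD", "PHD"],
    "degree_yr": [2008, 2010],
    "institution": ["OHIO_ST", "YALE"],
}


def flat_match(doc, institution, year):
    """Each value only has to occur somewhere in its own field."""
    return institution in doc["institution"] and year in doc["degree_yr"]


def correlated_match(doc, institution, year):
    """Both values must come from the SAME education (same list position)."""
    return any(
        inst == institution and yr == year
        for inst, yr in zip(doc["institution"], doc["degree_yr"])
    )


print(flat_match(john, "OHIO_ST", 2010))        # True  (the false hit)
print(correlated_match(john, "OHIO_ST", 2010))  # False
```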

What is the best way to overcome this issue in Solr? This only happens when
I search across correlated fields, mainly because the data has been
denormalized and Lucene has no notion of relationships between the various
fields.

One way that has come to mind is to have separate documents for "education"
and perform multiple searches to get at an answer. Besides this, is there
any other way? Does Solr provide an elegant solution for this?

Any help will be greatly appreciated.

Thanks.

PS: We have about 15 of these kinds of relationships, all relating to the
student, and would like to perform searches on each of them.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-across-related-correlated-multivalue-fields-in-Solr-tp2871176p2871176.html
Sent from the Solr - User mailing list archive at Nabble.com.
