There is no great way.
One approach would be to 'de-normalize' at index time, to actually have
a field that looks like this:
institution_year: 2010.OHIO_ST ; 2007.YALE
Then, with some code on the client side, you could more easily facet and
search how you want. It still doesn't (I don't think) make range queries
easy (or even possible?). And it can get unmanageable if you have more
than two dimensions.
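A rough sketch of that index-time step, in plain Python rather than Solr
code. The rows are taken from the question below; the function names and
the "YEAR.INSTITUTION" token layout are assumptions made for illustration.
In Solr these tokens would presumably go into a multi-valued, untokenized
field.

```python
from collections import defaultdict

# Rows mirror the EDUCATION table from the question:
# (education_id, degree_code, degree_year, institution, student_id)
EDUCATION_ROWS = [
    (1, "MD", 2008, "OHIO_ST", 100),
    (2, "PHD", 2010, "YALE", 100),
    (3, "MS", 2007, "OHIO_ST", 200),
    (4, "MD", 2010, "YALE", 300),
]


def build_institution_year(rows):
    """Group rows by student and emit one 'YEAR.INSTITUTION' token per education."""
    tokens = defaultdict(list)
    for _education_id, _degree_code, year, institution, student_id in rows:
        tokens[student_id].append(f"{year}.{institution}")
    return dict(tokens)


def matches(tokens_by_student, year, institution):
    """Return ids of students whose combined field holds the exact pair."""
    wanted = f"{year}.{institution}"
    return sorted(sid for sid, toks in tokens_by_student.items() if wanted in toks)


institution_year = build_institution_year(EDUCATION_ROWS)
print(institution_year[100])
print(matches(institution_year, 2010, "OHIO_ST"))
```

Because each token keeps the year and institution of one education
together, an exact match on the token cannot mix values from two different
educations. A range over years is not expressible this way without more
client-side work, which is the limitation mentioned above.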
Another solution, like you say, is to run multiple queries against
multiple document sets, but that gets tricky too.
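To show the shape of that two-step approach, here is a minimal sketch in
plain Python that only simulates the two document sets in memory. It makes
no Solr calls, and the field and function names are assumptions that mirror
the tables in the question.

```python
# One document per EDUCATION row, plus a lookup of student documents.
# Field names are assumptions chosen to mirror the tables in the question.
EDUCATION_DOCS = [
    {"student_id": 100, "degree_code": "MD", "degree_yr": 2008, "institution": "OHIO_ST"},
    {"student_id": 100, "degree_code": "PHD", "degree_yr": 2010, "institution": "YALE"},
    {"student_id": 200, "degree_code": "MS", "degree_yr": 2007, "institution": "OHIO_ST"},
    {"student_id": 300, "degree_code": "MD", "degree_yr": 2010, "institution": "YALE"},
]
STUDENT_DOCS = {
    100: {"fname": "John", "lname": "Doe"},
    200: {"fname": "Rasheed", "lname": "Jones"},
    300: {"fname": "Mary", "lname": "Hampton"},
}


def find_student_ids(education_docs, **criteria):
    """First query: education documents where every criterion holds on the same row."""
    return sorted({
        doc["student_id"]
        for doc in education_docs
        if all(doc.get(field) == value for field, value in criteria.items())
    })


def find_students(education_docs, student_docs, **criteria):
    """Second query: fetch the student documents for the ids found above."""
    ids = find_student_ids(education_docs, **criteria)
    return [{"student_id": sid, **student_docs[sid]} for sid in ids]


print(find_students(EDUCATION_DOCS, STUDENT_DOCS, institution="YALE", degree_yr=2010))
```

The tricky part is everything this sketch leaves out: combining criteria
from several different relationships, paging, and relevance ranking all
have to be handled on the client between the two queries.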
There is also a "join" feature patch that is not in any released Solr
yet. It just got committed to master, and there is no way to know for
sure when or if it will end up in a released version. I hesitate to link
to the JIRA because there's some ugly politics in the comments, but I
think it _might_ be useful in this use case. I haven't completely
thought it through, but it is something useful in many of these sorts of
multi-valued, multi-dimensional "join" type cases.
https://issues.apache.org/jira/browse/SOLR-2272
But in general, yes, this is something that's hard to do in Solr/Lucene.
Jonathan
On 4/27/2011 1:30 PM, ronotica wrote:
The nature of my project is such that search is needed and specifically
search across related entities. We want to perform several queries involving
a correlation between two or more properties of a given entity in a
collection.
To put things in context, here is a snippet of the domain:
Student { firstname, lastname }
Education { degreeCode, degreeYear, institution }
The database tables look like so:
STUDENT
----------
STUDENT_ID  FNAME    LNAME
100         John     Doe
200         Rasheed  Jones
300         Mary     Hampton
EDUCATION
-------------
EDUCATION_ID  DEGREE_CODE  DEGREE_YR  INSTITUTION  STUDENT_ID
1             MD           2008       OHIO_ST      100
2             PHD          2010       YALE         100
3             MS           2007       OHIO_ST      200
4             MD           2010       YALE         300
A student can have many educations. Currently, our documents look like this
in solr:
DOC_ID  STUDENT_ID  FNAME    LNAME    DEGREE_CODE  DEGREE_YR  INSTITUTION
100     100         John     Doe      MD PHD       2008 2010  OHIO_ST YALE
101     200         Rasheed  Jones    MS           2007       OHIO_ST
102     300         Mary     Hampton  MD           2010       YALE
Searching for all students who graduated from OHIO_ST in 2010 currently
gives a hit (John Doe) when it shouldn't.
What is the best way to overcome this issue in Solr? This only happens
when I am searching across correlated fields, mainly because the
data has been denormalized and Lucene has no notion of relationships between
the various fields.
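To make the false hit concrete, here is a small sketch in plain Python
(not Solr code) that mimics how independent multi-valued fields match. The
document layout and function name are assumptions for illustration only.

```python
# Each flattened document keeps years and institutions in separate
# multi-valued fields, so the pairing between them is lost.
FLATTENED_DOCS = [
    {"student_id": 100, "degree_yr": [2008, 2010], "institution": ["OHIO_ST", "YALE"]},
    {"student_id": 200, "degree_yr": [2007], "institution": ["OHIO_ST"]},
    {"student_id": 300, "degree_yr": [2010], "institution": ["YALE"]},
]


def search_flattened(docs, year, institution):
    """Match each field independently, the way separate multi-valued fields behave."""
    return [
        doc["student_id"]
        for doc in docs
        if year in doc["degree_yr"] and institution in doc["institution"]
    ]


# Student 100 matches because 2010 comes from the YALE degree and
# OHIO_ST comes from the 2008 degree.
false_hits = search_flattened(FLATTENED_DOCS, 2010, "OHIO_ST")
print(false_hits)
```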
One way that has come to mind is to have separate documents for "education"
and perform multiple searches to get at an answer. Besides this, is there
any other way? Does Solr provide any elegant solution for this?
Any help will be greatly appreciated.
Thanks.
PS: We have about 15 of these kinds of relationships, all relating to the
student, and would like to perform searches on each of them.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Search-across-related-correlated-multivalue-fields-in-Solr-tp2871176p2871176.html
Sent from the Solr - User mailing list archive at Nabble.com.