Hi,
Imagine that you have been tasked with designing a Solr-based system that
indexes and queries data aggregated from several relational databases. All the
data items are ultimately related to a common object type, so the logical data
schema looks like :
Customer
Profile
Events
Interactions
Steps
ActionPlans
Objectives
Actions
(and quite a few more, with up to 3 or 4 levels of nesting)
Each of these has a number of attributes. What you need to do is the Solr
equivalent of a SQL query that would find all Customers where
Profile.item1='...' and there is some Event where event.item1='..' and
event.item2='..' and there is some ActionPlan that has an Objective where
Objective.item3='..' and this Objective has an Action where Action.item1='..'
and Action.item2='..'
This is fairly trivial in a relational DB. But is it achievable in Solr ?
Needless to say, there is no way a flat schema can work here. So I have assumed
that nested documents were the only way to go. I have created a schema that
includes these prerequisites, where "docType" is meant to contain the document
type i.e. Profile, Event, Interaction etc :
<field name="_version_" type="plong" indexed="false" stored="false"/>
<field name="_nest_path_" type="_nest_path_" indexed="true" stored="true"
docValues="true"/>
<fieldType name="_nest_path_" class="solr.NestPathField" />
<field name="_nest_parent_" type="string" indexed="true" stored="true"
docValues="true"/>
<field name="_root_" type="string" indexed="true" stored="true"
docValues="true" />
<field name="docType" type="string" indexed="true" stored="true"
docValues="true" required="true"/>
<field name="id" type="string" indexed="true" stored="true" docValues="true"/>
<uniqueKey>id</uniqueKey>
and I have used naming conventions on the other fields to reflect the
structure, e.g.
<field name="customerId" type="string" indexed="true" stored="true"
docValues="true"/>
<field name="customerName" type="string" indexed="true" stored="true"
docValues="true"/>
<field name="interaction.code" type="string" indexed="true" stored="true"
docValues="true"/>
<field name="interaction.result" type="string" indexed="true" stored="true"
docValues="true"/>
<field name="step.code" type="string" indexed="true" stored="true"
docValues="true"/>
<field name="step.result" type="string" indexed="true" stored="true"
docValues="true"/>
<field name="event.type" type="string" indexed="true" stored="true"
docValues="true"/>
<field name="event.date" type="pdate" indexed="true" stored="true"
docValues="true"/>
I have been able to index my documents correctly, i.e. retrieving a Customer by
id with fl=*,[child limit=999] returns the customer and all dependent objects
in the correct hierarchy.
I have also been able to express queries on two levels, such as: find all
Customers that have an Interaction with code='..' and result='..' and that have
an Event with type='..' and date before '...'. The syntax looks like this,
using two filter queries and block joins :
q= docType:Customer AND (<simple filtering clauses>)
fq= {!parent which=docType:Customer} +docType:Interaction +interaction.code:C01
+interaction.result:COMPLETE
fq= {!parent which=docType:Customer} +docType:Event +event.type:EVTTYPE1
+event.date:[NOW-1YEAR TO NOW]
That seems to return correct results. However, I can't quite figure out the
proper way to take it one level deeper e.g. add filtering conditions on the
Steps of the Interactions selected by the first filter query.
For the record, I am able to do so using streaming expressions, but these bring
their own complexities so I am trying to find out first if something simpler
can save me.
Thanks !