I just started using Solr, and I am trying to figure out how to set up my
schema. I know that Solr doesn’t have JOINs, so I am having some
difficulty figuring out how to set up a schema for the following
fictional situation.  For example, let us say that:

-       I have 10,000+ customers, each having some specific info (StoreId,
Name, Phone, Address, City, State, Zip, etc.)
-       Each customer has a subset of the 100+ products I am looking to track,
each product having some specific info (ProductId, Name, Width, Height,
Depth, Weight, Density, etc.)
-       I want to be able to search by the product info but have facets return
the number of customers, rather than the number of products, that meet my
criteria
-       I want to display (and sort) customers based on my product search

In a relational database, I would simply create two tables (customer and
product) and JOIN them.  I could then craft a SQL query to count the number
of distinct StoreId values in the result (something like facets).
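
To make the relational version concrete, here is a minimal sketch in Python
using the standard sqlite3 module. The table layout, the sample rows, and the
customers_per_state helper are all made up for illustration (for brevity the
product table carries the StoreId directly); the point is only the
COUNT(DISTINCT ...) over the JOIN, which is the behavior I would like facets
to have.

```python
import sqlite3

# Hypothetical, minimal tables; column names follow the fields listed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (StoreId INTEGER PRIMARY KEY, Name TEXT, State TEXT);
    CREATE TABLE product (
        ProductId INTEGER,
        StoreId   INTEGER REFERENCES customer(StoreId),
        Name      TEXT,
        Width     REAL,
        Density   REAL
    );
    INSERT INTO customer VALUES
        (1, 'Store A', 'NY'), (2, 'Store B', 'NY'), (3, 'Store C', 'CA');
    INSERT INTO product VALUES
        (10, 1, 'Widget', 50, 7),
        (11, 1, 'Gadget', 50, 7),
        (10, 2, 'Widget', 50, 7),
        (12, 3, 'Gizmo',  30, 2);
""")

def customers_per_state(width, density):
    """Count distinct customers per state that have a matching product."""
    rows = conn.execute(
        """
        SELECT c.State, COUNT(DISTINCT c.StoreId)
        FROM customer c
        JOIN product p ON p.StoreId = c.StoreId
        WHERE p.Width = ? AND p.Density = ?
        GROUP BY c.State
        ORDER BY c.State
        """,
        (width, density),
    ).fetchall()
    return dict(rows)

counts = customers_per_state(50, 7)
print(counts)  # {'NY': 2}
```

Three product rows match the criteria here, but only two customers do, which
is the difference between counting products and counting customers.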

In Solr, however, there are no joins.  As far as I can tell, my options are
to:

-       create two Solr instances, one with customer info and one with product
info; I would search the product Solr instance and identify the StoreId
values returned, and then use that info to search the customer Solr instance
to get the customer info.  The problem with this is that the second query
could have ten thousand OR clauses (one for each StoreId returned by the
first query)
-       create a single Solr instance that contains a denormalized version of
the data where each doc would contain both the customer info and the product
info for a given product.  The problem with this is that my facets would
return the number of products, not the number of customers
-       create a single Solr instance that contains a denormalized version of
the data where each doc contains the customer info and info for ALL products
that the customer might have (likely done via dynamic fields). The problem
with this is that my schema would be a bit messy and that my queries could
have hundreds of ANDs and ORs (one AND for each product field, and one OR
for each product); for example, q=((Width1:50 AND Density1:7) OR (Width2:50
AND Density2:7) OR …)
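
To show how the query for the last option grows, here is a rough Python
sketch that generates it. The build_query helper and the numbered field
naming (Width1, Density1, Width2, ...) are assumptions based on my example
above, not anything Solr provides.

```python
def build_query(criteria, product_slots):
    """Build the OR-of-ANDs query for the one-doc-per-customer layout.

    criteria: mapping of base field name -> value, e.g. {"Width": 50}
    product_slots: number of per-product field groups (Width1, Width2, ...)
    """
    groups = []
    for slot in range(1, product_slots + 1):
        # One AND clause per product field, suffixed with the slot number.
        clauses = [f"{field}{slot}:{value}" for field, value in criteria.items()]
        groups.append("(" + " AND ".join(clauses) + ")")
    # One OR per product slot.
    return " OR ".join(groups)

query = build_query({"Width": 50, "Density": 7}, product_slots=2)
print(query)  # (Width1:50 AND Density1:7) OR (Width2:50 AND Density2:7)
```

With 100 product slots the same call produces 100 parenthesized groups joined
by 99 ORs, which is the query size I am worried about.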

Does anyone have any advice on this?  Are there other schemas that might
work?  Hopefully the example makes sense.
