Re: Documents and cores

Erick Erickson Tue, 19 Oct 2010 19:17:01 -0700

This is something most everybody has to get over when transitioning from the
DB world to Solr/Lucene. The schema describes the #possible# fields in a
document. There is absolutely no requirement that #every# document in the
index have all of these fields (unless #you# define it so with
<field ..... required="true">).


Solr will happily index documents that have fields missing, so feel free...
You should be able to define your people and parts documents as you
choose, with perhaps some common fields.
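
To make that concrete, here is a rough sketch in Python, with plain dicts
standing in for Solr documents. Nothing here talks to a real Solr instance,
and the field names (id, type, name, zip, sku, parts) and the validate()
helper are made up for illustration. The only point is that one schema can
cover both kinds of document, with just the required fields enforced.

```python
# Hypothetical schema: field name -> whether the field is required.
# This mirrors the idea of required="true" in schema.xml; it is not Solr code.
SCHEMA = {
    "id": True,
    "type": True,    # assumed discriminator field: "person" or "item"
    "name": False,
    "zip": False,
    "sku": False,
    "parts": False,
}


def validate(doc):
    """Return the required fields missing from doc (empty list means OK)."""
    return [field for field, required in SCHEMA.items()
            if required and field not in doc]


# Two different "shapes" of document living side by side in one index.
person = {"id": "p1", "type": "person", "name": "ralph", "zip": "42532"}
item = {"id": "i1", "type": "item", "sku": "42532", "parts": ["bolt", "nut"]}
```

Both person and item pass validate() even though neither carries every
field the schema lists.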

You'll have to take some care not to form queries like
name:ralph AND sku:12345. Assuming that the name field is only in people
documents and sku only in parts documents, no single document can satisfy
both clauses, so that query matches nothing.
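
A toy illustration of that pitfall, again in Python with dicts in place of
real documents. The matches_all() and matches_any() helpers are hypothetical
stand-ins for AND and OR over exact field:value clauses; they are not how
Solr evaluates queries internally, just a model of the outcome.

```python
def matches_all(doc, clauses):
    """AND semantics: every field:value clause must match this document."""
    return all(doc.get(field) == value for field, value in clauses.items())


def matches_any(doc, clauses):
    """OR semantics: at least one field:value clause must match."""
    return any(doc.get(field) == value for field, value in clauses.items())


# Assumed sample data: name exists only on people, sku only on items.
docs = [
    {"id": "p1", "type": "person", "name": "ralph"},
    {"id": "i1", "type": "item", "sku": "12345"},
]

query = {"name": "ralph", "sku": "12345"}

# name:ralph AND sku:12345 -> no document has both fields, so no hits.
and_hits = [d["id"] for d in docs if matches_all(d, query)]

# name:ralph OR sku:12345 -> each document satisfies one clause.
or_hits = [d["id"] for d in docs if matches_any(d, query)]
```

With this data and_hits comes back empty while or_hits contains both
documents.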

Do continue down the path of de-normalization. That's another thing most DB
folks don't want to do. Each document you index should contain all the data
you need. The moment you find yourself asking "how do I do a join" you should
stop and consider further de-normalization.....
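
As a sketch of what "further de-normalization" might look like for the
items/people case in the quoted question: do the join once, at index time,
and store the result on the document. The table layouts, the ordered_by
field, and the build_item_docs() helper below are all assumptions for
illustration, not anything taken from the actual database.

```python
from collections import defaultdict


def build_item_docs(items, people, orders):
    """Flatten relational rows into one self-contained document per item."""
    names = {p["id"]: p["name"] for p in people}

    # Resolve the items<->people "join" up front instead of at query time.
    buyers = defaultdict(list)
    for order in orders:
        buyers[order["item_id"]].append(names[order["person_id"]])

    return [
        {
            "id": f"item-{it['id']}",
            "type": "item",
            "sku": it["sku"],
            "ordered_by": buyers[it["id"]],  # multi-valued field
        }
        for it in items
    ]


# Assumed sample rows, as they might come out of three SQL tables.
items = [{"id": 1, "sku": "42532"}, {"id": 2, "sku": "99"}]
people = [{"id": 10, "name": "ralph"}, {"id": 11, "name": "ann"}]
orders = [{"item_id": 1, "person_id": 10}, {"item_id": 1, "person_id": 11}]

item_docs = build_item_docs(items, people, orders)
```

Each item document then already carries the names of the people who ordered
it, so displaying an item needs no second lookup. The same approach in the
other direction would give people documents that list the items they ordered.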

HTH
Erick


On Tue, Oct 19, 2010 at 10:39 AM, Olson, Ron <rol...@lbpc.com> wrote:

> Hi all-
>
> I have a newbie design question about documents, especially with SQL
> databases. I am trying to set up Solr to go against a database that, for
> example, has "items" and "people". The way I see it, and I don't know if
> this is right or not (thus the question), is that I see both as separate
> documents as an item may contain a list of parts, which the user may want to
> search, and, as part of the "item", view the list of people who have ordered
> the item.
>
> Then there's the actual "people", who the user might want to search to find
> a name and, consequently, what items they ordered. To me they are both "top
> level" things, with some overlap of fields. If I'm searching for "people",
> I'm likely not going to be interested in the parts of the item, while if I'm
> searching for "items" the likelihood is that I may want to search for
> "42532" which is, in this instance, a SKU, and not get hits on the zip code
> section of the "people".
>
> Does it make sense, then, to separate these two out as separate documents?
> I believe so because the documentation I've read suggests that a document
> should be analogous to a row in a table (in this case, very de-normalized).
> What is tripping me up is, as far as I can tell, you can have only one
> document type per index, and thus one document per core. So in this example,
> I have two cores, "items" and "people". Is this correct? Should I embrace
> the idea of having many cores or am I supposed to have a single, unified
> index with all documents (which doesn't seem like Solr supports).
>
> The ultimate question comes down to the search interface. I don't
> necessarily want to have the user explicitly state which document they want
> to search; I'd like them to simply type "42532" and get documents from both
> cores, and then possibly allow for filtering results after the fact, not
> before. As I've only used the admin site so far (which is core-specific),
> does the client API allow for unified searching across all cores? Assuming
> it does, I'd think my idea of multiple-documents is okay, but I'd love to
> hear from people who actually know what they're doing. :)
>
> Thanks,
>
> Ron
>
> DISCLAIMER: This electronic message, including any attachments, files or
> documents, is intended only for the addressee and may contain CONFIDENTIAL,
> PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended
> recipient, you are hereby notified that any use, disclosure, copying or
> distribution of this message or any of the information included in or with
> it is  unauthorized and strictly prohibited.  If you have received this
> message in error, please notify the sender immediately by reply e-mail and
> permanently delete and destroy this message and its attachments, along with
> any copies thereof. This message does not create any contractual obligation
> on behalf of the sender or Law Bulletin Publishing Company.
> Thank you.
>
