Re: 1 main collection or multiple smaller collections?

Derek Poh Thu, 27 Apr 2017 19:36:39 -0700

Walter

Thank you for sharing your use case. I will try to design backwards from the search result pages.

As of now, a user can do either a supplier search or a product search.

Using a single collection of product documents, with supplier info in each product document, I will need to use result grouping or the collapsing query parser for supplier search.
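
For illustration only, here is a rough sketch of how those two options might look as Solr query parameters. The field name `supplier_id` and the helper name `supplier_search_params` are assumptions, not part of the actual schema; the parameters themselves (`group`, `group.field`, `group.limit`, and the `{!collapse}` filter) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def supplier_search_params(user_query, use_collapse=True):
    """Build query parameters that return one product hit per supplier.

    Assumes every product document carries a hypothetical `supplier_id` field.
    """
    params = [("q", user_query)]
    if use_collapse:
        # Collapsing query parser: keep a single product document per supplier.
        params.append(("fq", "{!collapse field=supplier_id}"))
    else:
        # Result grouping: group product documents by supplier, one doc per group.
        params += [
            ("group", "true"),
            ("group.field", "supplier_id"),
            ("group.limit", "1"),
        ]
    return params


if __name__ == "__main__":
    # Print the URL-encoded query string for each variant.
    print(urlencode(supplier_search_params("led lamp")))
    print(urlencode(supplier_search_params("led lamp", use_collapse=False)))
```

Either variant would be appended to the collection's select handler URL; which one performs better depends on the number of distinct suppliers and would need testing.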


On 4/28/2017 1:08 AM, Walter Underwood wrote:

Design backwards from the search result pages (SRP). Make flat schema(s) with 
the fields you will search and display.

One example is the schema I used at Netflix. I used one collection to hold 
movies, people (actors), and genres. There were collisions between the integer
IDs, so movie IDs were prefixed with “m”, people with “p”, and genres with “g”.
The searched fields were “title” and “description”. There was also a “type”
field which was “movie”, “person”, or “genre”. There was also a field for the
database ID (without the prefix).

A movie SRP used an “fq” filter of “type:movie”, and so on for other SRPs. 
There were a few other filters, like G-rated movies or streaming, DVD, HD DVD, 
or Bluray.
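
A minimal sketch of that layout, for illustration: the field names “title”, “description”, and “type” come from the description above, while `id`, `db_id`, and the helper names `make_doc` and `srp_params` are assumptions made for this example, not the actual Netflix schema.

```python
# One-letter prefixes that keep integer database IDs from colliding
# when several record types share one collection.
PREFIX = {"movie": "m", "person": "p", "genre": "g"}


def make_doc(doc_type, db_id, title, description):
    """Build a flat document for the shared collection."""
    return {
        "id": f"{PREFIX[doc_type]}{db_id}",  # unique across all types (assumed field name)
        "db_id": db_id,                      # original database ID, no prefix (assumed field name)
        "type": doc_type,
        "title": title,
        "description": description,
    }


def srp_params(user_query, doc_type):
    """Query parameters for one search result page, filtered by document type."""
    return {"q": user_query, "fq": f"type:{doc_type}"}


if __name__ == "__main__":
    # Made-up records: the same integer ID no longer collides across types.
    print(make_doc("movie", 42, "Example Movie", "An illustrative record."))
    print(make_doc("person", 42, "Example Person", "An illustrative record."))
    print(srp_params("example", "movie"))
```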

The full index was under 350K documents.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Apr 27, 2017, at 10:01 AM, Rick Leir <rl...@leirtech.com> wrote:

Does it make sense to use nested documents here? Products could be nested in a 
supplier document perhaps.

Alternately, consider de-normalizing "til it hurts". A product doc might be 
able to contain supplier info.

On April 27, 2017 8:50:59 AM EDT, Shawn Heisey <apa...@elyograg.org> wrote:

On 4/26/2017 11:57 PM, Derek Poh wrote:

There are some common fields between them.
At the source data end (database), the supplier info and product info
are updated separately. In this regard, should I separate them?
If it's in 1 single collection, when there are updates to only the
supplier info, the product info will be indexed again even though there
are no updates to it. Is my reasoning valid?


On 4/27/2017 1:33 PM, Walter Underwood wrote:

Do they have the same fields or different fields? Are they updated
separately or together?

If they have the same fields and are updated together, I’d put them
in the same collection. Otherwise, probably separate.

Walter's statements are right on the money, you just might need a
little more detail.

There are two critical details that decide whether you even CAN
combine different data in a single index: One is that all types of
records must use the same field (the uniqueKey field) to determine
uniqueness, and the value of this field must be unique across the
entire dataset.  The other is that there SHOULD be a field with a name
like "type" that your search client can use to differentiate the
different kinds of documents.  This type field is not necessary, but it
does make things easier.
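
As a rough illustration of the first requirement, a pre-indexing check along these lines could flag uniqueKey collisions between record types before they are combined. The field name `id`, the helper name `find_duplicate_keys`, and the sample records are all assumptions made for this sketch.

```python
from collections import Counter


def find_duplicate_keys(docs, key_field="id"):
    """Return the uniqueKey values that occur more than once in docs."""
    counts = Counter(doc[key_field] for doc in docs)
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Made-up records: a supplier and a product that share the same raw ID.
    docs = [
        {"id": "1001", "type": "supplier"},
        {"id": "1001", "type": "product"},
        {"id": "1002", "type": "product"},
    ]
    print(find_duplicate_keys(docs))
```

If a check like this reports collisions, the usual remedy is to make the key unique per type, for example by prefixing it, before indexing into one collection; otherwise a later document silently overwrites an earlier one with the same key.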

Assuming you CAN combine documents, there is still the question of
whether you SHOULD.  If the fields that you will commonly search are
the same between the different kinds of documents, and if people want
to be able to do one search and get more than one of the document types
you are indexing, then it is something you should consider.  If people
will only ever search one type of document, you should probably keep
them in separate indexes to keep things cleaner.

Thanks,
Shawn

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com



