By the way, in case it's not obvious, I don't mean to suggest that we should remove the ability to specify a set of shards in the search interface. What I'm saying is that using something like "group" instead is simpler in most cases.
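To make the difference concrete, here is a rough Python sketch of the lookup a "group" would hide from the client. Everything named here is hypothetical (`GROUPS`, `resolve_shards`, `select_url`, and the shard names); none of it is an existing Solr API. Only `distrib=true` and the `shards` parameter come from the URLs quoted in this thread, and the sketch assumes a comma-separated `shards` value. In the proposal this mapping would live inside Solr, so no client would carry a copy of it.

```python
from urllib.parse import urlencode

# Hypothetical group -> shards mapping. The names are invented for
# illustration; the idea is that Solr, not each client, would own this.
GROUPS = {
    "customer_1": ["shard_a", "shard_b"],
    "last_day": ["hourly_23", "hourly_22"],
}


def resolve_shards(group, groups=GROUPS):
    """Return the list of shard names behind a group name."""
    try:
        return groups[group]
    except KeyError:
        raise ValueError(f"unknown group: {group!r}") from None


def select_url(base, query, group, groups=GROUPS):
    """Build a distributed select URL, expanding the group into `shards`."""
    params = {
        "q": query,
        "distrib": "true",
        "shards": ",".join(resolve_shards(group, groups)),
    }
    # safe="," keeps the shard list readable instead of encoding it as %2C.
    return f"{base}/select?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(select_url("http://localhost:8983/solr/collection1", "foo", "customer_1"))
```

Without a group concept, every front end, background task and ad-hoc script needs its own copy of `GROUPS` and has to keep it in sync as shards are added or rolled up.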
Jon

On Wed, Feb 10, 2010 at 9:14 AM, Jon Gifford <jon.giff...@gmail.com> wrote:
> On Wed, Feb 10, 2010 at 8:04 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Wed, Feb 10, 2010 at 12:02 AM, Jon Gifford <jon.giff...@gmail.com> wrote:
>>> Alternatively, I could create a collection per customer, which removes
>>> the need for slices, but means duplicating the schema many times.
>>
>> Multiple collections should be able to share a single config (schema
>> and related config files).
>
> OK, this solves the top level problem (how do I manage a single
> customer's index, while guaranteeing that all indices have the same
> schema), which is good.
>
>> Note: I've backed off of the use of "slice" in the public APIs since
>> it was contentious (although I still think it's a useful concept and
>> it does remain in some of the code). "shard" is kind of ambiguous,
>> but people are pretty good at dealing with ambiguity (and removing
>> that ambiguity by introducing another term seemed to add more
>> perceived complexity).
>
> I agree that it's a very useful concept, and wonder how much of the
> contention is just a terminology issue? If we used subcollection
> instead, the intent becomes clearer for some use-cases. If we used
> tag, or taggroup, then a slightly different (more powerful?) intent is
> suggested.
>
>>> The second part of what I need is to be able to search a single
>>> customer's index, which I'm assuming will be a slice. Something like:
>>>
>>>
>>> http://localhost:8983/solr/collection1/select?distrib=true&slice=customer_1
>>
>> The URLs on the SolrCloud page have been updated - this would now be
>> http://localhost:8983/solr/collection1/select?distrib=true&shards=customer_1
>>
>> This will work as long as no customer becomes bigger than a shard. If
>> that's not the case, you could query the entire collection and filter
>> on customer_1, or create a collection per customer (or do both, if you
>> have many small customers that you want to pack in a single shard).
>
> right. I'd most likely default to using a collection per customer
> (assuming that collections can share a single config) because a single
> customer's index will be larger than a single shard.
>
>>
>> http://localhost:8983/solr/collection1/select?distrib=true&collection=customer_1
>>
>>> Reading over some of the previous discussions, slices seem to be
>>> somewhat contentious, and I wanted to chime in on them a bit here. It
>>> seems to me that slices are loosely defined, and I think that's a good
>>> thing. If you think of slices as being similar to tags, then it's easy
>>> to imagine that any given shard can belong to many different slices.
>>
>> I wouldn't call it a "slice" but I've also been thinking about how to
>> select groups of nodes.
>> Extending that to shards would also make sense.
>
> I think the important points here are that if there is the concept of
> a group (or slice or subcollection or tag - whatever terminology we
> end up using), then
>
> 1) the client (typically some front end code) can use a simpler
> interface, which I think is a good thing. Solr doesn't need to expose
> how many shards there really are, or what they're named, and the FE
> doesn't have to try and generate a list of shard IDs just to do a
> search.
>
> 2) Some piece of code has to decide what shards to actually search,
> and that piece of code has to know exactly what shards actually exist.
> If that decision is made in the client, then it has to be made in
> every client (your customer-facing search interface, any and all
> background tasks you have running, any ad-hoc searches you do for
> analysis or spot checking or...). For the sake of simplicity and
> sanity, you don't want to have to replicate that decision making code
> across multiple apps or languages.
>
> 3) the collection and shard entities are at opposite ends of a fairly
> wide divide, and there are cases where you need something
> "in-between".
>
> In most cases, a simple collection search will suffice, but in those
> cases where you want to limit the search to particular shards, it
> makes more sense to me to manage that set of shards within solr, and
> expose only the fact that the "groups" are available.
>
> Here's another example:
>
> Let's say you're generating hourly shards, to limit the maximum size of
> the shard that is taking updates, for performance reasons. Let's also
> assume that you want to roll those hourlies up into daily or weekly or
> maximum size shards once they become less active, so Solr isn't trying
> to search 24 shards to get a single day's worth of results. If the
> "group" concept exists, then you can hide all of the mechanics of how
> and when that happens from the client, while still allowing it to have
> some control over how far back it can search, by exposing "groups"
> that limit it to the last day or week or whatever makes sense for your
> app.
>
> cheers
>
> Jon
>
>
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>