Hi Mike,

I think I wasn't clear.

Each document will only be tagged with one user_id or, to be specific,
one tenant_id. Users of the same tenant can't upload the same document
to the same path.

So I use the tenant_id to make the key unique for each tenant. That way
I can index and delete documents without a problem.
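
To make that concrete, here is a rough sketch of what I mean, not my
actual code. Only tenant_id comes from this thread; the ":" separator,
the "id", "path" and "text" field names, and the helper names are
assumptions for illustration. It also assumes the tenant ID contains no
characters that need escaping in a Solr query.

```python
# Hypothetical helpers: the ":" separator and the "id", "path" and "text"
# field names are assumptions; only tenant_id comes from the thread.

def make_key(tenant_id: str, path: str) -> str:
    """Combine the tenant ID and document path into one unique key."""
    return f"{tenant_id}:{path}"


def make_doc(tenant_id: str, path: str, text: str) -> dict:
    """Build the document to index, tagged with exactly one tenant_id."""
    return {
        "id": make_key(tenant_id, path),
        "tenant_id": tenant_id,
        "path": path,
        "text": text,
    }


def tenant_query(tenant_id: str, user_query: str) -> dict:
    """Build search parameters that limit results to one tenant via fq."""
    return {"q": user_query, "fq": f"tenant_id:{tenant_id}"}
```

Two tenants uploading to the same path get different keys, so neither
overwrites or deletes the other's document, and each document carries a
single tenant_id value rather than a long list of user IDs.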

On Wed, Oct 27, 2010 at 5:50 PM, mike anderson <saidthero...@gmail.com> wrote:
> Tagging every document with a few hundred thousand 6 character user-ids
> would  increase the document size by two orders of magnitude. I can't
> imagine why this wouldn't mean the index would increase by just as much
> (though I really don't know much about that file structure). By my simple
> math, this would mean that if we want each shard's index to be able to fit
> in memory, then (even with some beefy servers) each query would have to go
> out to a few thousand shards (as opposed to 21 if we used the MultiCore
> approach). This means the typical response time would be much slower.
>
>
> -mike
>
> On Tue, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
>
>> mike anderson wrote:
>>
>>> I'm really curious if there is a clever solution to the obvious problem
>>> with: "So your better off using a single index and with a user id and use
>>> a query filter with the user id when fetching data.", i.e. when you have
>>> hundreds of thousands of user IDs tagged on each article. That just
>>> doesn't sound like it scales very well.
>>>
>>>
>> Actually, I think that design would scale pretty fine, I don't think
>> there's an 'obvious' problem. You store your userIDs in a multi-valued field
>> (or as multiple terms in a single value, ends up being similar). You fq on
>> there with the current userID.   There's one way to find out of course, but
>> that doesn't seem a patently ridiculous scenario or anything, that's the
>> kind of thing Solr is generally good at, it's what it's built for.   The
>> problem might actually be in the time it takes to add such a document to the
>> index; but not in query time.
>>
>> Doesn't mean it's the best solution for your problem though, I can't say.
>>
>> My impression is that Solr in general isn't really designed to support the
>> kind of multi-tenancy use case people are talking about lately.  So trying
>> to make it work anyway... if multi-cores work for you, then great, but be
>> aware they weren't really designed for that (having thousands of cores) and
>> may not. If a single index can work for you instead, great, but as you've
>> discovered it's not necessarily obvious how to set up the schema to do what
>> you need -- really this applies to Solr in general, unlike an rdbms where
>> you just third-form-normalize everything and figure it'll work for almost
>> any use case that comes up,  in Solr you generally need to custom fit the
>> schema for your particular use cases, sometimes being kind of clever to
>> figure out the optimal way to do that.
>>
>> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
>> index takes more intellectual work than setting up an rdbms. The trade off
>> is you get speed, and flexible ways to set up relevancy (that still perform
>> well). Took a couple decades for rdbms to get as brainless to use as they
>> are, maybe in a couple more we'll have figured out ways to make indexing
>> engines like solr equally brainless, but not yet -- but it's still pretty
>> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>>
>



-- 
Regards,

Tharindu
