Re: Using Multiple Cores for Multiple Users

Lance Norskog Tue, 09 Nov 2010 20:00:40 -0800

Relevance is TF/DF. TF is the term frequency: the number of times the
term appears in the document. DF is the document frequency: the number
of documents in the index that contain the term.


There is no quick calculation for "total frequency for terms only in
these documents". Facets do this, and they're very very slow.
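
A minimal sketch of that skew in Python, not Lucene's actual scoring (which also applies log/sqrt damping and norms). The toy documents, the owner labels "A" and "B", and the helper names are all hypothetical. It scores the same document for 'monkeys' once with DF taken from the whole shared index and once with DF taken only from user A's documents; the second number requires rescanning that user's subset, which is the part with no quick calculation.

```python
from collections import Counter

# Hypothetical toy index: (owner, text) pairs standing in for RSS items.
DOCS = [
    ("A", "monkeys escape from the city zoo"),
    ("A", "city council votes on budget"),
    ("A", "budget talks stall in city hall"),
    ("B", "monkeys groom other monkeys"),
    ("B", "baby monkeys learn from older monkeys"),
    ("B", "why monkeys like bananas"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; Solr analyzers do more)."""
    return text.lower().split()


def doc_freq(term, docs):
    """DF: number of documents in `docs` that contain `term`."""
    return sum(1 for _, text in docs if term in tokenize(text))


def tf_over_df(term, text, docs):
    """Simplified TF/DF score for one document against a document set."""
    tf = Counter(tokenize(text))[term]  # TF: occurrences in this document
    df = doc_freq(term, docs)
    return tf / df if df else 0.0


# User A's only 'monkeys' document.
target = DOCS[0][1]
user_a_docs = [doc for doc in DOCS if doc[0] == "A"]

# DF from the whole shared index: user B's documents make 'monkeys' common.
shared_score = tf_over_df("monkeys", target, DOCS)

# DF from user A's documents only: 'monkeys' is rare, so it scores higher.
private_score = tf_over_df("monkeys", target, user_a_docs)

print("shared index DF score:", shared_score)
print("user A only DF score: ", private_score)
```

A filter on the user ID changes which documents are returned, but in this sketch it would not change `shared_score`, because DF is still counted over every document in the index.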

On Tue, Nov 9, 2010 at 7:50 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> hmmmmm, relevance is before filtering, probably during indexing?
>  Dennis Gearon
>
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
>
> ----- Original Message ----
> From: Lance Norskog <goks...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, November 9, 2010 7:07:45 PM
> Subject: Re: Using Multiple Cores for Multiple Users
>
> There is a standard problem with this: relevance is determined from
> all of the words in a field of all documents, not just the documents
> that match the query. That is, when user A searches for 'monkeys' and
> one of his feeds has a document with this word, but someone else is a
> zoophile, 'monkeys' will be a common word in the index. This will skew
> the relevance computation for user A.
>
> You could have a separate text field for each user. This might work
> better- but you can't use field norms (they take up space for all
> documents).
>
> Lance
>
> On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
> <estrada.adam.gro...@gmail.com> wrote:
>> Thanks a lot for all the tips, guys! I think that we may explore both
>> options just to see what happens. I'm sure that scalability will be a huge
>> mess with the core-per-user scenario. I like the idea of creating a user ID
>> field and agree that it's probably the best approach. We'll see...I will be
>> sure to let the list know what I find! Please don't stop posting your
>> comments everyone ;-) My inquiring mind wants to know...
>>
>> Adam
>>
>> On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
>>
>>> If storing in a single index (possibly sharded if you need it), you can
>>> simply include a solr field that specifies the user ID of the saved thing.
>>> On the client side, in your application, simply ensure that there is an fq
>>> parameter limiting to the current user, if you want to limit to the current
>>> user's stuff.  Relevancy ranking should work just as if you had 'separate
>>> cores'; there is no relevancy issue.
>>>
>>> It IS true that when your index gets very large, commits will start taking
>>> longer, which can be a problem. I don't mean commits will take longer just
>>> because there is more stuff to commit -- the larger the index, the longer an
>>> update to a single document will take to commit.
>>>
>>> In general, I suspect that having dozens or hundreds (or thousands!) of
>>> cores is not going to scale well; it is not going to make good use of your
>>> CPU/RAM/HD resources.   Not really the intended use case of multiple cores.
>>>
>>> However, you are probably going to run into some issues with the single
>>> index approach too. In general, how to deal with "multi-tenancy" in Solr is
>>> an oft-asked question, and judging from past threads there doesn't seem to
>>> be any "just works and does everything for you without needing to think
>>> about it" solution for it in Solr. I am not a Solr developer or expert.
>>>
>>> ________________________________________
>>> From: Markus Jelsma [markus.jel...@openindex.io]
>>> Sent: Tuesday, November 09, 2010 6:57 PM
>>> To: solr-user@lucene.apache.org
>>> Cc: Adam Estrada
>>> Subject: Re: Using Multiple Cores for Multiple Users
>>>
>>> Hi,
>>>
>>> > All,
>>> >
>>> > I have a web application that requires the user to register and then
>>> > log in to gain access to the site. Pretty standard stuff...Now I would
>>> > like to know what the best approach would be to implement a "customized"
>>> > search experience for each user. Would this mean creating a separate core
>>> > per user? I think that this is not possible without restarting Solr after
>>> > each core is added to the multi-core xml file, right?
>>>
>>> No, you can dynamically manage cores and parts of their configuration.
>>> Sometimes you must reindex after a change, the same is true for reloading
>>> cores. Check the wiki on this one [1].
>>>
>>> >
>>> > My use case is this...User A would like to index 5 RSS feeds and User B
>>> > would like to index 5 completely different RSS feeds and he is not
>>> > interested at all in what User A is interested in. This means that they
>>> > would have to be separate index cores, right?
>>>
>>> If you view documents within an RSS feed as separate documents, you can
>>> assign a user ID to those documents, creating a multi-user index with RSS
>>> documents per user, or group or whatever.
>>>
>>> Having a core per user isn't a good idea if you have many users. It takes
>>> up additional memory and disk space, doesn't share caches etc. There is
>>> also more maintenance and you need some support scripts to dynamically
>>> create new cores - Solr currently doesn't create a new core directory
>>> structure.
>>>
>>> But reindexing a very large index takes a lot more time and resources,
>>> and relevancy might be an issue depending on the RSS feeds' contents.
>>>
>>> >
>>> > What is the best approach for this kind of thing?
>>>
>>> I'd usually store the feeds in a single index and shard if it's too many
>>> for a single server with your specifications. Unless the demands are too
>>> specific.
>>>
>>> >
>>> > Thanks in advance,
>>> > Adam
>>>
>>> [1]: http://wiki.apache.org/solr/CoreAdmin
>>>
>>> Cheers
>>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
>



-- 
Lance Norskog
goks...@gmail.com
