If you are using Python, you can use urllib2, or "requests" (which is reportedly better), or better still something like pysolr, which makes life simpler.
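A minimal sketch of what the pysolr flow looks like, assuming pysolr is installed (`pip install pysolr`) and a core named `test_commit_fast` exists as in the thread; `field_name` is the placeholder field from the examples below, not a real schema field:

```python
def make_batch(rows):
    """Turn (doc_id, text) pairs into Solr JSON documents.
    "field_name" is a placeholder from this thread, not a verified schema field."""
    return [{"id": str(doc_id), "field_name": text} for doc_id, text in rows]

def index_with_pysolr(rows, core_url="http://localhost:8983/solr/test_commit_fast"):
    """Send one batched add; let autoCommit/autoSoftCommit make docs visible."""
    import pysolr  # third-party: pip install pysolr
    solr = pysolr.Solr(core_url, timeout=10)
    # commit=False: no per-request commit, per the advice later in this thread
    solr.add(make_batch(rows), commit=False)
    return solr.search("field_name:*")
```

Compared with hand-rolled urllib calls, pysolr handles the JSON serialization, content-type headers, and error decoding for you.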
Here's a pull request that makes pysolr ZooKeeper-aware, which will help if you are using SolrCloud. I hope one day they will merge it: https://github.com/toastdriven/pysolr/pull/138

Upayavira

On Fri, Aug 7, 2015, at 11:37 PM, Erick Erickson wrote:
> bq: So, how many concurrent threads at minimum should I run?
>
> I really can't answer that in the abstract; you'll simply have to test.
>
> I'd prefer SolrJ to post.jar. If you're not going to use SolrJ, I'd imagine
> that moving from Python to post.jar isn't all that useful.
>
> But before you do anything, see what really happens when you remove the
> commit=true. That's likely way more important than the rest.
>
> Best,
> Erick
>
> On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
> > Hi Erick,
> > posting files to Solr via curl =>
> > Rather than posting files via curl, which is better: SolrJ or post.jar? I
> > don't use either. I wrote a Python script for indexing, using urllib and
> > urllib2 to send data via HTTP. I don't have any option to use SolrJ right
> > now. How can I do the same thing via post.jar in Python? Any help, please.
> >
> > indexing with 100 threads is going to eat up a lot of CPU cycles
> > => So, how many concurrent threads at minimum should I run? And I also
> > need concurrent searching. So, how many?
> >
> > And thanks for the Solr 5.2 pointer, I will go through that. Thanks for
> > the reply. Please help me.
> >
> > On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> bq: What limitations does Solr have related to indexing and searching
> >> simultaneously? That is, how many simultaneous calls can I make for
> >> searching and indexing at once?
> >>
> >> None a priori. It all depends on the hardware you're throwing at it.
> >> Obviously, indexing with 100 threads is going to eat up a lot of CPU
> >> cycles that can't then be devoted to satisfying queries. You need to
> >> strike a balance.
> >> Do seriously consider using some other method than posting files to Solr
> >> via curl or the like; that's rarely a robust solution for production.
> >>
> >> As for adding the commit=true, this shouldn't be affecting the index
> >> size; I suspect you were misled by something else happening.
> >>
> >> Really, remove it or you'll beat up your system hugely. As for the soft
> >> commit interval, that's totally irrelevant when you're committing every
> >> document. But do lengthen it as much as you can. Most of the time when
> >> people say "real time", it turns out that 10 seconds is OK. Or 60
> >> seconds is OK. You have to check what the _real_ requirement is; it's
> >> often not what's stated.
> >>
> >> bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
> >> indexing and searching data?
> >>
> >> Did you read the link I provided? With replicas, 5.2 will index almost
> >> twice as fast. That means (roughly) half the work on the followers is
> >> being done, freeing up cycles for performing queries.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki <nitinml...@gmail.com>
> >> wrote:
> >> > Hi Erick,
> >> > You said that the soft commit should be more than 3000 ms. Actually, I
> >> > need real-time searching, and that's why I need a fast soft commit.
> >> >
> >> > commit=true => I set commit=true because it reduced my indexed data
> >> > size from 1.5GB to 500MB on *each shard*. When I had commit=false, my
> >> > indexed data size was 1.5GB. After changing it to commit=true, the
> >> > size reduced to 500MB. I don't understand why.
> >> >
> >> > I am using Solr 5.0. Is 5.0 almost similar to 5.2 regarding indexing
> >> > and searching data?
> >> >
> >> > What limitations does Solr have related to indexing and searching
> >> > simultaneously? That is, how many simultaneous calls can I make for
> >> > searching and indexing at once?
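Erick's advice above (drop commit=true, batch the documents) can be sketched against the thread's Python/urllib setup; the URL and `field_name` are the placeholders used elsewhere in this thread, and the batch size of 1,000 is his suggested starting point:

```python
import json
from urllib import request

# commit=true removed from the URL; document visibility now comes from
# the autoCommit/autoSoftCommit settings in solrconfig, not from each request.
UPDATE_URL = "http://localhost:8983/solr/test_commit_fast/update/json"

def build_update(docs, url=UPDATE_URL):
    """Build one POST request carrying a whole batch of documents."""
    return request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def post_batch(docs):
    """One HTTP round trip per batch (e.g. ~1,000 docs) instead of one
    per document, and no per-request commit."""
    with request.urlopen(build_update(docs)) as resp:
        return resp.status
```

This keeps the update handler from opening a new searcher on every request, which is the cost Erick is warning about.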
> >> >
> >> > On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson <erickerick...@gmail.com>
> >> > wrote:
> >> >
> >> >> Your soft commit time of 3 seconds is quite aggressive;
> >> >> I'd lengthen it to as long as possible.
> >> >>
> >> >> Ugh, looked at your query more closely. Adding commit=true to every
> >> >> update request is horrible performance-wise. Letting your autocommit
> >> >> process handle the commits is the first thing I'd do. Second, I'd try
> >> >> going to SolrJ and batching up documents (I usually start with
> >> >> 1,000), or using the post.jar tool rather than sending them via a
> >> >> raw URL.
> >> >>
> >> >> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
> >> >> version of Solr? There was a 2x speedup in Solr 5.2, see:
> >> >> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> >> >>
> >> >> One symptom was that the followers were doing waaaaay more work than
> >> >> the leader (BTW, using master/slave when talking SolrCloud is a bit
> >> >> confusing...), which will affect query response rates.
> >> >>
> >> >> Basically, if query response is paramount, you really need to
> >> >> throttle your indexing; there's just a whole lot of work going on
> >> >> here.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Fri, Aug 7, 2015 at 11:23 AM, Upayavira <u...@odoko.co.uk> wrote:
> >> >> > How many CPUs do you have? 100 concurrent indexing calls seems like
> >> >> > rather a lot. You're gonna end up doing a lot of context switching,
> >> >> > hence degraded performance. Dunno what others would say, but I'd
> >> >> > aim for approx one indexing thread per CPU.
> >> >> >
> >> >> > Upayavira
> >> >> >
> >> >> > On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
> >> >> >> Hello Everyone,
> >> >> >> I have indexed 16 million documents in SolrCloud. Created 4 nodes
> >> >> >> and 8 shards with a single replica.
> >> >> >> I am trying to run concurrent indexing and searching on those
> >> >> >> indexed documents: 100 concurrent indexing calls along with 100
> >> >> >> concurrent searching calls.
> >> >> >> It *degrades both searching and indexing* performance.
> >> >> >>
> >> >> >> Configuration:
> >> >> >>
> >> >> >> "commitWithin":{"softCommit":true},
> >> >> >> "autoCommit":{
> >> >> >>   "maxDocs":-1,
> >> >> >>   "maxTime":60000,
> >> >> >>   "openSearcher":false},
> >> >> >> "autoSoftCommit":{
> >> >> >>   "maxDocs":-1,
> >> >> >>   "maxTime":3000}},
> >> >> >>
> >> >> >> "indexConfig":{
> >> >> >>   "maxBufferedDocs":-1,
> >> >> >>   "maxMergeDocs":-1,
> >> >> >>   "maxIndexingThreads":8,
> >> >> >>   "mergeFactor":-1,
> >> >> >>   "ramBufferSizeMB":100.0,
> >> >> >>   "writeLockTimeout":-1,
> >> >> >>   "lockType":"native"}}}
> >> >> >>
> >> >> >> AND <maxWarmingSearchers>2</maxWarmingSearchers>
> >> >> >>
> >> >> >> I don't know how master and slave work. Normally, I created 8
> >> >> >> shards and indexed documents using:
> >> >> >>
> >> >> >> http://localhost:8983/solr/test_commit_fast/update/json?commit=true
> >> >> >>   -H 'Content-type:application/json' -d '[ JSON_Document ]'
> >> >> >>
> >> >> >> and searching using:
> >> >> >>
> >> >> >> http://localhost:8983/solr/test_commit_fast/select?q=<field_name:search_string>
> >> >> >>
> >> >> >> Please, any help on making searching and indexing fast
> >> >> >> concurrently. Thanks.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Nitin
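Upayavira's "roughly one indexing thread per CPU" advice, combined with Erick's suggestion to batch around 1,000 documents, can be sketched as a bounded thread pool in Python; `post_batch` stands in for whatever function actually POSTs a batch to Solr, and the pool size is an assumption to tune, not a verified number:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def chunk(docs, size=1000):
    """Split docs into batches; Erick suggests starting around 1,000."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def index_all(docs, post_batch, workers=None):
    """Run post_batch over batches with roughly one worker per CPU,
    instead of 100 unthrottled concurrent indexing calls."""
    workers = workers or os.cpu_count() or 4
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves batch order and caps concurrency at `workers`
        return list(pool.map(post_batch, chunk(docs)))
```

Capping the pool leaves CPU cycles free for the concurrent query load, which is the balance the thread keeps circling back to.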