If you are using Python, you can use urllib2, or "requests" (which is reportedly better), or better still something like pysolr, which makes life simpler.
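A minimal sketch of what the pysolr flow looks like, assuming pysolr is installed (`pip install pysolr`) and a core named `test_commit_fast` exists as in the thread; `field_name` is the placeholder field from the examples below, not a real schema field:

```python
def make_batch(rows):
    """Turn (doc_id, text) pairs into Solr JSON documents.
    "field_name" is a placeholder from this thread, not a verified schema field."""
    return [{"id": str(doc_id), "field_name": text} for doc_id, text in rows]

def index_with_pysolr(rows, core_url="http://localhost:8983/solr/test_commit_fast"):
    """Send one batched add; let autoCommit/autoSoftCommit make docs visible."""
    import pysolr  # third-party: pip install pysolr
    solr = pysolr.Solr(core_url, timeout=10)
    # commit=False: no per-request commit, per the advice later in this thread
    solr.add(make_batch(rows), commit=False)
    return solr.search("field_name:*")
```

Compared with hand-rolled urllib calls, pysolr handles the JSON serialization, content-type headers, and error decoding for you.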
Here's a pull request that makes pysolr ZooKeeper-aware, which will help if you are using SolrCloud. I hope one day they will merge it: https://github.com/toastdriven/pysolr/pull/138

Upayavira

On Fri, Aug 7, 2015, at 11:37 PM, Erick Erickson wrote:
> bq: So, how many concurrent threads at minimum should I run?
>
> I really can't answer that in the abstract; you'll simply have to test.
>
> I'd prefer SolrJ to post.jar. If you're not going to use SolrJ, I'd imagine
> that moving from Python to post.jar isn't all that useful.
>
> But before you do anything, see what really happens when you remove the
> commit=true. That's likely way more important than the rest.
>
> Best,
> Erick
>
> On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
> > Hi Erick,
> > posting files to Solr via curl =>
> > Rather than posting files via curl, which is better: SolrJ or post.jar? I
> > don't use either. I wrote a Python script for indexing, using urllib and
> > urllib2 to send data via HTTP. I don't have any option to use SolrJ right
> > now. How can I do the same thing via post.jar in Python? Any help, please.
> >
> > indexing with 100 threads is going to eat up a lot of CPU cycles
> > => So, how many concurrent threads at minimum should I run? And I also
> > need concurrent searching. So, how many?
> >
> > And thanks for the Solr 5.2 pointer, I will go through that. Thanks for
> > the reply. Please help me.
> >
> > On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> bq: What limitations does Solr have related to indexing and searching
> >> simultaneously? That is, how many simultaneous calls can I make for
> >> searching and indexing at once?
> >>
> >> None a priori. It all depends on the hardware you're throwing at it.
> >> Obviously, indexing with 100 threads is going to eat up a lot of CPU
> >> cycles that can't then be devoted to satisfying queries. You need to
> >> strike a balance.
> >> Do seriously consider using some other method than posting files to Solr
> >> via curl or the like; that's rarely a robust solution for production.
> >>
> >> As for adding the commit=true, this shouldn't be affecting the index
> >> size; I suspect you were misled by something else happening.
> >>
> >> Really, remove it or you'll beat up your system hugely. As for the soft
> >> commit interval, that's totally irrelevant when you're committing every
> >> document. But do lengthen it as much as you can. Most of the time when
> >> people say "real time", it turns out that 10 seconds is OK. Or 60
> >> seconds is OK. You have to check what the _real_ requirement is; it's
> >> often not what's stated.
> >>
> >> bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
> >> indexing and searching data?
> >>
> >> Did you read the link I provided? With replicas, 5.2 will index almost
> >> twice as fast. That means (roughly) half the work on the followers is
> >> being done, freeing up cycles for performing queries.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki <nitinml...@gmail.com>
> >> wrote:
> >> > Hi Erick,
> >> > You said that the soft commit should be more than 3000 ms. Actually, I
> >> > need real-time searching, and that's why I need a fast soft commit.
> >> >
> >> > commit=true => I set commit=true because it reduced my indexed data
> >> > size from 1.5GB to 500MB on *each shard*. When I had commit=false, my
> >> > indexed data size was 1.5GB. After changing it to commit=true, the
> >> > size reduced to 500MB. I don't understand why.
> >> >
> >> > I am using Solr 5.0. Is 5.0 almost similar to 5.2 regarding indexing
> >> > and searching data?
> >> >
> >> > What limitations does Solr have related to indexing and searching
> >> > simultaneously? That is, how many simultaneous calls can I make for
> >> > searching and indexing at once?
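Erick's advice above (drop commit=true, batch the documents) can be sketched against the thread's Python/urllib setup; the URL and `field_name` are the placeholders used elsewhere in this thread, and the batch size of 1,000 is his suggested starting point:

```python
import json
from urllib import request

# commit=true removed from the URL; document visibility now comes from
# the autoCommit/autoSoftCommit settings in solrconfig, not from each request.
UPDATE_URL = "http://localhost:8983/solr/test_commit_fast/update/json"

def build_update(docs, url=UPDATE_URL):
    """Build one POST request carrying a whole batch of documents."""
    return request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def post_batch(docs):
    """One HTTP round trip per batch (e.g. ~1,000 docs) instead of one
    per document, and no per-request commit."""
    with request.urlopen(build_update(docs)) as resp:
        return resp.status
```

This keeps the update handler from opening a new searcher on every request, which is the cost Erick is warning about.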
> >> >
> >> > On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson <erickerick...@gmail.com>
> >> > wrote:
> >> >
> >> >> Your soft commit time of 3 seconds is quite aggressive;
> >> >> I'd lengthen it to as long as possible.
> >> >>
> >> >> Ugh, looked at your query more closely. Adding commit=true to every
> >> >> update request is horrible performance-wise. Letting your autocommit
> >> >> process handle the commits is the first thing I'd do. Second, I'd try
> >> >> going to SolrJ and batching up documents (I usually start with
> >> >> 1,000), or using the post.jar tool rather than sending them via a
> >> >> raw URL.
> >> >>
> >> >> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
> >> >> version of Solr? There was a 2x speedup in Solr 5.2, see:
> >> >> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> >> >>
> >> >> One symptom was that the followers were doing waaaaay more work than
> >> >> the leader (BTW, using master/slave when talking SolrCloud is a bit
> >> >> confusing...), which will affect query response rates.
> >> >>
> >> >> Basically, if query response is paramount, you really need to
> >> >> throttle your indexing; there's just a whole lot of work going on
> >> >> here.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Fri, Aug 7, 2015 at 11:23 AM, Upayavira <u...@odoko.co.uk> wrote:
> >> >> > How many CPUs do you have? 100 concurrent indexing calls seems like
> >> >> > rather a lot. You're gonna end up doing a lot of context switching,
> >> >> > hence degraded performance. Dunno what others would say, but I'd
> >> >> > aim for approx one indexing thread per CPU.
> >> >> >
> >> >> > Upayavira
> >> >> >
> >> >> > On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
> >> >> >> Hello Everyone,
> >> >> >> I have indexed 16 million documents in SolrCloud. Created 4 nodes
> >> >> >> and 8 shards with a single replica.
> >> >> >> I am trying to run concurrent indexing and searching on those
> >> >> >> indexed documents: 100 concurrent indexing calls along with 100
> >> >> >> concurrent searching calls.
> >> >> >> It *degrades both searching and indexing* performance.
> >> >> >>
> >> >> >> Configuration:
> >> >> >>
> >> >> >> "commitWithin":{"softCommit":true},
> >> >> >> "autoCommit":{
> >> >> >>   "maxDocs":-1,
> >> >> >>   "maxTime":60000,
> >> >> >>   "openSearcher":false},
> >> >> >> "autoSoftCommit":{
> >> >> >>   "maxDocs":-1,
> >> >> >>   "maxTime":3000}},
> >> >> >>
> >> >> >> "indexConfig":{
> >> >> >>   "maxBufferedDocs":-1,
> >> >> >>   "maxMergeDocs":-1,
> >> >> >>   "maxIndexingThreads":8,
> >> >> >>   "mergeFactor":-1,
> >> >> >>   "ramBufferSizeMB":100.0,
> >> >> >>   "writeLockTimeout":-1,
> >> >> >>   "lockType":"native"}}}
> >> >> >>
> >> >> >> AND <maxWarmingSearchers>2</maxWarmingSearchers>
> >> >> >>
> >> >> >> I don't know how master and slave work. Normally, I created 8
> >> >> >> shards and indexed documents using:
> >> >> >>
> >> >> >> http://localhost:8983/solr/test_commit_fast/update/json?commit=true
> >> >> >>   -H 'Content-type:application/json' -d '[ JSON_Document ]'
> >> >> >>
> >> >> >> and searching using:
> >> >> >>
> >> >> >> http://localhost:8983/solr/test_commit_fast/select?q=<field_name:search_string>
> >> >> >>
> >> >> >> Please, any help on making searching and indexing fast
> >> >> >> concurrently. Thanks.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Nitin
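Upayavira's "roughly one indexing thread per CPU" advice, combined with Erick's suggestion to batch around 1,000 documents, can be sketched as a bounded thread pool in Python; `post_batch` stands in for whatever function actually POSTs a batch to Solr, and the pool size is an assumption to tune, not a verified number:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def chunk(docs, size=1000):
    """Split docs into batches; Erick suggests starting around 1,000."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def index_all(docs, post_batch, workers=None):
    """Run post_batch over batches with roughly one worker per CPU,
    instead of 100 unthrottled concurrent indexing calls."""
    workers = workers or os.cpu_count() or 4
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves batch order and caps concurrency at `workers`
        return list(pool.map(post_batch, chunk(docs)))
```

Capping the pool leaves CPU cycles free for the concurrent query load, which is the balance the thread keeps circling back to.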