Actually, it's one Lucene segment per *concurrent* indexing thread. So if you have 10 indexing threads in Lucene at once, then 10 in-memory segments will be created and will have to be written on refresh/commit.
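To make the concurrency point concrete, here is a minimal sketch of the counting-semaphore idea raised earlier in this thread: cap how many threads can be inside the indexing call at the same time, so that no more than that many in-memory segments are being built at once. The names BoundedIndexer, index and demo are hypothetical, and the Runnable is only a stand-in for a real call such as IndexWriter.addDocument; this is not how Solr or Lucene implement anything.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: caps how many threads may be inside the indexing call at once. */
class BoundedIndexer {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedIndexer(int maxConcurrentIndexers) {
        this.permits = new Semaphore(maxConcurrentIndexers);
    }

    /** Runs indexAction (assumed stand-in for IndexWriter.addDocument) while holding a permit. */
    void index(Runnable indexAction) {
        permits.acquireUninterruptibly();   // extra callers wait here, on the server side
        try {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);  // record the highest concurrency seen
            indexAction.run();
        } finally {
            active.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);           // simulates the time spent indexing one document
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Submits work from requestThreads threads and returns the peak concurrency observed. */
    static int demo(int requestThreads, int maxIndexers) {
        BoundedIndexer indexer = new BoundedIndexer(maxIndexers);
        ExecutorService pool = Executors.newFixedThreadPool(requestThreads);
        for (int i = 0; i < requestThreads * 5; i++) {
            pool.execute(() -> indexer.index(() -> sleepQuietly(5)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexer.peakConcurrency();
    }
}
```

With 10 request threads and a limit of 3, BoundedIndexer.demo(10, 3) never reports more than 3 threads inside the indexing call at once; the remaining requests simply queue on the semaphore.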
Elasticsearch uses a bounded thread pool to service all indexing requests, which I think is a healthy approach. It shouldn't have to be the client's job to worry about server side details like this.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Nov 2, 2017 at 5:23 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi Nawab,
>
> > One indexing thread in lucene corresponds to one segment being written.
> > I need a fine control on the number of segments.
>
> I didn’t check the code, but I would be surprised that it is how things work. It can appear that it is working like that if each client thread is doing commits. Is that the case?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> On 1 Nov 2017, at 18:00, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
> >
> > Well, the reason i want to control number of indexing threads is to restrict number of "segments" being created at one time in the RAM. One indexing thread in lucene corresponds to one segment being written. I need a fine control on the number of segments. Less than that, and I will not be fully utilizing my writing capacity. On the other hand, if I have more threads, then I will end up a lot more segments of small size, which I will need to flush frequently and then merge, and that will cause a different kind of problem.
> >
> > Your suggestion will require me and other such solr users to create a tight coupling between the clients and the Solr servers. My client is not SolrJ based. IN a scenario when I am connecting and indexing to Solr remotely, I want more requests to be waiting on the solr side so that they start writing as soon as an Indexing thread is available, vs waiting on my client side - on the other side of the wire.
> >
> > Thanks
> > Nawab
> >
> > On Wed, Nov 1, 2017 at 7:11 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 10/31/2017 4:57 PM, Nawab Zada Asad Iqbal wrote:
> >>
> >>> I hit this issue https://issues.apache.org/jira/browse/SOLR-11504 while migrating to solr6 and locally working around it in Lucene code. I am thinking to fix it properly and hopefully patch back to Solr. Since, Lucene code does not want to keep any such config, I am thinking to use a counting semaphore in Solr code before calling IndexWriter.addDocument(s) or IndexWriter.updateDocument(s).
> >>>
> >>
> >> There's a fairly simple way to control the number of indexing threads that doesn't require ANY changes to Solr: Don't start as many threads/processes on your indexing client(s). If you control the number of simultaneous requests sent to Solr, then Solr won't start as many indexing threads. That kind of control over your indexing system is something that's always preferable to have.
> >>
> >> Thanks,
> >> Shawn
> >>
> >