Replication will indeed be incremental. But if you commit too often (and committing too often is a common mistake), the merging will eventually merge everything into new segments and the whole thing will be replicated.
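(For reference: commit frequency on the master is typically governed by the autoCommit section of solrconfig.xml. A minimal sketch, with purely illustrative thresholds:)

```xml
<!-- Illustrative values only: commit at most every 5 minutes, or after
     100,000 buffered docs, whichever comes first. Longer intervals mean
     fewer segments and less replication traffic. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>
    <maxTime>300000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```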
Additionally, optimizing (or forceMerge in 4.x) will produce a single segment and force the entire index to replicate.

You should emphatically _not_ have to have two cores. Solr is built to handle replication etc. I suspect you're committing too often, or some other misconfiguration is creating a problem for yourself. Here's what I'd do:

1> Increase the polling interval on the slave to, say, 10 minutes (or however long you can live with stale data).
2> Decrease the commits you're doing. This could involve the autocommit options you may have set in solrconfig.xml. It could be your client (I don't know how you're indexing -- SolrJ?) and the commitWithin parameter. It could be that you're optimizing (if you are, stop it!).

Note that ramBufferSizeMB has no influence on how often things are _committed_. When that limit is exceeded, the accumulated indexing data is flushed to the currently open segment; multiple flushes can go to the _same_ segment. The write-once nature of segments means that after a segment is closed (by a commit) it is never changed, but a segment that is not yet closed may be written to multiple times until it is closed.

HTH,
Erick

On Wed, Jul 18, 2012 at 1:25 PM, Mou <mouna...@gmail.com> wrote:
> Hi Erick,
>
> I totally agree. That's what I also figured out eventually. One thing I am
> not clear on: replication is supposed to be incremental? But it looks like
> it is trying to replicate the whole index. Maybe I am changing the index so
> frequently that it is triggering auto-merge and a full replication? Am I
> thinking in the right direction?
>
> I see that when I start the Solr search instance before I start feeding the
> Solr index, my searches are fine BUT it is using the old searcher, so I am
> not seeing the updates in the results.
>
> So now I am trying to change my architecture.
I am going to have a core
> dedicated to receiving the daily updates, which will be about 5 million
> docs at a little less than 5 GB -- small enough that replication will be
> faster.
>
> I will search both cores, i.e. the old data and the daily updates, and do
> field collapsing on my unique id so that I do not return duplicate results.
> I haven't tried grouping results, so I am not sure about the performance.
> Any suggestions?
>
> Eventually I will have to use Solr trunk like you suggested.
>
> Thank you for your help,
>
> On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] <
> ml-node+s472066n3995754...@n3.nabble.com> wrote:
>
>> bq: This index is only used for searching and being replicated every 7 sec
>> from the master.
>>
>> This is a red flag. 7-second replication intervals are likely forcing your
>> app to spend all its time opening new searchers. Your cached filter
>> queries are likely rarely being re-used because they're being thrown away
>> every 7 seconds. This assumes you're changing your master index
>> frequently.
>>
>> If you need near real time, consider Solr trunk and SolrCloud, but trying
>> to simulate NRT with very short replication intervals is usually a bad
>> idea.
>>
>> A quick test would be to disable replication for a bit (or lengthen it to,
>> say, 10 minutes).
>>
>> Best,
>> Erick
>>
>> On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi <[hidden email]> wrote:
>>
>> >> FWIW, when asked at what point one would want to split JVMs and shard
>> >> on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
>> >> GC cost reasons. You're way above that.
>> >
>> > - his index is 75G, and Grant mentioned RAM heap size; we can use
>> > terabytes of index with 16GB memory.
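(Editorial sketch of the field-collapsing idea Mou describes above: searching two cores via the shards parameter and grouping on the unique key to suppress duplicates. The core names, host, and field name "id" are hypothetical, and this only builds the query string -- it does not contact a server.)

```python
from urllib.parse import urlencode

# Hypothetical setup: an "archive" core plus a "daily" core, collapsed on
# the unique-key field "id". group.main=true returns a flat, deduplicated
# result list instead of the nested grouped format.
params = {
    "q": "some query",
    "shards": "localhost:8983/solr/archive,localhost:8983/solr/daily",
    "group": "true",       # enable result grouping
    "group.field": "id",   # collapse on the unique key
    "group.main": "true",  # flatten the grouped response
}
query_string = urlencode(params)
print(query_string)
```

Note that distributed grouping across shards was only added in later Solr releases, so this would need checking against the version actually deployed.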
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995774.html
> Sent from the Solr - User mailing list archive at Nabble.com.
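(Erick's first suggestion -- lengthening the slave's polling interval -- maps to the slave section of the ReplicationHandler in solrconfig.xml. A sketch with a hypothetical master URL; pollInterval is HH:MM:SS:)

```xml
<!-- Illustrative only: poll the master every 10 minutes instead of every
     few seconds, so searchers are reopened far less often. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>
```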