Increasing the polling interval does help. But the requirement is to get a document indexed and searchable almost instantly (sounds like real-time search); 30 sec is acceptable. I need to look at Solr NRT and SolrCloud.
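For reference, the slave-side polling interval being discussed lives in the replication handler section of solrconfig.xml (Solr 3.x style). A sketch with illustrative values — the host name and interval here are placeholders, not recommendations:

```xml
<!-- Slave-side replication config in solrconfig.xml (Solr 3.x style).
     masterUrl and pollInterval values are illustrative only. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <!-- HH:mm:ss — e.g. poll every 10 minutes instead of every few seconds -->
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>
```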
I created a new core to accept daily updates and replicate every 10 sec. Two other cores with 234 million documents are configured to replicate only once a day. I am feeding all three cores, but the two big cores are not replicating. While searching I am running group.field on my unique id and taking the most updated one. Right now it looks fine. Every day I am going to delete the last day's records from the daily-update core.

I am planning to use rsync for replication; it will be Fusion-io to Fusion-io, so hopefully it will be very fast. What do you think?

We use a Windows service (written in .NET C#) to feed the data using REST calls. That is really fast; we can feed more than 15 million documents a day to two cores easily. I am using autocommit = 5 sec in solrconfig.

I could not figure out how I was able to achieve those numbers in my test environment; all configuration was the same, except I had a lot less memory in test! I am trying to find out what I am missing in the other configuration. My SLES kernel version is different in production (it's a 3.0.*, test was 2.6.*), but I do not think that can cause a problem.

Thank you again,
Mou

On Wed, Jul 18, 2012 at 6:26 PM, Erick Erickson [via Lucene] <ml-node+s472066n3995861...@n3.nabble.com> wrote:

> Replication will indeed be incremental. But if you commit too often (and
> committing too often is a common mistake) then the merging will
> eventually merge everything into new segments and the whole thing will
> be replicated.
>
> Additionally, optimizing (or forceMerge in 4.x) will make a single segment
> and force the entire index to replicate.
>
> You should emphatically _not_ have to have two cores. Solr is built to
> handle replication etc. I suspect you're committing too often or some
> other mis-configuration and you're creating a problem for yourself.
>
> Here's what I'd do:
> 1> increase the polling interval to, say, 10 minutes (or however long
> you can live with stale data) on the slave.
> 2> decrease the commits you're doing.
> This could involve the autocommit options you might have set in
> solrconfig.xml. It could be your client (don't know how you're
> indexing, SolrJ?) and the commitWithin parameter. Could be you're
> optimizing (if you are, stop it!).
>
> Note that ramBufferSizeMB has no influence on how often things are
> _committed_. When this limit is exceeded, the accumulated indexing
> data is written to the currently-open segment. Multiple flushes can go
> to the _same_ segment. The write-once nature of segments means that
> after a segment is closed (through a commit), it is not changed. But a
> segment that is not closed may be written to multiple times until it's
> closed.
>
> HTH
> Erick
>
> On Wed, Jul 18, 2012 at 1:25 PM, Mou <[hidden email]> wrote:
>
> > Hi Eric,
> >
> > I totally agree. That's what I also figured ultimately. One thing I am
> > not clear on: the replication is supposed to be incremental? But it
> > looks like it is trying to replicate the whole index. Maybe I am
> > changing the index so frequently that it is triggering auto merge and
> > a full replication? Am I thinking in the right direction?
> >
> > I see that when I start the Solr search instance before I start feeding
> > the Solr index, my searches are fine BUT it is using the old searcher,
> > so I am not seeing the updates in the result.
> >
> > So now I am trying to change my architecture. I am going to have a core
> > dedicated to receiving daily updates, which is going to be 5 million
> > docs, and its size is going to be a little less than 5 GB, which is
> > small, so replication will be faster?
> >
> > I will search both cores, i.e. the old data and the daily updates, and
> > do a field collapsing on my unique id so that I do not return duplicate
> > results. I haven't tried grouping results, so I am not sure about the
> > performance. Any suggestion?
> >
> > Eventually I have to use Solr trunk like you suggested.
> > Thank you for your help,
> >
> > On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene]
> > <[hidden email]> wrote:
> >
> >> bq: This index is only used for searching and being replicated every
> >> 7 sec from the master.
> >>
> >> This is a red flag. 7-second replication times are likely forcing your
> >> app to spend all its time opening new searchers. Your cached filter
> >> queries are likely rarely being re-used because they're being thrown
> >> away every 7 seconds. This assumes you're changing your master index
> >> frequently.
> >>
> >> If you need near real time, consider Solr trunk and SolrCloud, but
> >> trying to simulate NRT with very short replication intervals is
> >> usually a bad idea.
> >>
> >> A quick test would be to disable replication for a bit (or lengthen it
> >> to, say, 10 minutes).
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi <[hidden email]> wrote:
> >>
> >> >> FWIW, when asked at what point one would want to split JVMs and
> >> >> shard, on the same machine, Grant Ingersoll mentioned 16GB, and
> >> >> precisely for GC cost reasons. You're way above that.
> >> >
> >> > - his index is 75G, and Grant mentioned RAM heap size; we can use
> >> > terabytes of index with 16GB memory.
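The field-collapsing idea in the thread — search both the big core and the daily-update core, and keep only the most recently updated document per unique id — can be sketched client-side as follows. This is an illustration of the merge logic, not Solr's own grouping implementation; the field names `id` and `last_modified` are assumptions:

```python
# Client-side sketch of the field-collapsing idea: merge results from
# two cores and keep only the newest document per unique id.
# Field names ("id", "last_modified") are assumptions for illustration.

def collapse_by_id(results):
    """Keep the most recently updated doc for each unique id."""
    newest = {}
    for doc in results:
        doc_id = doc["id"]
        current = newest.get(doc_id)
        if current is None or doc["last_modified"] > current["last_modified"]:
            newest[doc_id] = doc
    return list(newest.values())

# Example: a doc re-fed into the daily core shadows its older copy.
main_core = [{"id": "a", "last_modified": 1, "text": "old"},
             {"id": "b", "last_modified": 1, "text": "unchanged"}]
daily_core = [{"id": "a", "last_modified": 2, "text": "new"}]

merged = collapse_by_id(main_core + daily_core)
```

This mirrors what `group.field=<uniqueKey>` with a sort on the update timestamp does server-side, just made explicit for clarity.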
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995873.html
Sent from the Solr - User mailing list archive at Nabble.com.
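For completeness, the "autocommit = 5 sec" setting discussed in the thread corresponds to the autoCommit block in solrconfig.xml (Solr 3.x). A sketch with illustrative values — lengthening maxTime is effectively what Erick's "decrease the commits" advice amounts to:

```xml
<!-- autoCommit in solrconfig.xml (Solr 3.x). Values are illustrative:
     committing every 5 s causes constant segment churn; a longer
     interval reduces merge and replication pressure. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>  <!-- ms; e.g. once a minute, not every 5 s -->
    <maxDocs>50000</maxDocs>  <!-- or after this many docs, whichever first -->
  </autoCommit>
</updateHandler>
```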