Replication is fine and I am going to use it, but I wanted it for instances *distributed* across several (physical) machines. Here I have one physical machine with many cores. I want to run 2 instances of Solr because I think it has these benefits:

1) I can give less RAM to the writer (4GB) and use more RAM for the searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (the OS should be able to use more RAM for caching, which should result in faster operations - the two processes are accessing the same index)

Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged).

Am I not seeing something?

roman

On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:

> Roman,
>
> Could you be more specific as to why replication doesn't meet your
> requirements? It was geared explicitly for this purpose, including the
> automatic discovery of changes to the data on the index master.
>
> Jason
>
> On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>
> > OK, so I have verified the two instances can run alongside each other,
> > sharing the same datadir
> >
> > All update handlers are inaccessible in the read-only master
> >
> > <updateHandler class="solr.DirectUpdateHandler2"
> >     enable="${solr.can.write:true}">
> >
> > java -Dsolr.can.write=false .....
> >
> > And I can reload the index manually:
> >
> > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> >
> > But this is not an ideal solution; I'd like for the read-only server to
> > discover index changes on its own. Any pointers?
> >
> > Thanks,
> >
> > roman
> >
> >
> > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I need your expert advice. I am thinking about running two instances of
> >> Solr that share the same data directory. The *reason* being: the indexing
> >> instance is constantly rebuilding its cache after every commit (we have a
> >> big cache) and this slows it down. But indexing doesn't need much RAM,
> >> only the search does (and the server has lots of CPUs)
> >>
> >> So, it is like having two Solr instances
> >>
> >> 1. solr-indexing-master
> >> 2. solr-read-only-master
> >>
> >> In the solrconfig.xml I can disable update components; it should be fine.
> >> However, I don't know how to 'trigger' index re-opening on (2) after the
> >> commit happens on (1).
> >>
> >> Ideally, the second instance could monitor the disk and re-open the index
> >> after new files appear there. Do I have to implement a custom
> >> IndexReaderFactory? Or something else?
> >>
> >> Please note: I know about the replication, this use case is IMHO slightly
> >> different - in fact, write-only-master (1) is also a replication master
> >>
> >> Googling turned up only this
> >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no
> >> pointers there.
> >>
> >> But if I am approaching the problem wrongly, please don't hesitate to
> >> 're-educate' me :)
> >>
> >> Thanks!
> >>
> >> roman
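
One rough way to get the "monitor the disk and re-open" behaviour from the original question without writing a custom IndexReaderFactory is an external watcher that polls the shared index directory and issues the same core RELOAD request as the curl command above. The sketch below is only an illustration under stated assumptions, not something Solr ships: the function names (`latest_generation`, `reload_core`, `watch`), the polling interval, and the `max_polls` parameter are all hypothetical. It relies on Lucene writing a `segments_N` file for each commit, where N is a base-36 generation number, and it does nothing about a reload that races with a commit still being written.

```python
import os
import re
import time
import urllib.request

# Lucene commit points are files named segments_N, where N is the
# commit generation in base 36. "segments.gen" deliberately does not match.
SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def latest_generation(index_dir):
    """Return the highest commit generation found in index_dir, or -1 if none."""
    best = -1
    for name in os.listdir(index_dir):
        match = SEGMENTS_RE.match(name)
        if match:
            best = max(best, int(match.group(1), 36))
    return best


def reload_core(url):
    """Send the core RELOAD request (same URL as the manual curl call)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.status


def watch(index_dir, reload_url, interval=5.0, reload=reload_core, max_polls=None):
    """Poll index_dir and call reload(reload_url) whenever a newer commit appears.

    max_polls is a hypothetical knob for bounded runs; None means poll forever.
    Returns the last generation seen.
    """
    seen = latest_generation(index_dir)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        polls += 1
        current = latest_generation(index_dir)
        if current > seen:
            reload(reload_url)
            seen = current
    return seen
```

It would be started next to the read-only instance with something like `watch("/path/to/data/index", "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1")`, where the index path is a placeholder. Polling from outside keeps the searcher's configuration untouched, at the cost of a delay of up to one interval before new commits become visible.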