Hi Peter,

Thank you, I am glad to read that this use case is not alien.

I'd like to make the second instance (searcher) completely read-only, so I
have disabled all the components that can write.

(being lazy ;)) I'll probably use
http://wiki.apache.org/solr/CollectionDistribution to call the curl after
commit, or write some IndexReaderFactory that checks for changes.

The problem with calling the 'core reload' is that it seems like a lot of
work for just opening a new searcher, eeekkk... somewhere I read that it is
cheap to reload a core, but re-opening the index searcher must definitely be
cheaper...

roman

On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <peter.stu...@gmail.com> wrote:

> Hi,
> We use this very same scenario to great effect - 2 instances using the same
> dataDir with many cores - 1 is a writer (no caching), the other is a
> searcher (lots of caching).
> To get the searcher to see the index changes from the writer, you need the
> searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
> This will refresh the caches (including autowarming), [re]build the
> relevant searchers etc. and make any index changes visible to the RO
> instance.
> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
> ensure the two instances don't try to commit at the same time.
> There are several ways to trigger a commit:
> Call commit() periodically within your own code.
> Use autoCommit in solrconfig.xml.
> Use an RPC/IPC mechanism between the 2 instance processes to tell the
> searcher the index has changed, then call commit when called (more complex
> coding, but good if the index changes on an ad-hoc basis).
> Note, doing things this way isn't really suitable for an NRT environment.
>
> HTH,
> Peter
>
> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com>
> wrote:
>
> > Replication is fine, I am going to use it, but I wanted it for instances
> > *distributed* across several (physical) machines - but here I have one
> > physical machine, it has many cores.
I want to run 2 instances of solr
> > because I think it has these benefits:
> >
> > 1) I can give less RAM to the writer (4GB), and use more RAM for the
> > searcher (28GB)
> > 2) I can deactivate warming for the writer and keep it for the searcher
> > (this considerably speeds up indexing - each time we commit, the server is
> > rebuilding a citation network of 80M edges)
> > 3) saving disk space and better OS caching (OS should be able to use more
> > RAM for the caching, which should result in faster operations - the two
> > processes are accessing the same index)
> >
> > Maybe I should just forget it and go with the replication, but it doesn't
> > 'feel right' IFF it is on the same physical machine. And Lucene
> > specifically has a method for discovering changes and re-opening the index
> > (DirectoryReader.openIfChanged)
> >
> > Am I not seeing something?
> >
> > roman
> >
> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
> > jhell...@innoventsolutions.com> wrote:
> >
> > > Roman,
> > >
> > > Could you be more specific as to why replication doesn't meet your
> > > requirements? It was geared explicitly for this purpose, including the
> > > automatic discovery of changes to the data on the index master.
> > >
> > > Jason
> > >
> > > On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > >
> > > > OK, so I have verified the two instances can run alongside, sharing the
> > > > same datadir
> > > >
> > > > All update handlers are unaccessible in the read-only master
> > > >
> > > > <updateHandler class="solr.DirectUpdateHandler2"
> > > > enable="${solr.can.write:true}">
> > > >
> > > > java -Dsolr.can.write=false .....
> > > >
> > > > And I can reload the index manually:
> > > >
> > > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> > > >
> > > > But this is not an ideal solution; I'd like for the read-only server to
> > > > discover index changes on its own. Any pointers?
> > > >
> > > > Thanks,
> > > >
> > > > roman
> > > >
> > > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> I need your expert advice. I am thinking about running two instances of
> > > >> solr that share the same datadirectory. The *reason* being: indexing
> > > >> instance is constantly building cache after every commit (we have a big
> > > >> cache) and this slows it down. But indexing doesn't need much RAM, only
> > > >> the search does (and server has lots of CPUs)
> > > >>
> > > >> So, it is like having two solr instances
> > > >>
> > > >> 1. solr-indexing-master
> > > >> 2. solr-read-only-master
> > > >>
> > > >> In the solrconfig.xml I can disable update components, It should be
> > > >> fine. However, I don't know how to 'trigger' index re-opening on (2)
> > > >> after the commit happens on (1).
> > > >>
> > > >> Ideally, the second instance could monitor the disk and re-open disk
> > > >> after new files appear there. Do I have to implement custom
> > > >> IndexReaderFactory? Or something else?
> > > >>
> > > >> Please note: I know about the replication, this usecase is IMHO slightly
> > > >> different - in fact, write-only-master (1) is also a replication master
> > > >>
> > > >> Googling turned out only this
> > > >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912- no
> > > >> pointers there.
> > > >>
> > > >> But If I am approaching the problem wrongly, please don't hesitate to
> > > >> 're-educate' me :)
> > > >>
> > > >> Thanks!
> > > >>
> > > >> roman
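
A minimal sketch of the "monitor the disk" idea raised in this thread, in Python with only the standard library. It assumes Lucene writes each commit point as a file named segments_<N>, with <N> the commit generation in base 36, so a higher generation in the shared index directory means the writer has committed. The helper names (latest_generation, poll_once, reload_url, reload_core) are hypothetical and not part of Solr; the URL is the RELOAD request from the curl command quoted in the thread, and an empty commit on the searcher could be substituted for it.

```python
import os
import urllib.parse
import urllib.request


def latest_generation(index_dir):
    """Return the highest commit generation in index_dir, or -1 if none.

    Assumption: commit files are named segments_<N>, <N> in base 36.
    """
    newest = -1
    for name in os.listdir(index_dir):
        prefix, sep, suffix = name.partition("_")
        if prefix != "segments" or not sep or not suffix.isalnum():
            continue  # e.g. segments.gen or per-segment files such as _0.fdt
        try:
            newest = max(newest, int(suffix, 36))
        except ValueError:
            continue  # suffix is not a base-36 number
    return newest


def reload_url(base="http://localhost:5005/solr", core="collection1"):
    """Build the core RELOAD request used in the thread's curl command."""
    query = urllib.parse.urlencode(
        {"wt": "json", "action": "RELOAD", "core": core}
    )
    return base + "/admin/cores?" + query


def poll_once(index_dir, last_seen, notify):
    """Call notify() if a newer commit appeared; return the generation seen."""
    current = latest_generation(index_dir)
    if current > last_seen:
        notify()
    return max(current, last_seen)


def reload_core():
    """Send the RELOAD request to the read-only instance (needs a live Solr)."""
    with urllib.request.urlopen(reload_url()) as response:
        return response.read()
```

The searcher side would call poll_once(index_dir, last_seen, reload_core) from a timer or cron job and keep the returned generation for the next call. Polling adds latency, so like the approaches above it is not a fit for NRT.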