Re: Two instances of solr - the same datadir?

Roman Chyla Wed, 05 Jun 2013 09:08:19 -0700

Hi Peter,

Thank you, I am glad to read that this use case is not alien.


I'd like to make the second instance (searcher) completely read-only, so I
have disabled all the components that can write.

(Being lazy ;)) I'll probably use
http://wiki.apache.org/solr/CollectionDistribution to call curl after each
commit, or write an IndexReaderFactory that checks for changes.
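
A minimal sketch of the "checks for changes" part, using only the Java
standard library rather than a real IndexReaderFactory. Lucene writes a new
segments_N file on every commit (N is a base-36 generation number), so a
poller can compare the highest N it has seen. The class name CommitWatcher
and the polling approach are assumptions for illustration, not part of Solr.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: notices new Lucene commits by watching segments_N files. */
final class CommitWatcher {
    private static final String PREFIX = "segments_";

    private final Path indexDir;
    private long lastSeen;

    /** Records the current commit generation as the baseline. */
    CommitWatcher(Path indexDir) throws IOException {
        this.indexDir = indexDir;
        this.lastSeen = latestGeneration(indexDir);
    }

    /** Returns the highest commit generation in the directory, or -1 if none. */
    static long latestGeneration(Path indexDir) throws IOException {
        long max = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path f : files) {
                String suffix = f.getFileName().toString().substring(PREFIX.length());
                try {
                    // Lucene encodes the generation in base 36.
                    max = Math.max(max, Long.parseLong(suffix, Character.MAX_RADIX));
                } catch (NumberFormatException ignored) {
                    // Not a commit file; skip it.
                }
            }
        }
        return max;
    }

    /** Returns true once for each commit newer than the last one seen. */
    boolean pollChanged() throws IOException {
        long current = latestGeneration(indexDir);
        if (current > lastSeen) {
            lastSeen = current;
            return true;
        }
        return false;
    }
}
```

When pollChanged() returns true, the searcher could be told to reload or
re-open its index; how often to poll is a trade-off left open here.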

The problem with calling 'core reload' is that it seems like a lot of work
just to open a new searcher, eeekkk... Somewhere I read that reloading a core
is cheap, but re-opening the index searcher must definitely be cheaper...
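
For reference, the reload request quoted further down in this thread can be
sent from Java instead of curl. This is only a sketch: the class name
CoreReloader and the timeout values are assumptions, and the URL layout is
copied from the curl call below rather than taken from any Solr client API.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

/** Hypothetical helper that issues the CoreAdmin RELOAD request over HTTP. */
final class CoreReloader {

    /** Builds the reload URL, e.g. for base "http://localhost:5005/solr". */
    static String reloadUrl(String solrBase, String core) throws UnsupportedEncodingException {
        return solrBase + "/admin/cores?wt=json&action=RELOAD&core="
                + URLEncoder.encode(core, "UTF-8");
    }

    /** Sends the request and returns the HTTP status code. */
    static int reload(String solrBase, String core) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(reloadUrl(solrBase, core)).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);   // assumed value
        conn.setReadTimeout(60000);     // assumed value; reloads can be slow
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling CoreReloader.reload("http://localhost:5005/solr", "collection1")
would issue the same request as the curl command quoted below.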

roman


On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <peter.stu...@gmail.com> wrote:

> Hi,
> We use this very same scenario to great effect - 2 instances using the same
> dataDir with many cores - 1 is a writer (no caching), the other is a
> searcher (lots of caching).
> To get the searcher to see the index changes from the writer, you need the
> searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
> This will refresh the caches (including autowarming), [re]build the
> relevant searchers etc. and make any index changes visible to the RO
> instance.
> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
> ensure the two instances don't try to commit at the same time.
> There are several ways to trigger a commit:
> - Call commit() periodically within your own code.
> - Use autoCommit in solrconfig.xml.
> - Use an RPC/IPC mechanism between the 2 instance processes to tell the
>   searcher the index has changed, and have it call commit when notified
>   (more complex coding, but good if the index changes on an ad-hoc basis).
> Note, doing things this way isn't really suitable for an NRT environment.
>
> HTH,
> Peter
>
>
>
> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com>
> wrote:
>
> > Replication is fine, I am going to use it, but I wanted it for instances
> > *distributed* across several (physical) machines - but here I have one
> > physical machine, it has many cores. I want to run 2 instances of solr
> > because I think it has these benefits:
> >
> > 1) I can give less RAM to the writer (4GB), and use more RAM for the
> > searcher (28GB)
> > 2) I can deactivate warming for the writer and keep it for the searcher
> > (this considerably speeds up indexing - each time we commit, the server
> > is rebuilding a citation network of 80M edges)
> > 3) saving disk space and better OS caching (OS should be able to use more
> > RAM for the caching, which should result in faster operations - the two
> > processes are accessing the same index)
> >
> > Maybe I should just forget it and go with the replication, but it doesn't
> > 'feel right' IFF it is on the same physical machine. And Lucene
> > specifically has a method for discovering changes and re-opening the
> > index (DirectoryReader.openIfChanged)
> >
> > Am I not seeing something?
> >
> > roman
> >
> >
> >
> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
> > jhell...@innoventsolutions.com> wrote:
> >
> > > Roman,
> > >
> > > Could you be more specific as to why replication doesn't meet your
> > > requirements?  It was geared explicitly for this purpose, including the
> > > automatic discovery of changes to the data on the index master.
> > >
> > > Jason
> > >
> > > On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > >
> > > > OK, so I have verified the two instances can run alongside, sharing
> > > > > the same datadir
> > > >
> > > > > All update handlers are inaccessible in the read-only master
> > > >
> > > > <updateHandler class="solr.DirectUpdateHandler2"
> > > >                 enable="${solr.can.write:true}">
> > > >
> > > > java -Dsolr.can.write=false .....
> > > >
> > > > And I can reload the index manually:
> > > >
> > > > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> > > >
> > > > But this is not an ideal solution; I'd like for the read-only server
> > > > > to discover index changes on its own. Any pointers?
> > > >
> > > > Thanks,
> > > >
> > > >  roman
> > > >
> > > >
> > > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com>
> > > > > wrote:
> > > >
> > > >> Hello,
> > > >>
> > > > >> I need your expert advice. I am thinking about running two instances
> > > > >> of solr that share the same data directory. The *reason* being: the
> > > > >> indexing instance is constantly rebuilding its cache after every
> > > > >> commit (we have a big cache) and this slows it down. But indexing
> > > > >> doesn't need much RAM, only the search does (and the server has lots
> > > > >> of CPUs)
> > > >>
> > > >> So, it is like having two solr instances
> > > >>
> > > >> 1. solr-indexing-master
> > > >> 2. solr-read-only-master
> > > >>
> > > > >> In the solrconfig.xml I can disable update components; it should be
> > > > >> fine. However, I don't know how to 'trigger' index re-opening on (2)
> > > > >> after the commit happens on (1).
> > > >>
> > > > >> Ideally, the second instance could monitor the disk and re-open the
> > > > >> index after new files appear there. Do I have to implement a custom
> > > > >> IndexReaderFactory? Or something else?
> > > >>
> > > > >> Please note: I know about replication; this use case is IMHO slightly
> > > > >> different - in fact, the write-only-master (1) is also a replication
> > > > >> master
> > > > >> Googling turned up only this:
> > > > >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912
> > > > >> - no pointers there.
> > > >>
> > > > >> But if I am approaching the problem wrongly, please don't hesitate to
> > > > >> 're-educate' me :)
> > > >>
> > > >> Thanks!
> > > >>
> > > >>  roman
> > > >>
> > >
> > >
> >
>
