Re: Two instances of solr - the same datadir?

Roman Chyla Wed, 05 Jun 2013 10:44:15 -0700

So, for the record, here is how I am solving it right now:

Write-master is started with: -Dmontysolr.enable.warming=false
-Dmontysolr.master=true -Dmontysolr.read.master=http://localhost:5005
Read-master is started with: -Dmontysolr.enable.warming=true
-Dmontysolr.master=false
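
For reference, the two launch commands might look roughly like the sketch
below. It assumes the stock Jetty start.jar that ships with Solr 4.x, its
jetty.port system property, and port 8983 for the write-master; none of those
are stated in this thread. Heap sizes follow the 4GB/28GB split mentioned
further down, and the property names are the ones the solrconfig.xml snippets
below actually read (montysolr.master, montysolr.enable.warming). The script
only prints the commands, it does not start anything.

```shell
#!/bin/sh
# Sketch only: print the JVM command line for each instance.
# start.jar, -Djetty.port and port 8983 are assumptions for illustration.

solr_cmd() {
    # $1 = role ("write" or "read"), $2 = HTTP port of that instance
    role=$1
    port=$2
    case "$role" in
        write) heap=4g;  master=true;  warming=false
               # only the writer needs to know where the reader lives
               extra=" -Dmontysolr.read.master=http://localhost:5005" ;;
        read)  heap=28g; master=false; warming=true
               extra="" ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    echo "java -Xmx$heap -Djetty.port=$port -Dmontysolr.master=$master -Dmontysolr.enable.warming=$warming$extra -jar start.jar"
}

solr_cmd write 8983
solr_cmd read 5005
```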



solrconfig.xml changes:

1. All index-changing components get this attribute,
enable="${montysolr.master:true}", e.g.:

<updateHandler class="solr.DirectUpdateHandler2"
                 enable="${montysolr.master:true}">

2. To activate or deactivate cache warming:

<listener event="newSearcher"
      class="solr.QuerySenderListener"
      enable="${montysolr.enable.warming:true}">...

3. To trigger a refresh of the read-only master (from the write-master):

    <listener event="postCommit"
      class="solr.RunExecutableListener"
      enable="${montysolr.master:true}">
      <str name="exe">curl</str>
      <str name="dir">.</str>
      <bool name="wait">false</bool>
      <arr name="args">
        <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str>
      </arr>
    </listener>
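
The listener above amounts to running curl against the CoreAdmin RELOAD
endpoint of the read-only instance after every commit. A small sketch of the
same call from a shell, handy for testing by hand; reload_url is a
hypothetical helper, and the host and port are the read.master value given
earlier. Note that the &amp; entities are only needed inside the XML; in the
shell the URL uses a plain & and must be quoted.

```shell
#!/bin/sh
# Build the CoreAdmin RELOAD URL that the postCommit listener passes to curl.
# reload_url is a made-up helper name, not part of Solr.
reload_url() {
    # $1 = base URL of the read-only instance, $2 = core name
    echo "$1/solr/admin/cores?wt=json&action=RELOAD&core=$2"
}

# Print the URL; uncomment the curl line to actually send the request.
reload_url http://localhost:5005 collection1
# curl -s "$(reload_url http://localhost:5005 collection1)"
```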

This works. I still don't like reloading the whole core, but it seems
like the easiest thing to do for now.
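
The alternative raised earlier in this thread, having the read-only side
notice new files on disk, could be approximated outside Solr with a polling
loop. This is a rough sketch, not what is running here: it relies on Lucene
writing a new segments_N file for every commit, the helper names and the
index path are placeholders, and it still ends in a full core reload, so it
only moves the trigger from the writer to the reader side.

```shell
#!/bin/sh
# Print the newest commit file (segments_N) in a Lucene index directory.
# N is not a plain decimal counter, so sort by modification time, not by name.
latest_commit() {
    # $1 = index directory; prints nothing if there is no commit yet
    ls -t "$1"/segments_* 2>/dev/null | head -n 1
}

# Poll the index directory and run a command whenever a new commit appears.
# $1 = index directory, $2 = poll interval in seconds, rest = command to run
watch_index() {
    dir=$1
    interval=$2
    shift 2
    last=$(latest_commit "$dir")
    while sleep "$interval"; do
        current=$(latest_commit "$dir")
        if [ "$current" != "$last" ]; then
            last=$current
            "$@"
        fi
    done
}

# Example (path and URL are placeholders):
# watch_index /path/to/data/index 5 \
#     curl -s "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
```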

-- roman


On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

> Hi Peter,
>
> Thank you, I am glad to read that this use case is not alien.
>
> I'd like to make the second instance (searcher) completely read-only, so I
> have disabled all the components that can write.
>
> (being lazy ;)) I'll probably use
> http://wiki.apache.org/solr/CollectionDistribution to call curl after
> commit, or write an IndexReaderFactory that checks for changes.
>
> The problem with calling 'core reload' is that it seems like a lot of work
> for just opening a new searcher, eeekkk... Somewhere I read that it is cheap
> to reload a core, but re-opening the index searchers must definitely be
> cheaper...
>
> roman
>
>
> On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <peter.stu...@gmail.com> wrote:
>
>> Hi,
>> We use this very same scenario to great effect - 2 instances using the same
>> dataDir with many cores - 1 is a writer (no caching), the other is a
>> searcher (lots of caching).
>> To get the searcher to see the index changes from the writer, you need the
>> searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
>> This will refresh the caches (including autowarming), [re]build the
>> relevant searchers etc. and make any index changes visible to the RO
>> instance.
>> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
>> ensure the two instances don't try to commit at the same time.
>> There are several ways to trigger a commit:
>> 1. Call commit() periodically within your own code.
>> 2. Use autoCommit in solrconfig.xml.
>> 3. Use an RPC/IPC mechanism between the 2 instance processes to tell the
>> searcher the index has changed, then call commit when called (more complex
>> coding, but good if the index changes on an ad-hoc basis).
>> Note, doing things this way isn't really suitable for an NRT environment.
>>
>> HTH,
>> Peter
>>
>>
>>
>> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com>
>> wrote:
>>
>> > Replication is fine, I am going to use it, but I wanted it for instances
>> > *distributed* across several (physical) machines - but here I have one
>> > physical machine, it has many cores. I want to run 2 instances of solr
>> > because I think it has these benefits:
>> >
>> > 1) I can give less RAM to the writer (4GB), and use more RAM for the
>> > searcher (28GB)
>> > 2) I can deactivate warming for the writer and keep it for the searcher
>> > (this considerably speeds up indexing - each time we commit, the server
>> > is rebuilding a citation network of 80M edges)
>> > 3) saving disk space and better OS caching (OS should be able to use
>> > more RAM for the caching, which should result in faster operations - the
>> > two processes are accessing the same index)
>> >
>> > Maybe I should just forget it and go with the replication, but it
>> > doesn't 'feel right' IFF it is on the same physical machine. And Lucene
>> > specifically has a method for discovering changes and re-opening the
>> > index (DirectoryReader.openIfChanged)
>> >
>> > Am I not seeing something?
>> >
>> > roman
>> >
>> >
>> >
>> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
>> > jhell...@innoventsolutions.com> wrote:
>> >
>> > > Roman,
>> > >
>> > > Could you be more specific as to why replication doesn't meet your
>> > > requirements?  It was geared explicitly for this purpose, including
>> > > the automatic discovery of changes to the data on the index master.
>> > >
>> > > Jason
>> > >
>> > > On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>> > >
>> > > > OK, so I have verified the two instances can run alongside each
>> > > > other, sharing the same datadir.
>> > > >
>> > > > All update handlers are inaccessible in the read-only master:
>> > > >
>> > > > <updateHandler class="solr.DirectUpdateHandler2"
>> > > >                 enable="${solr.can.write:true}">
>> > > >
>> > > > java -Dsolr.can.write=false .....
>> > > >
>> > > > And I can reload the index manually:
>> > > >
>> > > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
>> > > >
>> > > > But this is not an ideal solution; I'd like for the read-only
>> > > > server to discover index changes on its own. Any pointers?
>> > > >
>> > > > Thanks,
>> > > >
>> > > >  roman
>> > > >
>> > > >
>> > > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>> > > >
>> > > >> Hello,
>> > > >>
>> > > >> I need your expert advice. I am thinking about running two instances
>> > > >> of solr that share the same data directory. The *reason* being: the
>> > > >> indexing instance is constantly building cache after every commit (we
>> > > >> have a big cache) and this slows it down. But indexing doesn't need
>> > > >> much RAM, only the search does (and the server has lots of CPUs).
>> > > >>
>> > > >> So, it is like having two solr instances
>> > > >>
>> > > >> 1. solr-indexing-master
>> > > >> 2. solr-read-only-master
>> > > >>
>> > > >> In the solrconfig.xml I can disable update components; it should be
>> > > >> fine. However, I don't know how to 'trigger' index re-opening on (2)
>> > > >> after the commit happens on (1).
>> > > >>
>> > > >> Ideally, the second instance could monitor the disk and re-open the
>> > > >> index after new files appear there. Do I have to implement a custom
>> > > >> IndexReaderFactory? Or something else?
>> > > >>
>> > > >> Please note: I know about the replication, this use case is IMHO
>> > > >> slightly different - in fact, write-only-master (1) is also a
>> > > >> replication master.
>> > > >>
>> > > >> Googling turned up only this:
>> > > >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 -
>> > > >> no pointers there.
>> > > >>
>> > > >> But if I am approaching the problem wrongly, please don't hesitate
>> > > >> to 're-educate' me :)
>> > > >>
>> > > >> Thanks!
>> > > >>
>> > > >>  roman
>> > > >>
>> > >
>> > >
>> >
>>
>
>
