Replication is fine and I am going to use it, but I wanted it for instances
*distributed* across several (physical) machines. Here I have one physical
machine with many cores. I want to run 2 instances of solr on it because I
think it has these benefits:

1) I can give less RAM to the writer (4GB), and use more RAM for the
searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher
(this considerably speeds up indexing - each time we commit, the server is
rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (OS should be able to use more
RAM for the caching, which should result in faster operations - the two
processes are accessing the same index)

Maybe I should just forget it and go with the replication, but it doesn't
'feel right' IFF it is on the same physical machine. And Lucene
specifically has a method for discovering changes and re-opening the index
(DirectoryReader.openIfChanged).
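
To make the "discover changes on its own" idea concrete, here is a minimal
sketch of an external poller. It is not Solr's or Lucene's own mechanism. It
assumes that a new segments_N file (Lucene's commit point, with N in base 36)
showing up in the shared index directory is a good enough signal that the
writer has committed, and it then issues the same CoreAdmin RELOAD call as
the curl command quoted below. The helper names (latest_generation,
reload_core, watch) and the index path are hypothetical.

```python
import os
import re
import time
import urllib.request

# Lucene names each commit point segments_<generation>, generation in base 36.
# The pattern deliberately skips other files such as segments.gen.
SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def latest_generation(index_dir):
    """Return the highest commit generation in index_dir, or -1 if none."""
    best = -1
    for name in os.listdir(index_dir):
        m = SEGMENTS_RE.match(name)
        if m:
            best = max(best, int(m.group(1), 36))
    return best


def reload_core(base_url, core):
    """Ask the read-only instance to reload a core via the CoreAdmin API."""
    url = f"{base_url}/admin/cores?wt=json&action=RELOAD&core={core}"
    with urllib.request.urlopen(url) as resp:
        return resp.status


def watch(index_dir, on_change, interval=5.0, max_polls=None, sleep=time.sleep):
    """Poll index_dir; call on_change(generation) when a newer commit appears.

    max_polls and sleep exist only so the loop can be bounded and tested.
    """
    seen = latest_generation(index_dir)
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        current = latest_generation(index_dir)
        if current > seen:
            seen = current
            on_change(current)
        polls += 1
    return seen


# Example wiring (hypothetical index path, URL taken from the curl call):
# watch("/path/to/collection1/data/index",
#       lambda gen: reload_core("http://localhost:5005/solr", "collection1"))
```

This only automates the manual RELOAD, so the searcher still pays the full
warming cost on each reload; the polling interval is the knob for trading
freshness against that cost.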

Am I not seeing something?

roman



On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Roman,
>
> Could you be more specific as to why replication doesn't meet your
> requirements?  It was geared explicitly for this purpose, including the
> automatic discovery of changes to the data on the index master.
>
> Jason
>
> On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>
> > OK, so I have verified the two instances can run alongside each other,
> > sharing the same datadir.
> >
> > All update handlers are inaccessible in the read-only master:
> >
> > <updateHandler class="solr.DirectUpdateHandler2"
> >                 enable="${solr.can.write:true}">
> >
> > java -Dsolr.can.write=false .....
> >
> > And I can reload the index manually:
> >
> > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> >
> > But this is not an ideal solution; I'd like for the read-only server to
> > discover index changes on its own. Any pointers?
> >
> > Thanks,
> >
> >  roman
> >
> >
> > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I need your expert advice. I am thinking about running two instances of
> >> solr that share the same data directory. The *reason* being: the
> >> indexing instance rebuilds its cache after every commit (we have a big
> >> cache) and this slows it down. But indexing doesn't need much RAM, only
> >> the search does (and the server has lots of CPUs).
> >>
> >> So, it is like having two solr instances:
> >>
> >> 1. solr-indexing-master
> >> 2. solr-read-only-master
> >>
> >> In the solrconfig.xml I can disable the update components, so that
> >> should be fine. However, I don't know how to 'trigger' index re-opening
> >> on (2) after the commit happens on (1).
> >>
> >> Ideally, the second instance could monitor the disk and re-open the
> >> index after new files appear there. Do I have to implement a custom
> >> IndexReaderFactory? Or something else?
> >>
> >> Please note: I know about the replication, but this use case is IMHO
> >> slightly different - in fact, write-only-master (1) is also a
> >> replication master.
> >>
> >> Googling turned up only this:
> >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 -
> >> no pointers there.
> >>
> >> But if I am approaching the problem wrongly, please don't hesitate to
> >> 're-educate' me :)
> >>
> >> Thanks!
> >>
> >>  roman
> >>
>
>
