Re: Update / replication of offline indexes

Upayavira Fri, 14 Dec 2012 01:34:17 -0800

I guess without knowing more about the use case, it is difficult to see
whether it is best to ship pre-prepared indexes or indexable content.
Certainly the latter would be far simpler, and more in keeping with the
way Solr is typically used, and personally I'd start with that.
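
As a rough sketch of the "indexable content" route, and not a definitive
implementation: the Python below builds a query for documents changed since
a given date on the master, and a JSON update request for the client. The
core URLs, the `last_modified` field name and the helper names are all
assumptions for illustration. It also assumes a Solr 4-style `/update`
handler that accepts a JSON array of documents, and that every field you
need is stored, since only stored fields come back from a query.

```python
import json
import urllib.parse
import urllib.request


def delta_query_url(master_url, since_iso, date_field="last_modified",
                    rows=1000, start=0):
    """Build a select URL for documents changed since `since_iso`.

    `date_field` is a hypothetical field name; use whatever date field
    your schema actually records for each document.
    """
    params = {
        "q": "*:*",
        "fq": f"{date_field}:[{since_iso} TO *]",  # open-ended date range
        "wt": "json",
        "rows": rows,
        "start": start,
    }
    return f"{master_url}/select?{urllib.parse.urlencode(params)}"


def build_update_request(client_url, docs):
    """Build a POST that adds `docs` to the client core and commits.

    Solr 4's internal `_version_` field is dropped first; as far as I know,
    sending it back can cause version-conflict errors on the client.
    """
    cleaned = [{k: v for k, v in doc.items() if k != "_version_"}
               for doc in docs]
    return urllib.request.Request(
        f"{client_url}/update?commit=true",
        data=json.dumps(cleaned).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_delta_file(client_url, path):
    """Load a JSON array of documents from `path` and send it to Solr."""
    with open(path, encoding="utf-8") as fh:
        docs = json.load(fh)
    with urllib.request.urlopen(build_update_request(client_url, docs)) as resp:
        return resp.status
```

Because an add replaces any existing document with the same uniqueKey, the
client can apply such a file at whatever point it happens to be. Deletions
would still need to be shipped separately, e.g. as a list of ids.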


Thinking through what you're saying (clients may update at any time,
i.e. they won't all be forced to accept every update on every occasion),
you will lose much of the ability to ship partial indexes. As segments get
merged over time, you'd need to ship partial indexes against all of the
possible states that might exist out there, and that would simply be
prohibitive.
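
On the duplicate ids reported in the quoted message below, here is a toy
model in plain Python (not Solr code, and the function names are made up)
of the difference as I understand it: an index merge simply appends the
incoming documents, whereas adding documents through the update handler
replaces any existing document with the same uniqueKey.

```python
def merge_segments(index, delta):
    """Model an index merge: documents are appended, ids are not checked."""
    return index + delta


def add_documents(index, delta, key="id"):
    """Model a normal add: a document with an existing key replaces the old one."""
    by_id = {doc[key]: doc for doc in index}
    for doc in delta:
        by_id[doc[key]] = doc
    return list(by_id.values())


client = [{"id": "1", "title": "old"}, {"id": "2", "title": "unchanged"}]
delta = [{"id": "1", "title": "new"}]

merged = merge_segments(client, delta)   # id "1" now appears twice
added = add_documents(client, delta)     # id "1" appears once, updated
```

This is why the merge route needs a delete pass before the merge, while
re-indexing the changed documents does not.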

Upayavira

On Fri, Dec 14, 2012, at 05:52 AM, Dikchant Sahi wrote:
> Yes, we have a uniqueId defined, but merge adds two documents with the
> same id. As per my understanding this is how Solr behaves. Correct me if
> I am wrong.
> 
> On Fri, Dec 14, 2012 at 2:25 AM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
> 
> > Do you have IDs defined? How do you expect Solr to know they are duplicate
> > records? Maybe the issue is there somewhere.
> >
> > Regards,
> >      Alex
> > On 13 Dec 2012 15:17, "Dikchant Sahi" <contacts...@gmail.com> wrote:
> >
> > > Hi Alex,
> > >
> > > You got my point right. What I see is that merge adds duplicate
> > > documents. Is there a way to overwrite existing documents in one core
> > > with those from another? Can a merge operation lead to data corruption,
> > > say when the core on the client has uncommitted changes?
> > >
> > > What would be a better solution for my requirement, merge or indexing
> > > XML/JSON?
> > >
> > > Regards,
> > > Dikchant
> > >
> > > On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch
> > > <arafa...@gmail.com> wrote:
> > >
> > > > Not sure I fully understood this and maybe you already cover that by
> > > > 'merge', but if you know what you gave the client last time, you can
> > > > just build a differential as a second core, then on client mount that
> > > > second core and merge it into the first one (e.g. with DIH).
> > > >
> > > > Just a thought.
> > > >
> > > > Regards,
> > > >    Alex.
> > > >
> > > > Personal blog: http://blog.outerthoughts.com/
> > > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > > - Time is the quality of nature that keeps events from happening all at
> > > > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > > > book)
> > > >
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi
> > > > <contacts...@gmail.com> wrote:
> > > >
> > > > > Hi Erick,
> > > > >
> > > > > Sorry for creating the confusion. By slave, I mean that the indexes
> > > > > on the client machine will be a replica of the master; it is not the
> > > > > same as the slave in the master-slave model. Details below:
> > > > >
> > > > > The system is being developed to support a search facility on 1000s
> > > > > of systems, a majority of which will be offline.
> > > > >
> > > > > The idea is that we will have a search system which will be sold
> > > > > on a subscription basis. For each subscriber, we will copy the master
> > > > > index to their local machine, on a drive or CD. Now, if a subscriber
> > > > > comes back after 2 months and wants the updates, we just want to
> > > > > provide the deltas for those 2 months, as the volume of data is huge.
> > > > > For this we can think of two approaches:
> > > > > 1. Fetch the documents which are less than 2 months old in JSON format
> > > > > from the master Solr. Copy them to the subscriber machine (via CD or
> > > > > memory stick) and index those documents.
> > > > > 2. Create separate indexes for each month on our master machine. Copy
> > > > > the indexes to the client machine and merge. Prior to the merge we
> > > > > need to delete the records which the new index has, to avoid
> > > > > duplicates.
> > > > >
> > > > > As long as the setup is new, we will copy the complete index and
> > > > > restart Solr. We are not sure of the best approach for copying the
> > > > > deltas.
> > > > >
> > > > > Thanks,
> > > > > Dikchant
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson
> > > > > <erickerick...@gmail.com> wrote:
> > > > >
> > > > > > This is somewhat confusing. You say that box2 is the slave, yet
> > > > > > they're not connected? Then you need to copy the <solr home>/data
> > > > > > index from box 1 to box 2 manually (I'd have box2 solr shut down at
> > > > > > the time) and restart Solr.
> > > > > >
> > > > > > Why can't the boxes be connected? That's a much simpler way of going
> > > > > > about it.
> > > > > >
> > > > > > Best
> > > > > > Erick
> > > > > >
> > > > > >
> > > > > > On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi
> > > > > > <contacts...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Walter,
> > > > > > >
> > > > > > > Thanks for the response.
> > > > > > >
> > > > > > > Commit will help to reflect changes on Box1. We are able to
> > > > > > > achieve this. We want the changes to be reflected on Box2.
> > > > > > >
> > > > > > > We have two indexes. Say
> > > > > > > Box1: Master & DB has been setup. Data Import runs on this.
> > > > > > > Box2: Slave running.
> > > > > > >
> > > > > > > We want all the updates on Box1 to be merged/present in the index
> > > > > > > on Box2. The two boxes are not connected over the network. How can
> > > > > > > we achieve this?
> > > > > > >
> > > > > > > Please let me know if I am not clear.
> > > > > > >
> > > > > > > Thanks again!
> > > > > > >
> > > > > > > Regards,
> > > > > > > Dikchant
> > > > > > >
> > > > > > > On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood
> > > > > > > <wun...@wunderwood.org> wrote:
> > > > > > >
> > > > > > > > You do not need to manage online and offline indexes. Commit
> > > > > > > > when you are done with your updates and Solr will take care of
> > > > > > > > it for you. The changes are not live until you commit.
> > > > > > > >
> > > > > > > > wunder
> > > > > > > >
> > > > > > > > On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > How can we do delta update of offline indexes?
> > > > > > > > >
> > > > > > > > > We have the master index on which data import will be done.
> > > > > > > > > The index directory will be copied to the slave machine in
> > > > > > > > > case of a full update, via CD, as the slave/client machine is
> > > > > > > > > offline. So, what should be the approach for getting the delta
> > > > > > > > > to the slave? I can think of two approaches.
> > > > > > > > >
> > > > > > > > > 1. Create separate indexes of the delta on the master machine,
> > > > > > > > > copy them to the slave machine and merge. Before merging the
> > > > > > > > > indexes on the client machine, delete all the updated and
> > > > > > > > > deleted documents on the client machine, else merge will add
> > > > > > > > > duplicates. So along with the index, we need to transfer the
> > > > > > > > > list of documents which have been updated/deleted.
> > > > > > > > >
> > > > > > > > > 2. Extract all the documents which have changed since a
> > > > > > > > > particular time in XML/JSON and index them on the client
> > > > > > > > > machine.
> > > > > > > > >
> > > > > > > > > The indexes are huge, so we cannot roll over the whole index
> > > > > > > > > every time.
> > > > > > > > >
> > > > > > > > > Please help me with your take and the challenges you see in
> > > > > > > > > the above approaches. Please suggest any better approach you
> > > > > > > > > can think of.
> > > > > > > > >
> > > > > > > > > Thanks a ton!
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Dikchant
> > > > > > > >
> > > > > > > > --
> > > > > > > > Walter Underwood
> > > > > > > > wun...@wunderwood.org
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
