On Thu, Jan 22, 2009 at 7:02 AM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
> Thanks
>
> Yes, the source of data is a DB.  However, the XML is also posted on updates
> via a publish framework.  So I can just plug in an adapter here to listen for
> changes and post to SOLR.  I was trying to use the XPathEntityProcessor inside
> the SqlEntityProcessor and this did not work (using 1.3 - I did see support in
> 1.4).  That is not a show-stopper for me, and I can just post them via the
> framework and use files for the first-time load.
XPathEntityProcessor works inside SqlEntityProcessor only if a DB
field contains XML.
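
A minimal sketch of that setup, assuming a hypothetical table `docs` with columns `id` and `xml_col` (the column holding the XML), and a hypothetical `/doc/title` element being mapped. The inner entity reads the outer entity's column through a FieldReaderDataSource:

```xml
<dataConfig>
  <!-- JDBC source for the rows; driver/url are placeholders -->
  <dataSource name="db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="u" password="p"/>
  <!-- reads XML out of a column of the outer entity -->
  <dataSource name="fr" type="FieldReaderDataSource"/>
  <document>
    <entity name="rows" dataSource="db"
            query="select id, xml_col from docs">
      <entity name="xml" dataSource="fr" dataField="rows.xml_col"
              processor="XPathEntityProcessor" forEach="/doc">
        <field column="title" xpath="/doc/title"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The table, column, and xpath names above are illustrative; the key pieces are the `FieldReaderDataSource` and the `dataField` attribute pointing at the outer entity's column.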

However, you can have a separate entity (at the root) to read from the DB
for delta imports.
Anyway, if your current solution works, stick with it.
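
A rough sketch of such a root-level delta entity, again using the hypothetical `docs` table and assuming it has a `last_modified` timestamp column:

```xml
<!-- deltaQuery finds changed pks; deltaImportQuery re-fetches each row -->
<entity name="delta" pk="id" dataSource="db"
        query="select id, xml_col from docs"
        deltaQuery="select id from docs
                    where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select id, xml_col from docs
                          where id = '${dataimporter.delta.id}'">
  <!-- nested XPathEntityProcessor entity goes here, as in a full-import -->
</entity>
```

Running `command=delta-import` against the DIH handler would then pick up only rows modified since the last run; the column names are assumptions for illustration.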
>
> I have seen a couple of answers on the backup for crash scenarios.  Just
> wanted to confirm: if I replace the index with the backed-up files, then I
> can simply start up Solr again and reindex the documents changed since the
> last backup?  Am I right?  The slaves will also automatically adjust to this.
Yes, you can replace an archived index and Solr should work just fine,
but the docs added since the last snapshot was taken will be missing
(of course :) ).
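
The restore itself is just a file copy while Solr is stopped. A minimal sketch, with illustrative paths (adjust `SOLR_HOME` and the snapshot name to your installation; the "stop"/"restart" steps are your container's start/stop commands):

```shell
#!/bin/sh
# Sketch: restore a Solr index from a snapshot directory.
SOLR_HOME=/tmp/solr-demo
SNAPSHOT="$SOLR_HOME/data/snapshot.20090122"   # produced by snapshooter
INDEX="$SOLR_HOME/data/index"

# (Demo only) simulate an existing snapshot and a live index.
mkdir -p "$SNAPSHOT" "$INDEX"
echo "segments" > "$SNAPSHOT/segments_1"

# 1. Stop Solr (not shown), then replace the live index with the snapshot.
rm -rf "$INDEX"
cp -r "$SNAPSHOT" "$INDEX"

# 2. Restart Solr (not shown), then re-post documents changed since the
#    snapshot date - those are the only ones missing.
ls "$INDEX"
```

On a master/slave setup the slaves will pull the restored index on their next snapshot install, so they adjust automatically.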
>
> Thanks
> Guna
>
>
> On Jan 20, 2009, at 9:37 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> On Wed, Jan 21, 2009 at 5:15 AM, Gunaranjan Chandraraju
>> <chandrar...@apple.com> wrote:
>>>
>>> Hi All
>>> We are considering SOLR for a large database of XMLs.  I have some newbie
>>> questions - if there is a place I can go read about them do let me know
>>> and
>>> I will go read up :)
>>>
>>> 1. Currently we are able to pull the XMLs from the file system using
>>> FileDataSource.  The DIH is convenient since I can map my XML fields using
>>> the XPathEntityProcessor.  This works for an initial load.  However, after
>>> the initial load, we would like to 'post' changed XMLs to SOLR whenever the
>>> XML is updated in a separate system.  I know we can post XMLs with 'add';
>>> however, I was not sure how to do this and maintain the DIH mapping I use
>>> in data-config.xml.  I don't want to save the file to disk and then call
>>> the DIH - I would prefer to post it directly.  Do I need to use solrj for
>>> this?
>>
>> What is the source of your new data? is it a DB?
>>
>>>
>>> 2.  If my Solr schema.xml changes, do I HAVE to reindex all the old
>>> documents?  Suppose in the future we have newer XML documents that contain
>>> an additional XML field.  The old documents that are already indexed don't
>>> have this field, and (so) I don't need to search on them with this field.
>>> However, the new ones need to be searchable on this new field.  Can I
>>> just add this new field to the Solr schema, restart the servers, and post
>>> only the new documents, or do I need to reindex everything?
>>>
>>> 3. Can I back up the index directory, so that in case of a disk crash I
>>> can restore this directory and bring Solr up?  I realize that any
>>> documents indexed after this backup would be lost - I can however keep
>>> track of these outside and simply re-index documents 'newer' than that
>>> backup date.  This question is really important to me in the context of
>>> using a master server with a replicated index.  I would like to run this
>>> backup for the master.
>>
>> The snapshot script can be used to take backups on commit.
>>>
>>> 4.  In general, what happens when the Solr application is bounced?  Is
>>> the index affected (is anything maintained only in memory)?
>>>
>>> Regards
>>> Guna
>>>
>>
>>
>>
>> --
>> --Noble Paul
>
>



-- 
--Noble Paul
