Re: Newbie Design Questions

Gunaranjan Chandraraju Thu, 22 Jan 2009 14:28:12 -0800

Thanks

One last question: do you have an approximate date for the 1.4 release? If it's going to be soon enough (within a month or so), I can plan our development around it.


Thanks
Guna

On Jan 22, 2009, at 11:04 AM, Noble Paul നോബിള്‍नोब्ळ् wrote:

You are out of luck unless you are using a recent version of DIH.

The sub-entity will work only if you use the FieldReaderDataSource. Then you do not need the ClobTransformer either.

The trunk version of DIH can be used with the Solr 1.3 release.
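A minimal sketch of what the FieldReaderDataSource setup could look like, assuming a DIH build that includes it (trunk at the time of this thread). The data source name `fieldReader` is a hypothetical choice. The JDBC settings (placeholders left as-is), table, column, and XPath expressions are copied from the config quoted later in this thread. The inner entity reads the CLOB column through `dataField="item.xml_col"` instead of `url`, so no ClobTransformer is used.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" name="data-source-1"
              driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@XXXXX"
              user="abc" password="***" batchSize="100"/>
  <!-- Hypothetical name: hands the parent row's field content to a child entity as a Reader -->
  <dataSource type="FieldReaderDataSource" name="fieldReader"/>
  <document>
    <!-- Outer entity: fetch the rows holding the XML; no ClobTransformer needed -->
    <entity name="item" dataSource="data-source-1"
            processor="SqlEntityProcessor" pk="ID" rootEntity="false"
            query="select xml_col from xml_table where xml_col IS NOT NULL">
      <!-- Inner entity: parse the column content; dataField is "parentEntity.column" -->
      <entity name="record" dataSource="fieldReader"
              processor="XPathEntityProcessor"
              dataField="item.xml_col"
              forEach="/record">
        <field column="ID" xpath="/record/coreinfo/@a"/>
        <field column="type" xpath="/record/coreinfo/@b"/>
        <field column="streetname" xpath="/record/address/@c"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Because the outer entity keeps `rootEntity="false"`, each `/record` produced by the inner entity becomes its own Solr document, as in the original setup.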

On Thu, Jan 22, 2009 at 12:59 PM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
Hi
Yes, the XML is inside the DB in a CLOB. I would love to use XPath inside SqlEntityProcessor, as it will save me tons of trouble with file-dumping (given that I am not able to post it). This is how I set up my DIH for DB import.

<dataConfig>
  <dataSource type="JdbcDataSource" name="data-source-1"
              driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@XXXXX"
              user="abc" password="***" batchSize="100"/>
  <document>
    <entity dataSource="data-source-1"
            name="item" processor="SqlEntityProcessor"
            pk="ID"
            stream="false"
            rootEntity="false"
            transformer="ClobTransformer"
            query="select xml_col from xml_table where xml_col IS NOT NULL">

      <entity dataSource="null"
              name="record"
              processor="XPathEntityProcessor"
              stream="false"
              url="${item.xml_col}"
              forEach="/record">

        <field column="ID" xpath="/record/coreinfo/@a" />
        <field column="type" xpath="/record/coreinfo/@b" />
        <field column="streetname" xpath="/record/address/@c" />

        <!-- .. and so on -->
      </entity>

    </entity>
  </document>
</dataConfig>
The problem is that it always fails with the error below. I can see that the earlier SQL entity extraction and CLOB transformation are working, as the values show in the debug JSP (verbose mode with dataimport.jsp). However, no records are extracted for the entity. When I check the catalina.out file, it shows the following error for entity name="record" (the XPath entity above).

java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
I don't have the whole stack trace right now. If you need it, I would be happy to recreate and post it.

Regards,
Guna
On Jan 21, 2009, at 8:22 PM, Noble Paul നോബിള്‍नोब्ळ् wrote:
On Thu, Jan 22, 2009 at 7:02 AM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
Thanks

Yes, the source of data is a DB. However, the XML is also posted on updates via a publish framework, so I can just plug in an adapter here to listen for changes and post to SOLR. I was trying to use the XPathEntityProcessor inside the SqlEntityProcessor and this did not work (using 1.3; I did see support in 1.4). That is not a show stopper for me: I can just post them via the framework and use files for the first-time load.
XPathEntityProcessor works inside SqlEntityProcessor only if a DB field contains XML.
However, you can have a separate entity (at the root) to read from the DB for delta imports.
Anyway, if your current solution works, stick to it.
I have seen a couple of answers on backups for crash scenarios and just wanted to confirm: if I replace the index with the backed-up files, I can simply start Solr up again and reindex the documents changed since the last backup. Am I right? The slaves will also automatically adjust to this.
Yes. You can replace an archived index and Solr should work just fine,
but the docs added since the last snapshot was taken will be missing
(of course :) )
Thanks
Guna
On Jan 20, 2009, at 9:37 PM, Noble Paul നോബിള്‍नोब्ळ् wrote:
On Wed, Jan 21, 2009 at 5:15 AM, Gunaranjan Chandraraju
<chandrar...@apple.com> wrote:
Hi All
We are considering SOLR for a large database of XMLs. I have some newbie questions; if there is a place I can go read about them, do let me know and I will go read up :)
1. Currently we are able to pull the XMLs from a file system using FileDataSource. The DIH is convenient since I can map my XML fields using the XPathEntityProcessor. This works for an initial load. However, after the initial load, we would like to 'post' changed XMLs to SOLR whenever an XML is updated in a separate system. I know we can post XMLs with 'add', but I was not sure how to do this and still maintain the DIH mapping I use in data-config.xml. I don't want to save the file to disk and then call the DIH; I would prefer to post it directly. Do I need to use SolrJ for this?
What is the source of your new data? Is it a DB?
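For reference, the plain XML 'add' message mentioned in question 1 looks roughly like the sketch below. It is posted to Solr's XML update handler (at /update in the example configuration) and does not go through data-config.xml at all, so each field must already carry its schema.xml field name; the DIH XPath mapping is not applied. The field names are borrowed from the columns in the DIH config elsewhere in this thread, and the values are made-up placeholders.

```xml
<!-- POSTed to the update handler (e.g. /update); data-config.xml mappings are NOT applied -->
<add>
  <doc>
    <!-- Names must match fields declared in schema.xml; values are placeholders -->
    <field name="ID">12345</field>
    <field name="type">placeholder-type</field>
    <field name="streetname">placeholder street</field>
  </doc>
</add>
```

A separate `<commit/>` message (or a later commit) is needed before the new or changed documents become visible to searches.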
2. If my Solr schema.xml changes, do I HAVE to reindex all the old documents? Suppose in the future we have newer XML documents that contain a new, additional XML field. The old documents that are already indexed don't have this field, and so I don't need to search them on this field. However, the new ones need to be searchable on this new field. Can I just add this new field to the SOLR schema, restart the servers, and post only the new documents, or do I need to reindex everything?
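For illustration, adding such a field to schema.xml is a single element inside the `<fields>` section. The name `newfield` is a hypothetical placeholder, and `string` stands in for whichever type from the schema's `<types>` section fits the data.

```xml
<!-- Inside <fields> in schema.xml; "newfield" is a hypothetical name -->
<!-- Not marked required="true", so documents without this field are still accepted -->
<field name="newfield" type="string" indexed="true" stored="true"/>
```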
3. Can I back up the index directory, so that in case of a disk crash I can restore this directory and bring Solr up? I realize that any documents indexed after this backup would be lost; however, I can keep track of these outside and simply re-index documents 'newer' than the backup date. This question is really important to me in the context of using a Master server with a replicated index. I would like to run this backup on the 'Master'.
The snapshot script can be used to take backups on commit.
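In Solr 1.3 the snapshot script is `snapshooter`, one of the collection-distribution scripts, and the example solrconfig.xml ships a commented-out listener that runs an executable after every commit. A sketch adapted from that example follows; the `exe` and `dir` values assume the example directory layout and will differ for other installations.

```xml
<!-- Inside <updateHandler> in solrconfig.xml on the master -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <!-- Path to the snapshooter script (assumed example layout) -->
  <str name="exe">solr/bin/snapshooter</str>
  <!-- Working directory in which the script is run -->
  <str name="dir">.</str>
  <!-- Wait for the script to finish before the commit returns -->
  <bool name="wait">true</bool>
</listener>
```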
4. In general, what happens when the Solr application is bounced? Is the index affected (anything maintained in memory)?

Regards
Guna
--
--Noble Paul
