On Fri, Oct 24, 2008 at 6:07 PM, <[EMAIL PROTECTED]> wrote:

> Thanks for your very fast response :-)
>
>
> > > 2.)
> > > The documentation for DataImportHandler describes the index update
> > > process for SQL databases only...
> > >
> > > My scenario:
> > > - My application creates, deletes and modifies files in /tmp/files
> > > every night.
> > > - delta-import / DataImportHandler should "mirror" _all_ these changes to
> > > my Lucene index (=> create, delete, update documents).
> > The only EntityProcessor that supports delta imports is SqlEntityProcessor.
> > XPathEntityProcessor does not implement it, because we do not
> > know of a consistent way of finding deltas for XML. So,
> > unfortunately, there is no delta support for XML. That said, you can
> > implement those methods in XPathEntityProcessor. The methods are
> > explained in EntityProcessor.java. If you have questions specific to
> > this, I can help. Probably we can contribute it back.
> > >
> > > ===> Is this possible with delta-import / DataImportHandler?
> > > ===> If not: Do you have any suggestions on how to do this?
>
> OK, so at the moment I have to do a full-import to update my index. What
> happens to (user) queries while the full-import is running? Does Solr block
> these queries until the import is finished? Which configuration options
> control this behavior?


No, queries to Solr are not blocked during a full import.
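Since XPathEntityProcessor has no delta support, one workaround for the nightly changes in /tmp/files is to detect them outside Solr and decide per file what to re-index or delete. The sketch below is not part of DataImportHandler; it assumes that comparing file modification times between two directory snapshots is good enough to spot changes, and the names `take_snapshot` and `diff_snapshots` are hypothetical.

```python
import fnmatch
import os
from typing import Dict, Set, Tuple

# Hypothetical helper type, not a DataImportHandler API: file name -> mtime.
Snapshot = Dict[str, float]


def take_snapshot(directory: str, pattern: str = "myDoc_*.xml") -> Snapshot:
    """Record the modification time of every file in `directory` matching `pattern`."""
    snapshot: Snapshot = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if fnmatch.fnmatch(name, pattern) and os.path.isfile(path):
            snapshot[name] = os.path.getmtime(path)
    return snapshot


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (created, modified, deleted) file names between two snapshots.

    Assumption: a changed mtime means the file content changed.
    """
    created = set(new) - set(old)
    deleted = set(old) - set(new)
    modified = {name for name in set(old) & set(new) if new[name] != old[name]}
    return created, modified, deleted
```

Created and modified files would then be re-indexed (adding a document whose uniqueKey already exists replaces the old one), and deleted files would become delete-by-id requests. Persisting the previous snapshot between nightly runs, for example as JSON, is left out here.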


>
>
>
> > > My scenario:
> > > - /tmp/files contains 682 'myDoc_.*\.xml' XML files.
> > > - Each XML file contains 12 XML elements (e.g. <title>foo</title>).
> > > - DataImportHandler transfers only 5 of these 12 elements to the Lucene
> > > index.
> > >
> > >
> > > I don't understand the output from 'solr/dataimport' (=> status):
> > >
> > > ###
> > > <response>
> > >  ...
> > >  <lst name="statusMessages">
> > >  <str name="Total Requests made to DataSource">0</str>
> > >  <str name="Total Rows Fetched">1363</str>
> > >  <str name="Total Documents Skipped">0</str>
> > >  <str name="Full Dump Started">2008-10-24 13:19:03</str>
> > >  <str name="">
> > >    Indexing completed. Added/Updated: 681 documents. Deleted 0 documents.
> > >  </str>
> > >  <str name="Committed">2008-10-24 13:19:05</str>
> > >  <str name="Optimized">2008-10-24 13:19:05</str>
> > >  <str name="Time taken ">0:0:2.648</str>
> > >  </lst>
> > > ...
> > > </response>
> > >
> > > ===> Why does the "Added/Updated" counter show 681 and not 682?
> >
> > Added/Updated is the number of documents. How do you know the number
> > is not accurate?
>
>
> /tmp/files$ ls myDoc_*.xml | wc -l
> 682
>
> But "Added/Updated" shows 681. Does this mean that one file has an XML
> error? But the statistics say "Total Documents Skipped" = 0?!


It might be that one of the XML files has an extra line somewhere, such as a
stray <?xml version="1.0" encoding="utf-8"?> declaration, or something similar.
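One way to test this hypothesis is to parse every file and list the ones that are not well-formed. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the directory and file pattern come from the thread, and the helper names `parse_error` and `find_unparseable` are hypothetical. It only checks well-formedness: a file that parses cleanly can still yield no document (for example, if it lacks the element selected by the entity's `forEach` XPath), so an empty result does not rule that out.

```python
import glob
import os
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def parse_error(data: bytes) -> Optional[str]:
    """Return the parser's error message, or None if `data` is well-formed XML."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


def find_unparseable(directory: str, pattern: str = "myDoc_*.xml") -> List[Tuple[str, str]]:
    """List (file name, error) for every matching file that fails to parse."""
    problems: List[Tuple[str, str]] = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, "rb") as handle:
            error = parse_error(handle.read())
        if error is not None:
            problems.append((os.path.basename(path), error))
    return problems


if __name__ == "__main__":
    # Directory taken from the thread; prints nothing if every file parses.
    for name, error in find_unparseable("/tmp/files"):
        print(f"{name}: {error}")
```

A duplicated XML declaration, as suggested above, makes the parser report "XML or text declaration not at start of entity" for that file.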


>
>
>
>
> > > 4.)
> > > And my last questions about Solr statistics/information...
> > >
> > > ===> Is it possible to get information (number of indexed documents,
> > > stored values from documents etc.) from the current Lucene index?
> > > ===> The admin web interface shows 'numDocs' and 'maxDoc' in
> > > 'statistics/core'. Is 'numDocs' = number of indexed documents? What
> > > does 'maxDoc' mean?
>
> Do you have answers for these questions too?
>
> Bye,
> Simon
>
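Regarding question 4: in Lucene, numDocs counts the documents that are currently searchable, while maxDoc also includes deleted documents whose space has not yet been reclaimed by a segment merge or optimize, so the two are equal right after an optimize. The number of indexed documents can also be read over HTTP with a match-all query that returns no rows. The sketch below assumes the default example URL http://localhost:8983/solr and Solr's XML response format; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: Solr's example Jetty setup; adjust to your installation.
SOLR_BASE_URL = "http://localhost:8983/solr"


def count_url(base_url: str) -> str:
    """Build a match-all query (q=*:*) that returns only the hit count (rows=0)."""
    query = urllib.parse.urlencode({"q": "*:*", "rows": "0"})
    return f"{base_url.rstrip('/')}/select?{query}"


def num_found(response_xml: bytes) -> int:
    """Read the numFound attribute from Solr's XML search response."""
    result = ET.fromstring(response_xml).find("result")
    if result is None or result.get("numFound") is None:
        raise ValueError("response has no <result numFound=...> element")
    return int(result.get("numFound"))


def indexed_document_count(base_url: str = SOLR_BASE_URL) -> int:
    """Ask a running Solr instance how many documents match *:*."""
    with urllib.request.urlopen(count_url(base_url)) as response:
        return num_found(response.read())
```

Stored values can be fetched the same way by raising `rows` and listing the wanted fields in `fl`.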



-- 
Regards,
Akshay Ukey.
