On Fri, Oct 24, 2008 at 6:07 PM, <[EMAIL PROTECTED]> wrote:

> Thanks for your very fast response :-)
>
> > > 2.)
> > > The documentation from DataImportHandler describes the index update
> > > process for SQL databases only...
> > >
> > > My scenario:
> > > - My application creates, deletes and modifies files from /tmp/files
> > >   every night.
> > > - delta-import / DataImportHandler should "mirror" _all_ these changes
> > >   to my lucene index (=> create, delete, update documents).
> >
> > The only EntityProcessor which supports delta is SqlEntityProcessor.
> > The XPathEntityProcessor has not implemented it, because we do not
> > know of a consistent way of finding deltas for XML. So,
> > unfortunately, no delta support for XML. That said, you can
> > implement those methods in XPathEntityProcessor. The methods are
> > explained in EntityProcessor.java. If you have questions specific to
> > this I can help. Probably we can contribute it back.
> >
> > > ===> Is this possible with delta-import / DataImportHandler?
> > > ===> If not: Do you have any suggestions on how to do this?
>
> OK, so at the moment I have to do a full-import to update my index. What
> happens with (user) queries while full-import is running? Does Solr block
> these queries until the import is finished? Which configuration options
> control this behavior?

No, queries to Solr are not blocked during a full import.

> > > My scenario:
> > > - /tmp/files contains 682 'myDoc_.*\.xml' XML files.
> > > - Each XML file contains 12 XML elements (e.g. <title>foo</title>).
> > > - DataImportHandler transfers only 5 of these 12 elements to the
> > >   lucene index.
> > >
> > > I don't understand the output from 'solr/dataimport' (=> status):
> > >
> > > ###
> > > <response>
> > > ...
> > > <lst name="statusMessages">
> > > <str name="Total Requests made to DataSource">0</str>
> > > <str name="Total Rows Fetched">1363</str>
> > > <str name="Total Documents Skipped">0</str>
> > > <str name="Full Dump Started">2008-10-24 13:19:03</str>
> > > <str name="">
> > > Indexing completed. Added/Updated: 681 documents. Deleted 0 documents.
> > > </str>
> > > <str name="Committed">2008-10-24 13:19:05</str>
> > > <str name="Optimized">2008-10-24 13:19:05</str>
> > > <str name="Time taken ">0:0:2.648</str>
> > > </lst>
> > > ...
> > > </response>
> > >
> > > ===> Why does the "Added/Updated" counter show 681 and not 682?
> >
> > Added/Updated is the number of docs. How do you know the number is not
> > accurate?
>
> /tmp/files$ ls myDoc_*.xml | wc -l
> 682
>
> But "Added/Updated" shows 681. Does this mean that one file has an XML
> error? But the statistic says "Total Documents Skipped" = 0?!

It might be the case that there is an extra line somewhere in one of the
XML files, a line like <?xml version="1.0" encoding="utf-8"?> or something.

> > > 4.)
> > > And my last questions about Solr statistics/information...
> > >
> > > ===> Is it possible to get information (number of indexed documents,
> > > stored values from documents etc.) from the current lucene index?
> > > ===> The admin web interface shows 'numDocs' and 'maxDoc' in
> > > 'statistics/core'. Is 'numDocs' = number of indexed documents? What
> > > does 'maxDoc' mean?
>
> Do you have answers for these questions too?
>
> Bye,
> Simon

--
Regards,
Akshay Ukey.
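
One way to test the "extra line" guess above is to scan the files for a repeated XML declaration or any other well-formedness error. The following is only a rough sketch, not part of DataImportHandler: the helper names (check_xml_bytes, find_suspect_files) are made up for illustration, and the directory and file pattern are simply the ones from the ls command quoted above. It uses nothing but the Python standard library, and it does not try to reproduce how DataImportHandler itself counts documents.

```python
import glob
import os
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version="1.0" encoding="utf-8"?>
XML_DECL = re.compile(rb"<\?xml\s[^>]*\?>")


def check_xml_bytes(data):
    """Return a list of problems found in one XML document (raw bytes)."""
    problems = []
    # A well-formed document carries at most one XML declaration.
    decl_count = len(XML_DECL.findall(data))
    if decl_count > 1:
        problems.append("%d XML declarations (expected at most 1)" % decl_count)
    # Let the standard parser report any other well-formedness error.
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        problems.append("not well-formed: %s" % exc)
    return problems


def find_suspect_files(directory, pattern="myDoc_*.xml"):
    """Map each suspicious file path to the list of problems found in it."""
    suspects = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, "rb") as handle:
            problems = check_xml_bytes(handle.read())
        if problems:
            suspects[path] = problems
    return suspects


if __name__ == "__main__":
    # /tmp/files is the directory mentioned in the thread (an assumption
    # about where the script is run).
    for path, problems in find_suspect_files("/tmp/files").items():
        print(path, "->", "; ".join(problems))
```

If the script prints nothing, all 682 files are well-formed on their own, and the missing document would have to be explained some other way (for example by the entity configuration), which this check cannot see.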