[ 
https://issues.apache.org/jira/browse/SOLR-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683577#action_12683577
 ] 

Fergus McMenemie commented on SOLR-1060:
----------------------------------------

Correct documents are not getting deleted. Line 2 from the log shows:-

   path=/dataimport 
command=full-import&clean=false&entity=single-delete&commit=true&single=file:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml

Line 12 is me doing a query for the same document:-

   path=/select 
params={wt=xml&q=fileAbsolutePath:file\:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml}
 hits=3 status=0 QTime=11

which returns three hits. So the documents have not been deleted! Removing the 
$skipDoc=true and rerunning the delete I get:-

{code}
Mar 19, 2009 7:33:34 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-1.4-dev path=/dataimport 
params={command=full-import&clean=false&entity=single-delete&commit=true&single=file:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml}
 status=0 QTime=0 
Mar 19, 2009 7:33:34 PM org.apache.solr.handler.dataimport.SolrWriter 
readIndexerProperties
INFO: Read dataimport.properties
Mar 19, 2009 7:33:34 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
INFO: Starting Full Import
Mar 19, 2009 7:33:34 PM org.apache.solr.handler.dataimport.SolrWriter 
readIndexerProperties
INFO: Read dataimport.properties
Mar 19, 2009 7:33:34 PM org.apache.solr.handler.dataimport.SolrWriter 
deleteByQuery
INFO: Deleting documents from Solr with query: 
fileAbsolutePath:file:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml
Mar 19, 2009 7:33:34 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.queryParser.ParseException: Cannot parse 
'fileAbsolutePath:file:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml':
 Encountered " ":" ": "" at line 1, column 21.
Was expecting one of:
    <EOF> 
    <AND> ...
    <OR> ...
    <NOT> ...
    "+" ...
    "-" ...
    "(" ...
    "*" ...
    "^" ...
    <QUOTED> ...
    <TERM> ...
    <FUZZY_SLOP> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:177)
        at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:74)
        at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:63)
        at 
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:314)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:70)
        at 
org.apache.solr.handler.dataimport.SolrWriter.deleteByQuery(SolrWriter.java:153)
        at 
org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:449)
        at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:358)
        at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
        at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:163)
        at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:309)
        at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:367)
        at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:348)

Mar 19, 2009 7:33:34 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: 
org.apache.solr.common.SolrException: Error parsing Lucene query
        at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)
        at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
        at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:163)
        at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:309)
        at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:367)
        at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:348)
Caused by: org.apache.solr.common.SolrException: Error parsing Lucene query
        at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:84)
        at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:63)
        at 
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:314)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:70)
        at 
org.apache.solr.handler.dataimport.SolrWriter.deleteByQuery(SolrWriter.java:153)
        at 
org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:449)
        at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:358)
        ... 5 more
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 
'fileAbsolutePath:file:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml':
 Encountered " ":" ": "" at line 1, column 21.
Was expecting one of:
    <EOF> 
    <AND> ...
    <OR> ...
    <NOT> ...
    "+" ...
    "-" ...
    "(" ...
    "*" ...
    "^" ...
    <QUOTED> ...
    <TERM> ...
    <FUZZY_SLOP> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:177)
        at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:74)
        ... 11 more
Mar 19, 2009 7:33:34 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Mar 19, 2009 7:33:34 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
Mar 19, 2009 7:33:34 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Mar 19, 2009 7:33:34 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening searc...@86b804 main
Mar 19, 2009 7:33:34 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Mar 19, 2009 7:33:34 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@86b804 main from searc...@281e7e main
        
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 19, 2009 7:33:34 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@86b804 main
        
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 19, 2009 7:33:34 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@86b804 main from searc...@281e7e main
        
filterCache{lookups=6,hits=6,hitratio=1.00,inserts=0,evictions=0,size=9,warmupTime=16,cumulative_lookups=31,cumulative_hits=31,cumulative_hitratio=1.00,cumulative_inserts=2,cumulative_evictions=0}
Mar 19, 2009 7:33:35 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@86b804 main
        
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=9,warmupTime=39,cumulative_lookups=31,cumulative_hits=31,cumulative_hitratio=1.00,cumulative_inserts=2,cumulative_evictions=0}
Mar 19, 2009 7:33:35 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@86b804 main from searc...@281e7e main
        
queryResultCache{lookups=2,hits=2,hitratio=1.00,inserts=7,evictions=0,size=7,warmupTime=8,cumulative_lookups=11,cumulative_hits=9,cumulative_hitratio=0.81,cumulative_inserts=2,cumulative_evictions=0}
Mar 19, 2009 7:33:35 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@86b804 main
        
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=7,evictions=0,size=7,warmupTime=9,cumulative_lookups=11,cumulative_hits=9,cumulative_hitratio=0.81,cumulative_inserts=2,cumulative_evictions=0}
Mar 19, 2009 7:33:35 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@86b804 main from searc...@281e7e main
        
documentCache{lookups=18,hits=15,hitratio=0.83,inserts=26,evictions=0,size=26,warmupTime=0,cumulative_lookups=183,cumulative_hits=164,cumulative_hitratio=0.89,cumulative_inserts=19,cumulative_evictions=0}
Mar 19, 2009 7:33:35 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@86b804 main
        
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=183,cumulative_hits=164,cumulative_hitratio=0.89,cumulative_inserts=19,cumulative_evictions=0}
Mar 19, 2009 7:33:35 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to searc...@86b804 main
Mar 19, 2009 7:33:35 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={rows=10&start=0&q=solr} hits=0 status=0 
QTime=3 
Mar 19, 2009 7:33:35 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={rows=10&start=0&q=rocks} hits=90 
status=0 QTime=16 
Mar 19, 2009 7:33:35 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null 
params={q=static+newSearcher+warming+query+from+solrconfig.xml} hits=12327 
status=0 QTime=96 
Mar 19, 2009 7:33:35 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 19, 2009 7:33:35 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@86b804 main
Mar 19, 2009 7:33:35 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@281e7e main
        
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
        
filterCache{lookups=6,hits=6,hitratio=1.00,inserts=0,evictions=0,size=9,warmupTime=16,cumulative_lookups=31,cumulative_hits=31,cumulative_hitratio=1.00,cumulative_inserts=2,cumulative_evictions=0}
        
queryResultCache{lookups=2,hits=2,hitratio=1.00,inserts=7,evictions=0,size=7,warmupTime=8,cumulative_lookups=11,cumulative_hits=9,cumulative_hitratio=0.81,cumulative_inserts=2,cumulative_evictions=0}
        
documentCache{lookups=18,hits=15,hitratio=0.83,inserts=26,evictions=0,size=26,warmupTime=0,cumulative_lookups=183,cumulative_hits=164,cumulative_hitratio=0.89,cumulative_inserts=19,cumulative_evictions=0}

{code}
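The ParseException above looks consistent with the second colon (the one after "file") reaching the query parser unescaped, whereas the /select query earlier in this comment had it escaped as file\:///. Below is a minimal sketch of the kind of escaping that would make the delete query parseable. It is written in Python purely for illustration; escape_term and delete_query are hypothetical helpers and not part of Solr or DIH, and the set of special characters is my assumption based on the Lucene query syntax documentation.

```python
# Hypothetical helpers for illustration only; not part of Solr or DIH.
# Assumption: these are the single characters the Lucene query syntax
# treats as special, and a backslash in front of each one escapes it.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_term(value: str) -> str:
    """Backslash-escape every special character in a field value."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in value)


def delete_query(field: str, value: str) -> str:
    """Build a field:value query; the field separator colon stays unescaped."""
    return f"{field}:{escape_term(value)}"


path = ("file:///Volumes/spare/ts/janes/schema/janesxml/"
        "data/news/jdw/jdw2008/jni71796.xml")
print(delete_query("fileAbsolutePath", path))
```

With the path from the log this yields the same form as the /select query, with the colon after "file" escaped and the slashes left alone.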



> a new DIH EnityProcessor allowing text file lists of files to be indexed
> ------------------------------------------------------------------------
>
>                 Key: SOLR-1060
>                 URL: https://issues.apache.org/jira/browse/SOLR-1060
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Fergus McMenemie
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: SOLR-1060.patch, SOLR-1060.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have finished a new DIH EntityProcessor. It is designed around the idea 
> that whatever daemon is used to maintain your content store, it is likely to 
> drop a report or log file explaining what has changed within your content 
> store. I wish to use this report file to control the indexing of new or 
> changed content and the removal of old content. The report files, perhaps 
> from un-tar or un-zip, are likely to reference jpegs and directory stubs 
> which need to be ignored. I assumed a file-based content repository, but this 
> should be expanded to handle URIs as well.
> I feel that the current FileListEntityProcessor is poorly named. It should be 
> called the dirWalkEntityProcessor or dirCrawlEntityProcessor or such, and 
> this new EntityProcessor should have the name FileListEntityProcessor. 
> However, what is done is done. I then came up with ManifestEntityProcessor, 
> which I thought suited: manifest files are all over the content sets I deal 
> with, and the dictionary definition seemed close enough ("ship's manifest"). 
> However, how about ChangeListEntityProcessor?
> {code}
>        <entity name="jc"
>                processor="ManifestEntityProcessor"
>                baseDir="/Volumes/Techmore/ts/aaa/schema/data"
>                rootEntity="false"
>                dataSource="null"
>                allowRegex="^.*\.xml$"
>                blockRegex="usc2009"
>                manifestFileName="/Volumes/ts/man-find.txt"
>                docAddRegex=".*"
>                >
> {code}
> The new entity fields are as follows.
>  
>    *manifestFileName* is the required location of the manifest file. If this 
> value is relative, it is assumed to be relative to baseDir.
>    *allowRegex* is an optional attribute that, if present, discards any line 
> which does not match the regExp.
>    *blockRegex* is an optional attribute that is applied after any allowRegex 
> and discards any line which matches the regExp.
>    *docAddRegex* is a required regex to identify lines which, when matched, 
> should cause docs to be added to the index. As well as matching the line, it 
> should also return the portion of the line which contains the filepath as 
> group(1).
>    *docDeleteRegex* is an optional regex to identify lines which, when 
> matched, should cause docs to be deleted from the index. As well as matching 
> the line, it should also return the portion of the line which contains the 
> filepath as group(1). **PLANNED**
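
To make the order of the four regex attributes in the quoted description concrete, here is a rough sketch in Python. It is not the patch's code. classify_line, the "A"/"D" manifest line format and the sample paths are all hypothetical, and I am assuming "find anywhere in the line" matching and that docAddRegex is tried before docDeleteRegex; the actual processor may differ on both points.

```python
import re
from typing import Optional, Tuple


def classify_line(line: str,
                  allow_regex: Optional[str],
                  block_regex: Optional[str],
                  doc_add_regex: str,
                  doc_delete_regex: Optional[str] = None
                  ) -> Optional[Tuple[str, str]]:
    """Return ("add" | "delete", filepath), or None if the line is discarded."""
    # allowRegex: discard any line that does not match.
    if allow_regex and not re.search(allow_regex, line):
        return None
    # blockRegex: applied after allowRegex, discard any line that matches.
    if block_regex and re.search(block_regex, line):
        return None
    # docAddRegex: group(1) is expected to carry the file path.
    m = re.search(doc_add_regex, line)
    if m and m.re.groups >= 1 and m.group(1):
        return ("add", m.group(1))
    # docDeleteRegex (planned in the description): same group(1) convention.
    if doc_delete_regex:
        m = re.search(doc_delete_regex, line)
        if m and m.re.groups >= 1 and m.group(1):
            return ("delete", m.group(1))
    return None


# Hypothetical manifest lines: "A <path>" for added, "D <path>" for deleted.
manifest = [
    "A data/news/jdw2008/a.xml",
    "D data/news/jdw2008/b.xml",
    "A data/img/pic.jpg",
    "A data/usc2009/c.xml",
]
results = [classify_line(l, r"^.*\.xml$", "usc2009", r"^A\s+(.*)$", r"^D\s+(.*)$")
           for l in manifest]
print(results)
```

In this sketch the jpeg line is dropped by allowRegex and the usc2009 line by blockRegex, leaving one add and one delete.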

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
