[ https://issues.apache.org/jira/browse/SOLR-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683532#action_12683532 ]
Fergus McMenemie commented on SOLR-1060:
----------------------------------------
I have applied the latest version of SOLR-1059 and I just *cannot* get delete
to work!
{code}
<entity name="single-delete"
        dataSource="myURIreader"
        processor="XPathEntityProcessor"
        url="${dataimporter.request.single}"
        rootEntity="true"
        flatten="true"
        stream="false"
        forEach="/record | /record/mediaBlock"
        transformer="TemplateTransformer">
  <field column="$skipDoc" template="true" />
  <field column="fileAbsolutePath"
         template="${dataimporter.request.single}" />
  <field column="$deleteDocByQuery"
         template="fileAbsolutePath:${dataimporter.request.single}" />
  <field column="vdkvgwkey"
         template="${dataimporter.request.single}" />
</entity>
{code}
Here is a section of the log file. It shows that the document is still in the
index after the delete attempt: the same query returns hits=3 both before and
after the import.
{code}
Mar 19, 2009 5:24:52 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-1.4-dev path=/select params={wt=xml&q=fileAbsolutePath:file\:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml} hits=3 status=0 QTime=10
Mar 19, 2009 5:25:04 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-1.4-dev path=/dataimport params={command=full-import&clean=false&entity=single-delete&commit=true&single=file\:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml} status=0 QTime=0
Mar 19, 2009 5:25:04 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Mar 19, 2009 5:25:04 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Mar 19, 2009 5:25:04 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Mar 19, 2009 5:25:04 PM org.apache.solr.handler.dataimport.URLDataSource getData
SEVERE: Exception thrown while getting data
java.net.MalformedURLException: no protocol: nullfile\:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml
    at java.net.URL.<init>(URL.java:567)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:88)
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:47)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:239)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:182)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:165)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:163)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:309)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:367)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:348)
Mar 19, 2009 5:25:04 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: single-delete document : SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 1
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:112)
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:47)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:239)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:182)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:165)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:163)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:309)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:367)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:348)
Caused by: java.net.MalformedURLException: no protocol: nullfile\:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml
    at java.net.URL.<init>(URL.java:567)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:88)
    ... 10 more
Mar 19, 2009 5:25:04 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 1
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:112)
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:47)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:239)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:182)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:165)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:163)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:309)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:367)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:348)
Caused by: java.net.MalformedURLException: no protocol: nullfile\:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml
    at java.net.URL.<init>(URL.java:567)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:88)
    ... 10 more
Mar 19, 2009 5:25:04 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Mar 19, 2009 5:25:04 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
Mar 19, 2009 5:25:04 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening searc...@281e7e main
Mar 19, 2009 5:25:04 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@281e7e main from searc...@7740f6 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@281e7e main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@281e7e main from searc...@7740f6 main
filterCache{lookups=6,hits=6,hitratio=1.00,inserts=0,evictions=0,size=9,warmupTime=16,cumulative_lookups=25,cumulative_hits=25,cumulative_hitratio=1.00,cumulative_inserts=2,cumulative_evictions=0}
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@281e7e main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=9,warmupTime=16,cumulative_lookups=25,cumulative_hits=25,cumulative_hitratio=1.00,cumulative_inserts=2,cumulative_evictions=0}
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@281e7e main from searc...@7740f6 main
queryResultCache{lookups=2,hits=2,hitratio=1.00,inserts=7,evictions=0,size=7,warmupTime=8,cumulative_lookups=9,cumulative_hits=7,cumulative_hitratio=0.77,cumulative_inserts=2,cumulative_evictions=0}
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@281e7e main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=7,evictions=0,size=7,warmupTime=8,cumulative_lookups=9,cumulative_hits=7,cumulative_hitratio=0.77,cumulative_inserts=2,cumulative_evictions=0}
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@281e7e main from searc...@7740f6 main
documentCache{lookups=18,hits=15,hitratio=0.83,inserts=26,evictions=0,size=26,warmupTime=0,cumulative_lookups=165,cumulative_hits=149,cumulative_hitratio=0.90,cumulative_inserts=16,cumulative_evictions=0}
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@281e7e main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=165,cumulative_hits=149,cumulative_hitratio=0.90,cumulative_inserts=16,cumulative_evictions=0}
Mar 19, 2009 5:25:04 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to searc...@281e7e main
Mar 19, 2009 5:25:04 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={rows=10&start=0&q=solr} hits=0 status=0 QTime=6
Mar 19, 2009 5:25:04 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={rows=10&start=0&q=rocks} hits=90 status=0 QTime=34
Mar 19, 2009 5:25:04 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={q=static+newSearcher+warming+query+from+solrconfig.xml} hits=12327 status=0 QTime=98
Mar 19, 2009 5:25:04 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 19, 2009 5:25:04 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@281e7e main
Mar 19, 2009 5:25:04 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@7740f6 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
filterCache{lookups=6,hits=6,hitratio=1.00,inserts=0,evictions=0,size=9,warmupTime=16,cumulative_lookups=25,cumulative_hits=25,cumulative_hitratio=1.00,cumulative_inserts=2,cumulative_evictions=0}
queryResultCache{lookups=2,hits=2,hitratio=1.00,inserts=7,evictions=0,size=7,warmupTime=8,cumulative_lookups=9,cumulative_hits=7,cumulative_hitratio=0.77,cumulative_inserts=2,cumulative_evictions=0}
documentCache{lookups=18,hits=15,hitratio=0.83,inserts=26,evictions=0,size=26,warmupTime=0,cumulative_lookups=165,cumulative_hits=149,cumulative_hitratio=0.90,cumulative_inserts=16,cumulative_evictions=0}
Mar 19, 2009 5:25:12 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-1.4-dev path=/select params={wt=xml&q=fileAbsolutePath:file\:///Volumes/spare/ts/janes/schema/janesxml/data/news/jdw/jdw2008/jni71796.xml} hits=3 status=0 QTime=11
{code}
Any hints on what I should try next?
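
One possible reading of the trace, offered as a guess rather than a diagnosis: the string handed to java.net.URL is "nullfile\:///...". The backslash before the colon is the escaping that the /select query needs, but java.net.URL does not accept "file\" as a protocol name, and the leading "null" looks like an unset base URL concatenated in front of the request parameter. The standalone sketch below exercises only java.net.URL and does not touch Solr. The class name, the helper method and the paths are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of Solr or DIH.
final class EscapedUrlCheck {

    // Returns null when the string parses as a URL,
    // otherwise the message of the MalformedURLException.
    @SuppressWarnings("deprecation") // the URL(String) constructor mirrors the trace
    static String parseError(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Illustrative paths only; the real path is the one in the log.
        String escaped = "nullfile\\:///tmp/example.xml"; // backslash-escaped colon
        String plain = "file:///tmp/example.xml";         // unescaped colon

        System.out.println("escaped -> " + parseError(escaped));
        System.out.println("plain   -> " + parseError(plain));
    }
}
```

If that reading is right, it may be worth passing single=file:///... without the backslash to /dataimport and keeping the escaping only inside the $deleteDocByQuery template, where the value is used as a query.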
> a new DIH EnityProcessor allowing text file lists of files to be indexed
> ------------------------------------------------------------------------
>
> Key: SOLR-1060
> URL: https://issues.apache.org/jira/browse/SOLR-1060
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Fergus McMenemie
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-1060.patch, SOLR-1060.patch
>
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> I have finished a new DIH EntityProcessor. It is designed around the idea
> that whatever daemon is used to maintain your content store is likely to
> drop a report or log file explaining what has changed within it. I wish to
> use this report file to control the indexing of new or changed content and
> the removal of old content. The report files, perhaps from un-tar or un-zip,
> are likely to reference JPEGs and directory stubs, which need to be ignored.
> I assumed a file-based content repository, but this should be expanded to
> handle URIs as well.
> I feel that the current FileListEntityProcessor is poorly named. It should be
> called the dirWalkEntityProcessor or dirCrawlEntityProcessor or some such, and
> this new EntityProcessor should have the name FileListEntityProcessor.
> However, what is done is done. I then came up with ManifestEntityProcessor,
> which I thought suited: manifest files are all over the content sets I deal
> with, and the dictionary definition seemed close enough ("ship's manifest").
> However, how about ChangeListEntityProcessor?
> {code}
> <entity name="jc"
> processor="ManifestEntityProcessor"
> baseDir="/Volumes/Techmore/ts/aaa/schema/data"
> rootEntity="false"
> dataSource="null"
> allowRegex="^.*\.xml$"
> blockRegex="usc2009"
> manifestFileName="/Volumes/ts/man-find.txt"
> docAddRegex=".*"
> >
> {code}
> The new entity fields are as follows.
>
> *manifestFileName* is the required location of the manifest file. If this
> value is relative, it is assumed to be relative to baseDir.
>
> *allowRegex* is an optional attribute that, if present, discards any line
> which does not match the regex.
>
> *blockRegex* is an optional attribute that is applied after any allowRegex
> and discards any line which matches the regex.
>
> *docAddRegex* is a required regex that identifies lines which, when matched,
> should cause docs to be added to the index. As well as matching the line, it
> should also return the portion of the line which contains the filepath as
> group(1).
>
> *docDeleteRegex* is an optional regex that identifies lines which, when
> matched, should cause docs to be deleted from the index. As well as matching
> the line, it should also return the portion of the line which contains the
> filepath as group(1). **PLANNED**
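
The order of the filters described above can be sketched in plain Java. This is only an illustration of the description, not the code in the attached patch: the class and method names are hypothetical, and the use of find() (a match anywhere in the line) rather than a full-line match is an assumption, chosen because blockRegex="usc2009" in the example reads like a substring test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the manifest line filtering; not taken from the patch.
final class ManifestFilterSketch {

    // Returns the file paths (group 1 of docAddRegex) for lines that survive
    // allowRegex and blockRegex. allowRegex and blockRegex may be null (optional).
    static List<String> pathsToAdd(List<String> lines, String allowRegex,
                                   String blockRegex, String docAddRegex) {
        Pattern allow = allowRegex == null ? null : Pattern.compile(allowRegex);
        Pattern block = blockRegex == null ? null : Pattern.compile(blockRegex);
        Pattern add = Pattern.compile(docAddRegex);

        List<String> paths = new ArrayList<>();
        for (String line : lines) {
            // allowRegex: discard any line that does not match.
            if (allow != null && !allow.matcher(line).find()) continue;
            // blockRegex: applied after allowRegex, discard any line that matches.
            if (block != null && block.matcher(line).find()) continue;
            // docAddRegex: a matching line yields its file path as group(1).
            Matcher m = add.matcher(line);
            if (m.find() && m.groupCount() >= 1) paths.add(m.group(1));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Made-up manifest lines; the "A " prefix is an invented report format.
        List<String> manifest = List.of(
                "A data/a.xml",
                "A data/usc2009/b.xml",
                "A img/c.jpg");
        System.out.println(pathsToAdd(manifest, "^.*\\.xml$", "usc2009", "^A\\s+(.*)$"));
    }
}
```

Note that a docAddRegex of ".*", as in the example entity, defines no group 1, so this sketch would add nothing for it; a pattern with an explicit capturing group around the path is assumed here.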
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.