I'm trying to write a DataImportHandler (DIH) configuration that pulls page-view metrics from an XML feed
into our index. The DIH makes a single request to the feed and updates 0 documents.
I set the log level to "finest" for the entire dataimport section, but I still
can't tell what's wrong. I suspect the XPath.
Also, the DIH admin page at http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport
returns 404. Any suggestions on how I can debug this?
Version: solr-spec 4.0.0.2012.08.06.22.50.47
The XML data:
<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse>
<Data>
<Rows>
<Row rowKey="P#PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)#N/A#550000000016196614" rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
</Row>
<Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)#N/A#550000000021976460" rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
</Row>
</Rows>
</Data>
</ReportDataResponse>
My DIH configuration (data-config.xml):
<dataConfig>
<dataSource name="coremetrics"
type="URLDataSource"
encoding="UTF-8"
connectionTimeout="5000"
readTimeout="10000"/>
<document>
<entity name="coremetrics"
dataSource="coremetrics"
pk="id"
url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=******&amp;username=****&amp;format=XML&amp;userAuthKey=****&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
processor="XPathEntityProcessor"
stream="true"
forEach="/ReportDataResponse/Data/Rows/Row"
logLevel="fine"
transformer="RegexTransformer">
<field column="part_code" name="id"
xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
regex="^PRODUCT:.*\((.*?)\)$" replaceWith="$1"/>
<field column="page_views"
xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']" />
</entity>
</document>
</dataConfig>
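For comparison, here is a minimal Java sketch of what a `regex`/`replaceWith` pair does. It assumes the RegexTransformer passes the attribute to `java.util.regex` unchanged and applies it with `replaceAll`. The class and method names (`RegexDelimiterDemo`, `applyReplace`) are hypothetical. The sketch contrasts a Perl-style pattern wrapped in `/.../` with the bare pattern. Java has no regex delimiters, so the slashes become literal characters and the Perl-style version can never match.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical demo: applies a pattern the way a regex/replaceWith pair is
 * assumed to work (java.util.regex, replaceAll), once with Perl-style /.../
 * delimiters and once without.
 */
public class RegexDelimiterDemo {

    // Replace every match of regex in value with replaceWith; when nothing
    // matches, replaceAll returns the value unchanged.
    public static String applyReplace(String value, String regex, String replaceWith) {
        return Pattern.compile(regex).matcher(value).replaceAll(replaceWith);
    }

    public static void main(String[] args) {
        String pageName = "PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)";

        // Java has no regex delimiters: these slashes must match literal '/' characters.
        String perlStyle = "/^PRODUCT:.*\\((.*?)\\)$/";
        // The same pattern without delimiters; group 1 is the text in the last (...).
        String javaStyle = "^PRODUCT:.*\\((.*?)\\)$";

        System.out.println(applyReplace(pageName, perlStyle, "$1")); // unchanged page name
        System.out.println(applyReplace(pageName, javaStyle, "$1")); // W4537
    }
}
```

Running `main` prints the full page name first, because the delimited pattern does not match. It then prints `W4537`. Note that in data-config.xml the backslashes are written once (`\(`), since the Java-string escaping shown here does not apply to XML attributes.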
This little Perl test script extracts the data correctly:
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'cm.xml');
my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
foreach my $node ($nodeset->get_nodelist) {
    # Paths are relative to the current <Row>
    my $page_name  = $node->findvalue('Value[@columnId="PAGE_NAME"]');
    my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
    # Keep only the part code from the last parenthesized group
    $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
    print "$page_name\t$page_views\n";
}
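The Perl check can be roughly ported to Java using the JDK's `javax.xml.xpath`. This is a sketch, not what Solr runs. As far as I understand, DIH's XPathEntityProcessor evaluates field xpaths with its own streaming XPath subset and absolute paths, so agreement here only shows that the expressions select the intended nodes in standard XPath. The class name `RowExtractor` and the trimmed `SAMPLE_XML` (rowKey attributes dropped) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical port of the Perl check using the JDK XPath API. */
public class RowExtractor {

    // Trimmed copy of the sample feed (rowKey/comparisonSpecifier attributes dropped).
    public static final String SAMPLE_XML =
        "<ReportDataResponse><Data><Rows>"
        + "<Row><Value columnId='PAGE_NAME'>PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)</Value>"
        + "<Value columnId='PAGE_VIEWS'>2388</Value></Row>"
        + "<Row><Value columnId='PAGE_NAME'>PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)</Value>"
        + "<Value columnId='PAGE_VIEWS'>1313</Value></Row>"
        + "</Rows></Data></ReportDataResponse>";

    // Returns one {partCode, pageViews} pair per Row element.
    public static List<String[]> extract(String xml) {
        XPath xp = XPathFactory.newInstance().newXPath();
        List<String[]> result = new ArrayList<>();
        try {
            NodeList rows = (NodeList) xp.evaluate(
                "/ReportDataResponse/Data/Rows/Row",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                // Relative paths, evaluated against the current Row (like the Perl script)
                String pageName = xp.evaluate("Value[@columnId='PAGE_NAME']", row);
                String pageViews = xp.evaluate("Value[@columnId='PAGE_VIEWS']", row);
                // Same pattern as the Perl s///, without /.../ delimiters
                String partCode = pageName.replaceAll("^PRODUCT:.*\\((.*?)\\)$", "$1");
                result.add(new String[] {partCode, pageViews});
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] r : extract(SAMPLE_XML)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```

Running `main` prints one tab-separated line per row: `W4537` with `2388`, then `BE9000` with `1313`.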
From the logs:
INFO: Loading DIH Configuration: data-config.xml
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=2
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource getData
FINE: Accessing URL: https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*****&username=***&format=XML&userAuthKey=******&language=en_US&viewID=9475540&period_a=M20110930
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=1
Aug 24, 2012 3:53:14 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=1
Aug 24, 2012 3:53:16 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:18 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:20 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:22 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:24 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:27 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Aug 24, 2012 3:53:28 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
commit{dir=/var/lib/tomcat6/solr/apache-solr-4.0.0-BETA/core1/data/index,segFN=segments_2b,generation=83,filenames=[segments_2b]
commit{dir=/var/lib/tomcat6/solr/apache-solr-4.0.0-BETA/core1/data/index,segFN=segments_2c,generation=84,filenames=[segments_2c]
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 84
Aug 24, 2012 3:53:28 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@ff33d42 main
Aug 24, 2012 3:53:28 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 24, 2012 3:53:28 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@ff33d42 main{StandardDirectoryReader(segments_2c:323)}
Aug 24, 2012 3:53:28 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [ssww] Registered new searcher Searcher@ff33d42 main{StandardDirectoryReader(segments_2c:323)}
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter persist
INFO: Wrote last indexed time to dataimport.properties
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:17.918
Aug 24, 2012 3:53:28 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=2 {deleteByQuery=*:*,commit=} 0 2