Thank you for these suggestions. The real problem was incorrect syntax for the primary key column in data-config.xml. Once I corrected that, the data loaded fine.

Wrong:

<field  column="part_code"  name="id"
xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']" 
regex="/^PRODUCT:.*\((.*?)\)$/"  replaceWith="$1"/>


Right:

<field  column="id"
xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']" 
regex="/^PRODUCT:.*\((.*?)\)$/"  replaceWith="$1"/>



On 08/25/2012 08:52 PM, Lance Norskog wrote:
About XPaths: the XPath engine does a limited range of xpaths. The doc
says that your paths are covered.

About logs: You only have the RegexTransformer listed. You need to add
LogTransformer to the transformer list:
http://wiki.apache.org/solr/DataImportHandler#LogTransformer

Having xml entity codes in the url string seems right. Can you verify
the url that goes to the remote site? Can you read the logs at the
remote site? Can you run this code through a proxy and watch the data?
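
One way to check the entity-code point locally is to parse a fragment of the config and look at the attribute value the parser hands back. This is only a minimal sketch: the fragment, the example.com URL and the helper name decoded_url are made up for illustration and are not taken from the real config.

```python
import xml.etree.ElementTree as ET

# Hypothetical config fragment; example.com and its parameters are placeholders.
CONFIG = (
    '<entity name="demo" '
    'url="https://example.com/report?format=XML&amp;language=en_US"/>'
)


def decoded_url(config_xml: str) -> str:
    """Return the url attribute as an XML parser exposes it to the application."""
    return ET.fromstring(config_xml).get("url")


if __name__ == "__main__":
    # The &amp; written in the file should come back as a plain & here.
    print(decoded_url(CONFIG))
```

If the printed URL has plain ampersands between the parameters, the escaping in the config file is doing what it should, which is consistent with the "Accessing URL" line in the log.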

On Fri, Aug 24, 2012 at 1:34 PM, Carrie Coy <c...@ssww.com> wrote:
I'm trying to write a DIH to incorporate page view metrics from an XML feed
into our index.   The DIH makes a single request, and updates 0 documents.
I set log level to "finest" for the entire dataimport section, but I still
can't tell what's wrong.  I suspect the XPath.
http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport
returns 404.  Any suggestions on how I can debug this?

solr-spec: 4.0.0.2012.08.06.22.50.47


The XML data:

<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse>
<Data>
<Rows>
<Row rowKey="P#PRODUCT: BURLAP POTATO SACKS  (PACK OF 12)
(W4537)#N/A#550000000016196614" rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO
SACKS  (PACK OF 12) (W4537)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
</Row>
<Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850)
(BE9000)#N/A#550000000021976460" rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY
BEADS 6X9MM  (BAG OF 850) (BE9000)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
</Row>
</Rows>
</Data>
</ReportDataResponse>

My DIH:

<dataConfig>
  <dataSource name="coremetrics"
              type="URLDataSource"
              encoding="UTF-8"
              connectionTimeout="5000"
              readTimeout="10000"/>

  <document>
         <entity  name="coremetrics"
             dataSource="coremetrics"
             pk="id"

             url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=******&amp;username=****&amp;format=XML&amp;userAuthKey=****&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
             processor="XPathEntityProcessor"
             stream="true"
             forEach="/ReportDataResponse/Data/Rows/Row"
             logLevel="fine"
             transformer="RegexTransformer">

             <field  column="part_code"  name="id"
xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
regex="/^PRODUCT:.*\((.*?)\)$/"  replaceWith="$1"/>
             <field  column="page_views"
xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']"  />
        </entity>
  </document>
</dataConfig>

This little test Perl script correctly extracts the data:

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'cm.xml');
my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
foreach my $node ($nodeset->get_nodelist) {
    my $page_name = $node->findvalue('Value[@columnId="PAGE_NAME"]');
    my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
    $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
}
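
For anyone who wants to repeat the same check without Perl, here is a rough Python equivalent using only the standard library. It is a sketch under a few assumptions: the sample is embedded as a string instead of being read from cm.xml, the page names are on one line (the wraps in the feed quoted above look like email line breaks), and the names SAMPLE, PART_CODE and extract_rows are made up for this example. The pattern is the bare expression from the Perl substitution, since Python's re module takes no surrounding delimiters.

```python
import re
import xml.etree.ElementTree as ET

# Two rows modelled on the feed quoted above, with page names on a single line.
SAMPLE = """<ReportDataResponse>
<Data>
<Rows>
<Row>
<Value columnId="PAGE_NAME">PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)</Value>
<Value columnId="PAGE_VIEWS">2388</Value>
</Row>
<Row>
<Value columnId="PAGE_NAME">PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)</Value>
<Value columnId="PAGE_VIEWS">1313</Value>
</Row>
</Rows>
</Data>
</ReportDataResponse>"""

# Same expression as the Perl s/// above: capture the last parenthesised group.
PART_CODE = re.compile(r"^PRODUCT:.*\((.*?)\)$")


def extract_rows(xml_text):
    """Return (part_code, page_views) for every Row element in the feed."""
    root = ET.fromstring(xml_text)  # root is <ReportDataResponse>
    rows = []
    for row in root.findall("./Data/Rows/Row"):
        page_name = row.findtext("Value[@columnId='PAGE_NAME']", default="")
        page_views = row.findtext("Value[@columnId='PAGE_VIEWS']", default="0")
        # Replace the whole page name with the captured part code.
        part_code = PART_CODE.sub(r"\1", page_name.strip())
        rows.append((part_code, int(page_views)))
    return rows


if __name__ == "__main__":
    for part_code, views in extract_rows(SAMPLE):
        print(part_code, views)
```

This only confirms that the paths and the expression fit the data. It says nothing about how the DIH itself evaluates them, so a mismatch in the import would still have to be tracked down in data-config.xml.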

From logs:

INFO: Loading DIH Configuration: data-config.xml
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
loadDataConfig
INFO: Data Configuration loaded successfully
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import}
status=0 QTime=2
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Aug 24, 2012 3:53:10 PM
org.apache.solr.handler.dataimport.SimplePropertiesWriter
readIndexerProperties
INFO: Read dataimport.properties
Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource
getData
FINE: Accessing URL:
https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*****&username=***&format=XML&userAuthKey=******&language=en_US&viewID=9475540&period_a=M20110930
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=1
Aug 24, 2012 3:53:14 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=1
Aug 24, 2012 3:53:16 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
Aug 24, 2012 3:53:18 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
Aug 24, 2012 3:53:20 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
Aug 24, 2012 3:53:22 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
Aug 24, 2012 3:53:24 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
Aug 24, 2012 3:53:27 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Aug 24, 2012 3:53:28 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2

commit{dir=/var/lib/tomcat6/solr/apache-solr-4.0.0-BETA/core1/data/index,segFN=segments_2b,generation=83,filenames=[segments_2b]

commit{dir=/var/lib/tomcat6/solr/apache-solr-4.0.0-BETA/core1/data/index,segFN=segments_2c,generation=84,filenames=[segments_2c]
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 84
Aug 24, 2012 3:53:28 PM org.apache.solr.search.SolrIndexSearcher<init>
INFO: Opening Searcher@ff33d42 main
Aug 24, 2012 3:53:28 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 24, 2012 3:53:28 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@ff33d42
main{StandardDirectoryReader(segments_2c:323)}
Aug 24, 2012 3:53:28 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [ssww] Registered new searcher Searcher@ff33d42
main{StandardDirectoryReader(segments_2c:323)}
Aug 24, 2012 3:53:28 PM
org.apache.solr.handler.dataimport.SimplePropertiesWriter
readIndexerProperties
INFO: Read dataimport.properties
Aug 24, 2012 3:53:28 PM
org.apache.solr.handler.dataimport.SimplePropertiesWriter persist
INFO: Wrote last indexed time to dataimport.properties
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.DocBuilder
execute
INFO: Time taken = 0:0:17.918
Aug 24, 2012 3:53:28 PM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import}
status=0 QTime=2 {deleteByQuery=*:*,commit=} 0 2


