This DataImportHandler config does not find any documents in this HTML
file. The DIH definitely opens the file, but either the
XPathEntityProcessor gets no data or it does not recognize the XPaths
given. Any hints? (I'm using a recent Solr 1.5-dev build.)

Thanks!

Lance


xhtml-data-config.xml:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="xhtml"
            forEach="/html/head | /html/body"
            processor="XPathEntityProcessor" pk="id"
            transformer="TemplateTransformer"
            url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
            >
      <field column="head_s" xpath="/html/head"/>
      <field column="body_s" xpath="/html/body"/>
    </entity>
  </document>
</dataConfig>

Sample data file: "/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"

<?xml version="1.0" encoding="UTF-8" ?>
<html >
  <head >
    <meta content="en-US" name="DC.language" />
  </head>
  <body>
    <div id="header">
     <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
        <span class="nolink">Previous</span>
        <a href="ch05-tokenizers-filters-Solr1.41.html">Next</a>
        <a href="ch05-tokenizers-filters-Solr1.460.html">Last</a>
    </div>
    <div dir="ltr" id="content" style="background-color:transparent">
      <h1 id="toc0">
        <span class="SectionNumber">1</span>
        <a id="RefHeading36402771"></a>
        <a id="bkmRefHeading36402771"></a>
        Understanding Analyzers, Tokenizers, and Filters
      </h1>
    </div>
  </body>
</html>
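
One way to rule out the XPaths themselves is to evaluate them outside
Solr. The following is only a rough sketch in Python using the standard
library's xml.etree.ElementTree, not anything DIH does internally. The
sample is trimmed and inlined, the helper name summarize is made up for
illustration, and the relative paths "head" and "body" stand in for
/html/head and /html/body. For each element it reports the text sitting
directly under the element and the text gathered from all of its
descendants. Whether that distinction matters to XPathEntityProcessor is
an assumption to check, not something established here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample file, inlined so the check is self-contained.
SAMPLE = """<html>
  <head>
    <meta content="en-US" name="DC.language" />
  </head>
  <body>
    <div id="header">
      <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
    </div>
    <div id="content">
      <h1 id="toc0">Understanding Analyzers, Tokenizers, and Filters</h1>
    </div>
  </body>
</html>"""


def summarize(xml_text, paths=("head", "body")):
    """Hypothetical helper: report direct and nested text per element.

    Paths are relative to the root <html> element, so "head" plays the
    role of /html/head in the DIH config.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for path in paths:
        node = root.find(path)
        if node is None:
            # The path does not match anything in the document.
            result[path] = None
            continue
        # Text that appears directly under the element, before any child.
        direct = (node.text or "").strip()
        # Text from the element and all descendants, whitespace collapsed.
        nested = " ".join("".join(node.itertext()).split())
        result[path] = {"direct": direct, "nested": nested}
    return result


summary = summarize(SAMPLE)
for path, info in summary.items():
    print(path, info)
```

If both paths resolve here, the file is well-formed and the element
paths are right, which narrows the problem to how the processor reads
those elements.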



-- 
Lance Norskog
goks...@gmail.com
