It is clear that the XPaths provided won't fetch anything, because there
is no data at those paths. What do you actually want to be indexed?
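
As a rough, untested sketch: assuming the goal is to index the language
meta tag, the heading, and the page text (that is a guess, not something
stated in the question), a config along these lines may be closer. As far
as I know, XPathEntityProcessor only returns text found directly at the
given path unless flatten="true" is set on the field. The column names
lang_s and title_s are made up for illustration.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- One row per <html> root element, i.e. one document per file -->
    <entity name="xhtml"
            forEach="/html"
            processor="XPathEntityProcessor"
            url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html">
      <!-- lang_s and title_s are hypothetical column names -->
      <!-- The path ends in an attribute, so there is a value to fetch -->
      <field column="lang_s" xpath="/html/head/meta/@content" />
      <!-- flatten="true" gathers the text under all nested tags into one field -->
      <field column="title_s" xpath="/html/body/div/h1" flatten="true" />
      <field column="body_s" xpath="/html/body" flatten="true" />
    </entity>
  </document>
</dataConfig>
```

With forEach="/html" there is a single row per file, so all three fields
end up on the same document instead of being split across separate head
and body rows.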



On Sun, Jan 31, 2010 at 10:30 AM, Lance Norskog <goks...@gmail.com> wrote:
> This DataImportHandler script does not find any documents in this HTML
> file. The DIH definitely opens the file, but either the
> xpathprocessor gets no data or it does not recognize the xpaths
> described. Any hints? (I'm using Solr 1.5-dev, sometime recent.)
>
> Thanks!
>
> Lance
>
>
> xhtml-data-config.xml:
>
> <dataConfig>
>        <dataSource type="FileDataSource" encoding="UTF-8" />
>        <document>
>        <entity name="xhtml"
>                        forEach="/html/head | /html/body"
>                        processor="XPathEntityProcessor" pk="id"
>                        transformer="TemplateTransformer"
>                        url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
>                        >
>                <field column="head_s" xpath="/html/head"/>
>                <field column="body_s" xpath="/html/body"/>
>        </entity>
>        </document>
> </dataConfig>
>
> Sample data file: "cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <html >
>  <head >
>    <meta content="en-US" name="DC.language" />
>  </head>
>  <body>
>    <div id="header">
>     <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
>        <span class="nolink">Previous</span>
>        <a href="ch05-tokenizers-filters-Solr1.41.html">Next</a>
>        <a href="ch05-tokenizers-filters-Solr1.460.html">Last</a>
>    </div>
>    <div dir="ltr" id="content" style="background-color:transparent">
>      <h1 id="toc0">
>        <span class="SectionNumber">1</span>
>        <a id="RefHeading36402771"></a>
>        <a id="bkmRefHeading36402771"></a>
>        Understanding Analyzers, Tokenizers, and Filters
>      </h1>
>    </div>
>  </body>
> </html>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com
