This DataImportHandler configuration does not find any documents in this HTML file. The DIH definitely opens the file, but either the XPathEntityProcessor gets no data or it does not recognize the XPaths described. Any hints? (I'm using a recent Solr 1.5-dev build.)
Thanks!
Lance

xhtml-data-config.xml:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="xhtml"
            forEach="/html/head | /html/body"
            processor="XPathEntityProcessor"
            pk="id"
            transformer="TemplateTransformer"
            url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html" >
      <field column="head_s" xpath="/html/head"/>
      <field column="body_s" xpath="/html/body"/>
    </entity>
  </document>
</dataConfig>

Sample data file: "cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"

<?xml version="1.0" encoding="UTF-8" ?>
<html >
  <head >
    <meta content="en-US" name="DC.language" />
  </head>
  <body>
    <div id="header">
      <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
      <span class="nolink">Previous</span>
      <a href="ch05-tokenizers-filters-Solr1.41.html">Next</a>
      <a href="ch05-tokenizers-filters-Solr1.460.html">Last</a>
    </div>
    <div dir="ltr" id="content" style="background-color:transparent">
      <h1 id="toc0">
        <span class="SectionNumber">1</span>
        <a id="RefHeading36402771"></a>
        <a id="bkmRefHeading36402771"></a>
        Understanding Analyzers, Tokenizers, and Filters
      </h1>
    </div>
  </body>
</html>

--
Lance Norskog
goks...@gmail.com