It is clear that the xpaths provided won't fetch anything, because there is no data in those paths: /html/head and /html/body contain only child tags and whitespace, with no text of their own. What do you really want to be indexed?
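For example, if the goal is one document per file with the page text in it, something like the following might be closer. This is an untested sketch. It assumes the field-level flatten="true" option of XPathEntityProcessor, which gathers the text from all tags nested under the matched element, and an attribute path such as /html/head/meta/@content. The column name language_s is made up for illustration, and the unused TemplateTransformer is dropped.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- one record per file: the whole /html element -->
    <entity name="xhtml"
            forEach="/html"
            processor="XPathEntityProcessor"
            url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html">
      <!-- value of the meta tag's content attribute (hypothetical column name) -->
      <field column="language_s" xpath="/html/head/meta/@content"/>
      <!-- all text nested anywhere under body, collected into one field -->
      <field column="body_s" xpath="/html/body" flatten="true"/>
    </entity>
  </document>
</dataConfig>
```

If the schema declares a uniqueKey (id, say), each document still needs a value for it, otherwise the add is rejected. pk="id" by itself does not create that field.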
On Sun, Jan 31, 2010 at 10:30 AM, Lance Norskog <goks...@gmail.com> wrote:
> This DataImportHandler script does not find any documents in this HTML
> file. The DIH definitely opens the file, but either the
> xpathprocessor gets no data or it does not recognize the xpaths
> described. Any hints? (I'm using Solr 1.5-dev, sometime recent.)
>
> Thanks!
>
> Lance
>
>
> xhtml-data-config.xml:
>
> <dataConfig>
>   <dataSource type="FileDataSource" encoding="UTF-8" />
>   <document>
>     <entity name="xhtml"
>         forEach="/html/head | /html/body"
>         processor="XPathEntityProcessor" pk="id"
>         transformer="TemplateTransformer"
>         url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
>         >
>       <field column="head_s" xpath="/html/head"/>
>       <field column="body_s" xpath="/html/body"/>
>     </entity>
>   </document>
> </dataConfig>
>
> Sample data file: "cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <html>
>   <head>
>     <meta content="en-US" name="DC.language" />
>   </head>
>   <body>
>     <div id="header">
>       <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
>       <span class="nolink">Previous</span>
>       <a href="ch05-tokenizers-filters-Solr1.41.html">Next</a>
>       <a href="ch05-tokenizers-filters-Solr1.460.html">Last</a>
>     </div>
>     <div dir="ltr" id="content" style="background-color:transparent">
>       <h1 id="toc0">
>         <span class="SectionNumber">1</span>
>         <a id="RefHeading36402771"></a>
>         <a id="bkmRefHeading36402771"></a>
>         Understanding Analyzers, Tokenizers, and Filters
>       </h1>
>     </div>
>   </body>
> </html>
>
> --
> Lance Norskog
> goks...@gmail.com
>

--
-----------------------------------------------------
Noble Paul | Systems Architect | AOL | http://aol.com