I'm currently running Solr 4.0 final with ManifoldCF 1.3 dev on Tomcat 7.
I need to capture the "h1" tags on each web page, as that is the true "title",
for lack of a better word.
I can't seem to get it to work at all.
I read the instructions, used the capture parameter, and then mapped it to
a field named h1 in the schema.
Here's my update/extract handler:

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
        <str name="fmap.content">text</str>
        <str name="fmap.title">solr.title</str>
        <str name="fmap.name">solr.name</str>
        <str name="capture">h1</str>
        <str name="fmap.h1">h1</str>

        <str name="description">comments</str>

        <str name="fmap.Last-Modified">last_modified</str>
        <str name="uprefix">attr_</str>
        <str name="lowernames">true</str>
    </lst>
</requestHandler>

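
For the fmap.h1 mapping to land anywhere, schema.xml needs a matching field. A minimal sketch of such a definition follows. The text_general type (from the stock Solr 4.0 example schema) and multiValued="true" (a page can contain more than one h1) are assumptions, not taken from the actual schema:

```xml
<!-- Hypothetical schema.xml entry for the captured h1 content. -->
<!-- The type and multiValued setting are assumptions; adjust to the real schema. -->
<field name="h1" type="text_general" indexed="true" stored="true" multiValued="true"/>
```
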
Can anyone tell me what I'm doing wrong?


