I'm currently running Solr 4.0 Final with ManifoldCF 1.3-dev on Tomcat 7. I need to capture the "h1" tags on each web page, since the h1 is, for lack of a better word, the true "title" of the page. I can't seem to get it to work at all. I read the instructions, used the capture parameter, and then mapped the captured element to a field named h1 in the schema. Here's my /update/extract handler:
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="fmap.title">solr.title</str>
    <str name="fmap.name">solr.name</str>
    <str name="capture">h1</str>
    <str name="fmap.h1">h1</str>
    <str name="description">comments</str>
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <str name="lowernames">true</str>
  </lst>

Can anyone tell me what I'm doing wrong?

--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-capture-h1-tags-tp4072792.html
Sent from the Solr - User mailing list archive at Nabble.com.
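For comparison, a schema.xml declaration for the h1 target field might look like the fragment below. This is a sketch, not the poster's actual schema: it assumes the `text_general` field type from the Solr 4.0 example schema is defined. Because `capture` collects the text of each matching element, a page with more than one h1 would most likely need the field to be `multiValued`, or the add may be rejected for that document. `stored="true"` is what allows the captured heading to be returned with search results.

```xml
<!-- Sketch of a schema.xml field for text captured from <h1> elements.
     Assumes the "text_general" type from the Solr 4.0 example schema.
     multiValued: a page may contain several h1 elements.
     stored: lets the heading text be returned in results. -->
<field name="h1" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

After a schema change like this, the core would need to be reloaded and the pages re-crawled before existing documents show the new field.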