Hello Mike,

I think it should work fine with the plugin enabled in
plugin.includes. You can check Nutch's parser output by using:
$ bin/nutch parsechecker <URL>

You should see one or more h# output fields present. You can then use the
index-metadata plugin to map the parser output fields to the indexer output
by setting the values for index.parse.md.
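
As a rough sketch of what that could look like in nutch-site.xml: this
assumes parsechecker reports the headings under the plain keys h1 to h6,
and that the headings plugin is told to extract all six levels via its
"headings" property. If your parser output shows different key names, use
those in index.parse.md instead.

```xml
<!-- Assumption: extract all six heading levels (adjust to your needs) -->
<property>
  <name>headings</name>
  <value>h1,h2,h3,h4,h5,h6</value>
</property>

<!-- Assumption: the parse metadata keys are named h1..h6, as shown by
     parsechecker; index-metadata copies these keys to index fields -->
<property>
  <name>index.parse.md</name>
  <value>h1,h2,h3,h4,h5,h6</value>
</property>
```

With keys named like this, the indexed fields would carry the same names as
the parse metadata keys, so check whether the rename entries in
index-writers.xml still match what is actually produced.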

Regards,
Markus

On Mon, 31 Oct 2022 at 04:51, Mike <mz579...@gmail.com> wrote:

> Hello!
>
> I've tried everything and set everything up to get the Nutch headings
> plugin working:
>
> nutch-site.xml
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-okhttp|...|parse-(html|tika|text|metatags)|index-(basic|anchor|more|metadata)|...|headings|nutch-extensionpoints</value>
> </property>
>
> schema.xml
>
>
> <!-- fields for the headings plugin -->
> <field name="h1" type="text_general" stored="true" indexed="true"
> multiValued="true"/>
> <field name="h2" type="text_general" stored="true" indexed="true"
> multiValued="true"/>
> <field name="h3" type="text_general" stored="true" indexed="true"
> multiValued="true"/>
> <field name="h4" type="text_general" stored="true" indexed="true"
> multiValued="true"/>
> <field name="h5" type="text_general" stored="true" indexed="true"
> multiValued="true"/>
> <field name="h6" type="text_general" stored="true" indexed="true"
> multiValued="true"/>
>
> index-writers.xml
>   <mapping>
>       <rename>
>         <field source="metatag.h1" dest="h1"/>
>         <field source="metatag.h2" dest="h2"/>
>         <field source="metatag.h3" dest="h3"/>
>         <field source="metatag.h4" dest="h4"/>
>         <field source="metatag.h5" dest="h5"/>
>         <field source="metatag.h6" dest="h6"/>
>       </rename>
> ...
>
> After indexing to Solr there are no HTML heading tags in my Solr index.
> What's missing?
>
> thanks!
>
