Yes, sorry, I also forgot to post this setting:

<property>
  <name>index.parse.md</name>
  <value>metatag.description,metatag.keywords,metatag.rating,metatag.h1,metatag.h2,metatag.h3,metatag.h4,metatag.h5,metatag.h6</value>
  <description>
  Comma-separated list of keys to be taken from the parse metadata to generate fields.
  Can be used e.g. for 'description' or 'keywords' provided that these values are generated
  by a parser (see parse-metatags plugin)
  </description>
</property>
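As a rough illustration of what the description says (keys are picked from the parse metadata to generate fields), here is a minimal Java sketch. It assumes parse metadata is a plain map from key to a list of values; the class `ParseMdSketch`, the method `selectFields`, and the sample values are hypothetical and this is not Nutch's actual indexing-filter code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the idea behind index.parse.md, NOT Nutch's real code:
// each key in the comma-separated list is looked up in the parse metadata, and
// only keys that are actually present become index fields.
public class ParseMdSketch {

    static Map<String, List<String>> selectFields(String indexParseMd,
                                                  Map<String, List<String>> parseMetadata) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String rawKey : indexParseMd.split(",")) {
            String key = rawKey.trim(); // this sketch tolerates spaces around keys
            if (key.isEmpty()) {
                continue;
            }
            List<String> values = parseMetadata.get(key);
            // A key missing from the parse metadata produces no field at all.
            if (values != null && !values.isEmpty()) {
                fields.put(key, new ArrayList<>(values));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample parse metadata for illustration.
        Map<String, List<String>> parseMetadata = new LinkedHashMap<>();
        parseMetadata.put("metatag.description", List.of("A sample page"));
        parseMetadata.put("metatag.keywords", List.of("nutch", "solr"));

        Map<String, List<String>> fields =
            selectFields("metatag.description,metatag.rating", parseMetadata);
        System.out.println(fields); // prints {metatag.description=[A sample page]}
    }
}
```

Under this sketch, the key names in the property have to match the parse-metadata keys exactly; a listed key that no parser produced is silently skipped.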

The Nutch parsechecker shows me the fields, but the indexchecker doesn't.

On Mon, 31 Oct 2022 at 04:51, Mike <mz579...@gmail.com> wrote:

> Hello!
>
> I've tried everything to get the Nutch headings plugin working and set
> everything up as follows:
>
> nutch-site.xml
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-okhttp|...|parse-(html|tika|text|metatags)|index-(basic|anchor|more|metadata)|...|headings|nutch-extensionpoints</value>
> </property>
>
> schema.xml
>
> <!-- fields for the headings plugin -->
> <field name="h1" type="text_general" stored="true" indexed="true" multiValued="true"/>
> <field name="h2" type="text_general" stored="true" indexed="true" multiValued="true"/>
> <field name="h3" type="text_general" stored="true" indexed="true" multiValued="true"/>
> <field name="h4" type="text_general" stored="true" indexed="true" multiValued="true"/>
> <field name="h5" type="text_general" stored="true" indexed="true" multiValued="true"/>
> <field name="h6" type="text_general" stored="true" indexed="true" multiValued="true"/>
>
> index-writers.xml
>   <mapping>
>       <rename>
>         <field source="metatag.h1" dest="h1"/>
>         <field source="metatag.h2" dest="h2"/>
>         <field source="metatag.h3" dest="h3"/>
>         <field source="metatag.h4" dest="h4"/>
>         <field source="metatag.h5" dest="h5"/>
>         <field source="metatag.h6" dest="h6"/>
>       </rename>
> ...
>
> After indexing to Solr, there are no HTML heading tags in my Solr index.
> What's missing?
>
> thanks!
>
