For the record: to activate the language identification plugin, you need to add it to the plugin.includes property in nutch-site.xml. Most importantly, the plugin name requires a dash: it is language-identifier, not languageidentifier as I first tried (see below).

<property>
  <name>plugin.includes</name>
  <value>language-identifier|protocol-http|urlfilter-(regex)|parse-(text|html|js|tika)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>...
  </description>
</property>
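
For anyone wondering why the dash matters: as far as I understand it, the plugin.includes value is treated as a regular expression that has to match each plugin's id in full, so a name without the dash simply never matches. The following is only a rough sketch of that matching in Python, not Nutch's actual code; the helper name is_included is made up for illustration.

```python
import re

# The plugin.includes value from the config above, copied verbatim.
PLUGIN_INCLUDES = (
    "language-identifier|protocol-http|urlfilter-(regex)|parse-(text|html|js|tika)"
    "|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)"
    "|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id, includes=PLUGIN_INCLUDES):
    """Hypothetical helper: True if the whole plugin id matches the pattern.

    Assumption: the pattern must match the entire id, not just a part of it.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # Compare the dashed name, the undashed name, and one other listed plugin.
    for plugin_id in ("language-identifier", "languageidentifier", "parse-html"):
        print(plugin_id, is_included(plugin_id))
```

With this pattern, only the dashed spelling matches, which is consistent with what I saw when the plugin did not load under the undashed name.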
