Hi, I am stuck trying to index only the nouns of German and English texts (very similar to the full example at http://wiki.apache.org/solr/OpenNLP#Full_Example).
My first attempt was to use UIMA with the HMMTagger:

<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
  <lst name="uimaConfig">
    <lst name="runtimeParameters"></lst>
    <str name="analysisEngine">/org/apache/uima/desc/AggregateSentenceAE.xml</str>
    <bool name="ignoreErrors">false</bool>
    <lst name="analyzeFields">
      <bool name="merge">false</bool>
      <arr name="fields"><str>albody</str></arr>
    </lst>
    <lst name="fieldMappings">
      <lst name="type">
        <str name="name">org.apache.uima.SentenceAnnotation</str>
        <lst name="mapping">
          <str name="feature">coveredText</str>
          <str name="field">albody2</str>
        </lst>
      </lst>
    </lst>
  </lst>
</processor>

- How do I set the ModelFile so that the German corpus is used?
- What about language identification?
  -- How do I pick the right corpus/tagger based on the language?
  -- Should this be done in UIMA (and if so, how?) or via the Solr contrib/langid field mapping?
- How do I remove non-nouns from the annotated field?

My second attempt is to use OpenNLP by applying the patch from https://issues.apache.org/jira/browse/LUCENE-2899, but the patch seems to be somewhat out of date. I am currently trying to get it to work with Solr 4.1.

Any pointers appreciated :-)

Regards,
Kai Gülzau
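For reference, the filtering I am after can be sketched in plain Java, independent of UIMA or OpenNLP: choose a POS model per detected language, then keep only the tokens whose tag marks a noun. This is a minimal sketch under stated assumptions, not how the UIMA processor or the LUCENE-2899 patch actually work. The `TaggedToken` type and the `NOUN_TAGS`/`MODEL_FILES` maps are hypothetical; the model file names are assumed to follow the OpenNLP 1.5 model downloads; and the tag sets assume Penn Treebank tags for English and STTS tags for German (`NN` common noun, `NE` proper noun).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NounFilterSketch {

    // Hypothetical type: a token plus the POS tag some tagger assigned to it.
    record TaggedToken(String text, String tag) {}

    // Assumption: language code (e.g. from langid) -> tags that denote nouns.
    // English uses Penn Treebank tags, German uses STTS tags.
    private static final Map<String, Set<String>> NOUN_TAGS = Map.of(
        "en", Set.of("NN", "NNS", "NNP", "NNPS"),
        "de", Set.of("NN", "NE"));

    // Assumption: language code -> POS model file to load for that language.
    private static final Map<String, String> MODEL_FILES = Map.of(
        "en", "en-pos-maxent.bin",
        "de", "de-pos-maxent.bin");

    // Returns the model file for a language, failing loudly for unknown languages.
    static String modelFileFor(String lang) {
        String file = MODEL_FILES.get(lang);
        if (file == null) {
            throw new IllegalArgumentException("No POS model configured for language: " + lang);
        }
        return file;
    }

    // Keeps only tokens whose tag is a noun tag for the given language;
    // unknown languages yield an empty list rather than unfiltered text.
    static List<String> nounsOnly(String lang, List<TaggedToken> tokens) {
        Set<String> nounTags = NOUN_TAGS.getOrDefault(lang, Set.of());
        List<String> nouns = new ArrayList<>();
        for (TaggedToken t : tokens) {
            if (nounTags.contains(t.tag())) {
                nouns.add(t.text());
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // A German sentence, tagged by hand with STTS tags for illustration.
        List<TaggedToken> tokens = List.of(
            new TaggedToken("Der", "ART"),
            new TaggedToken("Hund", "NN"),
            new TaggedToken("bellt", "VVFIN"),
            new TaggedToken("in", "APPR"),
            new TaggedToken("Berlin", "NE"));
        System.out.println(modelFileFor("de") + " " + nounsOnly("de", tokens));
    }
}
```

Running `main` prints `de-pos-maxent.bin [Hund, Berlin]`. Keeping the noun tag set per language matters because the tagsets differ: a filter that only checks for `NN` would drop German proper nouns (`NE`) and English plurals (`NNS`).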