Hi everyone,

I am using the UIMA DictionaryAnnotator to tag Solr documents. It seems to be working (I do get tags), but I am seeing some strange behavior:

1. I am using the Whitespace Tokenizer both for the indexed text and for creating the dictionary. Most entries in my dictionary consist of multiple words. From the documentation, it seems that with the default settings a document must contain all the words of an entry, in order, to match that entry. In practice this is not the case: some documents are tagged with single words, even though my dictionary has no entry for those words on their own (they only appear as part of multi-word entries). This would be fine (even preferable) if it were consistent, but it is not. Only a subset of the single words gets tagged, not all of them. What am I doing wrong?

2. If a dictionary word appears multiple times in the analyzed field, it is also added that many times to the mapped field (i.e. my tags). Is there a way to control or disable this?

Thanks!

Regards,
Andreea

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-and-uima-dictionary-annotator-tp4208359.html
Sent from the Solr - User mailing list archive at Nabble.com.