Hi! I'm testing classification with the CBayes (and Bayes) algorithms and I'm running into an issue when I try to classify a document containing words (features) that don't exist in my model. For example, if I classify a document consisting of a single unknown word, it returns the same constant score (12.386649147018964) for every label instead of returning the unknown label.
After looking at the CBayesAlgorithm class, I made my own subclass and overrode the "featureWeight" function to return 0 if the weight of the feature in the current label is 0, instead of returning the theta-normalized weight. This fixed the problem in my case. My guess is that most classification examples are built on fairly large datasets (Wikipedia, newsgroups) that include a huge vocabulary. My dataset doesn't have a complete vocabulary, which causes problems with unknown words... Should I file an issue? Is this a known / normal problem? Thanks! André-Philippe Paquet
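
To make the behaviour concrete, here is a minimal, self-contained sketch of the idea. It is a toy and not Mahout's actual code: the class UnknownWordSketch, the SMOOTHED_FALLBACK constant, the sample weights, and the "higher score wins" rule are all assumptions made for illustration. It only shows why a smoothed weight for an unseen word gives every label the same score, and how returning 0 for that word lets the classifier fall back to the unknown label.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only; names and numbers are hypothetical, not Mahout's API.
final class UnknownWordSketch {
    // Assumed stand-in for the smoothed, normalized weight an unseen feature gets.
    static final double SMOOTHED_FALLBACK = 1.0;

    // label -> (word -> weight); insertion order kept so results are deterministic.
    static final Map<String, Map<String, Double>> WEIGHTS = new LinkedHashMap<>();
    static {
        Map<String, Double> sports = new LinkedHashMap<>();
        sports.put("goal", 2.0);
        sports.put("team", 1.5);
        Map<String, Double> tech = new LinkedHashMap<>();
        tech.put("code", 2.0);
        tech.put("server", 1.5);
        WEIGHTS.put("sports", sports);
        WEIGHTS.put("tech", tech);
    }

    // Weight of a word for a label. With zeroIfUnseen, a word missing from
    // the label contributes nothing instead of the smoothed fallback.
    static double featureWeight(String label, String word, boolean zeroIfUnseen) {
        Double w = WEIGHTS.get(label).get(word);
        if (w == null) {
            return zeroIfUnseen ? 0.0 : SMOOTHED_FALLBACK;
        }
        return w;
    }

    // Sum of feature weights per label for the given document.
    static Map<String, Double> scores(List<String> doc, boolean zeroIfUnseen) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String label : WEIGHTS.keySet()) {
            double sum = 0.0;
            for (String word : doc) {
                sum += featureWeight(label, word, zeroIfUnseen);
            }
            result.put(label, sum);
        }
        return result;
    }

    // Label with the strictly highest positive score, or "unknown" if none.
    static String classify(List<String> doc, boolean zeroIfUnseen) {
        String best = "unknown";
        double bestScore = 0.0;
        for (Map.Entry<String, Double> e : scores(doc, zeroIfUnseen).entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("zzz"); // a word no label has seen
        System.out.println(scores(doc, false));   // every label gets the same score
        System.out.println(classify(doc, true));  // falls back to "unknown"
    }
}
```

In this toy, the smoothed variant gives all labels an identical score for the unseen word, so whichever label is checked first wins by accident. The zero-weight variant leaves every score at 0, and the document is reported as unknown.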