I am not sure that will work. My concern is that with 3 labels and NO information, you would guess a class with p = 1/3. So if p < 1/3 there is evidence AGAINST the label, but...
Case 1: p(L1) = 0.7, p(L2) = 0.2, p(L3) = 0.1. Does this mean L1 only, or L1 and L2, or L1 + L2 + L3? Note that p(L1) + p(L2) = 0.9, but p(L2) < 1/3.

Case 2: p(L1) = 0.6, p(L2) = 0.3, p(L3) = 0.1. L2 is still < 1/3, and p(L1) + p(L2) = 0.9.

Case 2 is more plausible to me than case 1. Maybe you have to require p(L2) to be greater than 0.33. Another thing to keep in mind is that if the number of labels gets large, the values of p will go down (you need to leave probability mass for the other outcomes). Two rough sketches of both ideas (your cumulative-threshold suggestion and the one-model-per-label setup I suggested earlier) follow the quoted thread at the bottom of this message.

Here is a paper on the topic (my earlier advice turns out to be mentioned in the introduction as a naive solution):

Multi-labelled Classification Using Maximum Entropy ... - CiteSeerX
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.399.2443&rep=rep1&type=pdf>

Daniel

> On Apr 12, 2018, at 11:09 PM, Benedict Holland <[email protected]> wrote:
>
> I can actually live with a Pr() >> 0 for matching labels, maybe. What might
> be a reasonable option is to specify a sum of probabilities that gets over a
> certain margin. Like, sum the probabilities in order, and select the top
> few that sum over a threshold. That could actually work.
>
> ~Ben
>
> On Thu, Apr 12, 2018 at 10:26 PM, <[email protected]> wrote:
>
>> Hi Ben,
>>
>> If a document can be in multiple categories, you should see that
>> reflected in the probabilities. The top categories will be close in
>> score. It will not be 1/m, because that would imply that ALL categories
>> are “equally probable” or that you have no idea. However, if you have 3
>> classes and two are likely, it may be 0.49, 0.49, 0.02. Remember that the
>> results are normalized by a softmax at the end, so the sum of all
>> probabilities will always be 1.
>> Sorry, but multi-class classification is more complicated than binary
>> classification. If you really are interested in multi-label
>> classification, I’m not sure maxent (at least the way openNLP formulated
>> the solution) is appropriate for your needs. You might want to consider
>> individual binary classifiers for each label. Have 1 model for each label:
>>
>> train_cat1.txt...
>> cat_1_TRUE <text>
>> cat_1_FALSE <text>
>> …
>>
>> train_cat2.txt…
>> cat_2_FALSE <text>
>> cat_2_TRUE <text>
>>
>> Hope it helps. Let me know what you wind up doing...
>> Daniel
>>
>>> On Apr 12, 2018, at 4:22 PM, Benedict Holland <[email protected]> wrote:
>>>
>>> Hello all,
>>>
>>> I understand that maximum entropy models are excellent at categorizing
>>> documents. As it turns out, I have a situation where 1 document can be in
>>> many categories (a 1:m relationship). I believe that I could create
>>> training data that looks something like:
>>>
>>> category_1 <text>
>>> category_2 <text>
>>> ...
>>>
>>> If I do this, will the resulting probability model return category
>>> probabilities as Pr(<text> in category_m) = 1/m for all categories m, or
>>> will it return Pr(<text> in category_m) = 1 for all categories m?
>>>
>>> This is a very important distinction. I really hope it is the latter. If
>>> it isn't, do you have a way to make sure that if I receive a text that is
>>> similar to the training data, I can get a probability close to 1 if it
>>> fits into multiple categories?
>>>
>>> Thanks,
>>> ~Ben
>>
>>
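
P.S. Here is a rough, untested sketch of your cumulative-threshold idea in plain Java, combined with the 1/m "chance" floor I was worrying about above. The label names, the 0.9 threshold, and the class/method names are all made up for illustration; it only assumes you already have the softmax probabilities from a single multi-class model.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ThresholdLabels {

    // Take labels in descending order of probability until their sum reaches
    // cumulativeThreshold, but only keep a label whose own probability is
    // above the chance level 1/m (no-information guess).
    static List<String> selectLabels(String[] labels, double[] probs,
                                     double cumulativeThreshold) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> probs[i]).reversed());

        double chance = 1.0 / probs.length;   // p = 1/m means "no information"
        double cumulative = 0.0;
        List<String> selected = new ArrayList<>();
        for (int i : order) {
            if (cumulative >= cumulativeThreshold) break;
            if (probs[i] <= chance) break;    // not above chance; stop here
            selected.add(labels[i]);
            cumulative += probs[i];
        }
        return selected;
    }

    public static void main(String[] args) {
        String[] labels = {"L1", "L2", "L3"};
        // Case 1 and case 2 from above:
        System.out.println(selectLabels(labels, new double[]{0.7, 0.2, 0.1}, 0.9));
        System.out.println(selectLabels(labels, new double[]{0.6, 0.3, 0.1}, 0.9));
    }
}

Note that with the probabilities from either case, only L1 survives the 1/3 floor (0.3 is still below 1/3), which is exactly why you may want to tune that floor rather than hard-code 1/m.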

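And here is a sketch of the one-model-per-label (one-vs-rest) setup, assuming the OpenNLP 1.8+ doccat API (DoccatModel, DocumentCategorizerME) and models trained from the cat_k_TRUE / cat_k_FALSE files in the layout I described earlier. The model file names, the outcome names, and the 0.5 decision threshold are assumptions, so check the exact signatures against your OpenNLP version. Also untested.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;

public class OneVsRestLabels {

    // Run one binary categorizer per label and keep every label whose
    // "<label>_TRUE" outcome gets more than half of the probability mass.
    static List<String> labelsFor(String[] tokens, String[] labelNames,
                                  String[] modelFiles) throws IOException {
        List<String> assigned = new ArrayList<>();
        for (int k = 0; k < labelNames.length; k++) {
            try (InputStream in = new FileInputStream(modelFiles[k])) {
                DocumentCategorizerME categorizer =
                        new DocumentCategorizerME(new DoccatModel(in));
                double[] outcomes = categorizer.categorize(tokens);
                double pTrue = outcomes[categorizer.getIndex(labelNames[k] + "_TRUE")];
                if (pTrue > 0.5) {
                    assigned.add(labelNames[k]);
                }
            }
        }
        return assigned;
    }

    public static void main(String[] args) throws IOException {
        String[] tokens = {"some", "tokenized", "document", "text"};
        String[] labels = {"cat_1", "cat_2", "cat_3"};
        String[] models = {"cat_1.bin", "cat_2.bin", "cat_3.bin"};
        System.out.println(labelsFor(tokens, labels, models));
    }
}

In practice you would load each model once at startup instead of per document, but the point is that each label gets an independent yes/no decision, so a document can end up with zero, one, or several labels without the softmax forcing the probabilities to compete.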