Sorry, there was a typo. It should read: The abstraction is getting difficult. Let me get a little more specific: Y is an industry code, and there are many of them. For each data row (which obviously has more than just one predicate) I have an industry code. My original thought was that I could have a prior based on the industry. I could have data like:

A,solvent,dust,code=111222
A,insecticide,code=111312
...
B,solvent,diesel,code=111222
...

The problem becomes that I am then using the industry distribution from my training set, not the census.

By the "best value" I mean that when classifying an example the model has not seen before, I would like the model to classify based on the prior. If p(A|Y)=0.8, select A with p=0.8.
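Concretely, the idea would be something like the following sketch (Python, a minimal Naive Bayes in which the learned class prior is swapped for the census conditional P(class | industry code); the CENSUS_PRIOR table, its numbers, and the helper names are made up for illustration, not real census values):

    from collections import Counter, defaultdict

    # Hypothetical census table of P(class | industry code). In practice this
    # would be built from the Census data, never from the training set.
    CENSUS_PRIOR = {
        "111222": {"A": 0.8, "B": 0.2},
        "111312": {"A": 0.1, "B": 0.9},
    }

    def train_likelihoods(rows):
        # rows are (class, [predicates], industry_code); estimate
        # P(predicate | class) with add-one smoothing. The industry code is
        # deliberately ignored here -- it only enters through the prior.
        counts = defaultdict(Counter)
        tokens = Counter()
        vocab = set()
        for label, predicates, _code in rows:
            for p in predicates:
                counts[label][p] += 1
                tokens[label] += 1
                vocab.add(p)
        return lambda label, p: (counts[label][p] + 1) / (tokens[label] + len(vocab))

    def classify(predicates, code, labels, likelihood):
        # P(c | x, y) is proportional to P(c | y) * prod_i P(x_i | c):
        # the census prior replaces the usual training-set class frequency.
        scores = {}
        for c in labels:
            score = CENSUS_PRIOR[code].get(c, 0.0)
            for p in predicates:
                score *= likelihood(c, p)
            scores[c] = score
        total = sum(scores.values()) or 1.0
        return {c: s / total for c, s in scores.items()}

    rows = [
        ("A", ["solvent", "dust"], "111222"),
        ("A", ["insecticide"], "111312"),
        ("B", ["solvent", "diesel"], "111222"),
    ]
    print(classify(["solvent"], "111222", ["A", "B"], train_likelihoods(rows)))

That way the census enters only through the prior term, and the small, biased training set is used only for the likelihoods P(predicate | class).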
Dan

> On Feb 25, 2016, at 12:02 PM, Nishant Kelkar <[email protected]> wrote:
>
> I guess I don't quite understand then. So your training data is small, but
> you have a potentially high-cardinality feature Y from a separate source
> (US Census)... how are you marrying them together then? As in, how does each
> row in your small training set get a Y? Is, for example, X a common column
> between the two sets, where X --> Y is a one-to-many mapping?
>
> As far as using the information provided by Y, I think any model that
> estimates a joint probability P(Y, X, label) will inadvertently end up
> using information about P(label | Y), no?
>
> Also, what does the last line in your previous email mean ("If possible
> I would like to use the best values available.")?
>
> Best,
> Nishant
>
> On Thursday, February 25, 2016, Russ, Daniel (NIH/CIT) [E] <[email protected]> wrote:
>
> Yes, but my training data is a small, biased sample, whereas feature "Y" takes
> population values (actually taken from the US Census, so a very large
> sample). If possible I would like to use the best values available.
>
> Daniel Russ, Ph.D.
> Staff Scientist, Division of Computational Bioscience
> Center for Information Technology
> National Institutes of Health
> U.S. Department of Health and Human Services
> 12 South Drive
> Bethesda, MD 20892-5624
>
> On Feb 25, 2016, at 11:29 AM, Nishant Kelkar <[email protected]> wrote:
>
> Hi Dan,
>
> Can't you call (A, Q) A', (A, R) A'', and so on, and just treat them
> as separate labels altogether? Your classifier can then learn using these
> "fake" labels.
>
> You can then keep an in-memory map of what each fake label (A'', for
> example) corresponds to in reality (A'' in this case = (A, R)).
>
> Best Regards,
> Nishant Kelkar
>
> On Thursday, February 25, 2016, Russ, Daniel (NIH/CIT) [E] <[email protected]> wrote:
>
> I am not sure I understand. When I think of the kernel trick, I think of
> converting a linear decision boundary into a higher-order decision
> boundary (i.e., r <- x^2 + y^2 giving a circular decision boundary). Maybe
> I am missing something? I'll look into this a bit more.
> Dan
>
> On Feb 25, 2016, at 11:11 AM, Alexander Wallin <[email protected]> wrote:
>
> Can't you make a compounded feature (or features), i.e. use the kernel
> trick?
>
> Alexander
>
> On Feb 25, 2016, at 17:06, Russ, Daniel (NIH/CIT) [E] <[email protected]> wrote:
>
> Hi,
> Is it possible to change the prior based on a feature?
>
> For example, if I have the following data (very simplified):
>
> Class, Predicates
> A, X
> A, X
> B, X
>
> You would expect class A 2/3 of the time when the only feature is
> predicate X.
>
> However, let's say I know of another feature Y that can take values
> {Q,R,S}, with P(A|Q)=0.8, P(A|R)=0.1, P(A|S)=0.3.
>
> Is there any way to add feature Y to the classifier taking advantage of
> this information?
> Thanks
> Dan
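(As a sanity check on the toy numbers in the question above, assuming only two classes so that P(B|Q) = 0.2, and the same Naive-Bayes-style combination of census prior and predicate likelihoods: every training row contains X, so P(X|A) = P(X|B) = 1, and

    P(A | X, Y=Q) ∝ P(X|A) * P(A|Q) = 1 * 0.8 = 0.8
    P(B | X, Y=Q) ∝ P(X|B) * P(B|Q) = 1 * 0.2 = 0.2

so the posterior follows the census prior exactly, i.e. the "if p(A|Y)=0.8, select A with p=0.8" behavior described at the top of this message.)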
