Sorry, there was a typo. It should read:

The abstraction is getting difficult. Let me get a little more specific: Y is 
an industry code, and there are many of them. For each data row (which 
obviously has more than just one predicate) I have an industry code. My 
original thought was that I could have a prior based on the industry. 

I could have data like:

> A,solvent,dust,code=111222
> A,insecticide,code=111312
> …
> B,solvent,diesel,code=111222
> ...
> 
> The problem is that I am then using the industry distribution from my 
> training set, not the census.
> 
> By “best value” I mean that, when classifying an example the model has not 
> seen before, I would like the model to classify based on the prior. If 
> p(A|Y)=0.8, select A with p=0.8.
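> 
> A minimal sketch of that back-off in Python (the names and numbers are 
> hypothetical; census_prior stands in for P(class | industry code) 
> estimated from the census rather than from my training set):
> 
> import random
> 
> # Hypothetical census-derived priors P(class | industry code).
> census_prior = {
>     "111222": {"A": 0.8, "B": 0.2},
>     "111312": {"A": 0.1, "B": 0.9},
> }
> 
> def classify(model_posterior, industry_code):
>     """model_posterior: {class: p(class | predicates)} from the trained
>     model, or None when the predicates were never seen in training."""
>     dist = model_posterior or census_prior[industry_code]
>     # "Select A with p=0.8": draw a class in proportion to the
>     # distribution rather than always taking the argmax.
>     classes, probs = zip(*dist.items())
>     return random.choices(classes, weights=probs)[0]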
> 
> Dan
> 
> On Feb 25, 2016, at 12:02 PM, Nishant Kelkar <[email protected]> wrote:
> 
> I guess I don't quite understand then. So your training data is small, but
> you have a potentially high-cardinality feature Y from a separate source
> (the US Census)... how are you marrying them together? As in, how does each
> row in your small training set get a Y? Is, for example, X a common column
> between the two sets, where X --> Y is a one-to-many mapping?
> 
> As far as using the information provided by Y goes, I think any model that
> estimates the joint probability P(Y, X, label) will implicitly end up
> using information about P(label | Y), no?
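> 
> (E.g., assuming a Naive-Bayes-style factorization: p(label | X, Y) ∝
> p(label) p(X | label) p(Y | label) = p(X | label) p(Y) p(label | Y),
> so P(label | Y) enters exactly as a Y-conditional prior.)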
> 
> Also, what does the last line of your previous email mean ("If possible
> I would like to use the best values available.")?
> 
> Best,
> Nishant
> 
> On Thursday, February 25, 2016, Russ, Daniel (NIH/CIT) [E]
> <[email protected]> wrote:
> 
> Yes, but my training data is a small, biased sample, whereas feature “Y”
> takes population values (actually taken from the US Census, so a very
> large sample). If possible I would like to use the best values available.
> 
> 
> Daniel Russ, Ph.D.
> Staff Scientist, Division of Computational Bioscience
> Center for Information Technology
> National Institutes of Health
> U.S. Department of Health and Human Services
> 12 South Drive
> Bethesda,  MD 20892-5624
> 
> On Feb 25, 2016, at 11:29 AM, Nishant Kelkar <[email protected]> wrote:
> 
> Hi Dan,
> 
> Can't you call (A, Q) A', (A, R) A'', and so on, and just treat them
> as separate labels altogether? Your classifier can then learn using these
> "fake" labels.
> 
> You can then keep an in-memory map of what each fake label (A'', for
> example) corresponds to in reality (A'' in this case = (A, R)).
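> 
> A minimal sketch of that label product in Python (hypothetical names,
> using the {Q, R, S} values from the earlier example):
> 
> from itertools import product
> 
> labels = ["A", "B"]
> y_values = ["Q", "R", "S"]
> 
> # "Fake" compound labels plus the map back to reality:
> # encode[("A", "R")] == "A|R" and decode["A|R"] == ("A", "R").
> encode = {(c, y): f"{c}|{y}" for c, y in product(labels, y_values)}
> decode = {fake: real for real, fake in encode.items()}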
> 
> Best Regards,
> Nishant Kelkar
> 
> On Thursday, February 25, 2016, Russ, Daniel (NIH/CIT) [E]
> <[email protected]> wrote:
> 
> I am not sure I understand. When I think of the kernel trick, I think of
> converting a linear decision boundary into a higher-order decision
> boundary (e.g., r = x^2 + y^2 gives a circular decision boundary). Maybe
> I am missing something? I’ll look into this a bit more.
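> 
> A minimal sketch of that lifting with made-up data (assumes numpy; the
> circular boundary in (x, y) becomes a flat threshold on r):
> 
> import numpy as np
> 
> rng = np.random.default_rng(0)
> X = rng.uniform(-1.0, 1.0, size=(100, 2))  # features x, y
> r = (X ** 2).sum(axis=1)                   # r = x^2 + y^2
> label = (r < 0.5).astype(int)              # circular class boundary
> X_lifted = np.column_stack([X, r])         # a linear classifier on
>                                            # X_lifted can separate the
>                                            # classes at r = 0.5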
> Dan
> 
> 
> On Feb 25, 2016, at 11:11 AM, Alexander Wallin
> <[email protected]> wrote:
> 
> Can’t you make a compound feature (or features), i.e. use the kernel
> trick?
> 
> Alexander
> 
> On Feb 25, 2016, at 5:06 PM, Russ, Daniel (NIH/CIT) [E]
> <[email protected]> wrote:
> 
> Hi,
> Is it possible to change the prior based on a feature?
> 
> For example, if I have the following data (very simplified):
> 
> Class, Predicates
> 
> A, X
> A, X
> B, X
> 
> You would expect class A 2/3 of the time when the only feature is
> predicate X.
> 
> However, let's say I know of another feature Y that can take the values
> {Q, R, S}, with P(A|Q)=0.8, P(A|R)=0.1, P(A|S)=0.3.
> 
> Is there any way to add feature Y to the classifier taking advantage of
> this information?
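> 
> One possible approach, sketched with the toy numbers above (and assuming
> X and Y are conditionally independent given the class): divide the
> training-set prior out of the model's posterior, multiply in P(class | Y),
> and renormalize.
> 
> # p(A | X) = 2/3 from the three rows; the training prior p(A) is also
> # 2/3; and suppose Y = Q, so P(A | Q) = 0.8.
> p_model = {"A": 2 / 3, "B": 1 / 3}   # p(class | X) from the model
> p_train = {"A": 2 / 3, "B": 1 / 3}   # class prior in the training set
> p_prior_y = {"A": 0.8, "B": 0.2}     # P(class | Y=Q)
> 
> scores = {c: p_model[c] / p_train[c] * p_prior_y[c] for c in p_model}
> z = sum(scores.values())
> posterior = {c: s / z for c, s in scores.items()}
> # posterior == {"A": 0.8, "B": 0.2}: X carries no information here
> # (it appears in every row), so the Y-based prior decides.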
> Thanks
> Dan
> 