The abstraction is getting difficult. Let me get a little more specific: Y is 
an industry code, and there are many of them. For each data row (which 
obviously has more than just one predicate) I have an industry code. My 
original thought was that I could have a prior based on the industry. I could 
have data like:

A,solvent,dust,code=111222
A,insecticide,code=111312
…
B,solvent,diesel,code=111222
...

The problem is that I would then be using the industry distribution from my 
training set, not from the census.

By “best value” I mean that when classifying an example the model has not 
seen before, I would like the model to classify based on the prior: if 
p(A|Y)=0.8, select A with p=0.8.
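
A minimal sketch of that fallback, assuming a census-derived table of
p(class | industry code); the names, numbers, and the model.predict call are
illustrative placeholders, not the actual setup:

    import random

    # Illustrative census-derived prior p(class | industry code).
    CENSUS_PRIOR = {
        "111222": {"A": 0.8, "B": 0.2},
        "111312": {"A": 0.1, "B": 0.9},
    }

    def classify(model, predicates, industry_code, known_predicates):
        """Use the trained model when it has evidence for these predicates;
        otherwise classify from the census prior for the industry code."""
        if any(p in known_predicates for p in predicates):
            return model.predict([predicates])[0]  # placeholder for the real model call
        # Unseen example: "if p(A|Y) = 0.8, select A with p = 0.8".
        prior = CENSUS_PRIOR[industry_code]
        classes = list(prior)
        weights = [prior[c] for c in classes]
        return random.choices(classes, weights=weights, k=1)[0]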

Dan

On Feb 25, 2016, at 12:02 PM, Nishant Kelkar <[email protected]> wrote:

I guess I don't quite understand then. So your training data is small, but
you have a potentially high-cardinality feature Y from a separate source
(the US Census)... how are you marrying them together? As in, how does each
row in your small training set get a Y? Is X, for example, a common column
between the two sets, where X --> Y is a one-to-many mapping?

As far as using the information provided by Y, I think any model that
estimates a joint probability P(Y, X, label) will inadvertently end up
using information about P(label | Y), no?
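
A tiny illustration of that point with plain counting (the rows below are
made up): whatever the model, fitting P(Y, X, label) to the training sample
absorbs that sample's own P(label | Y).

    from collections import Counter

    # Made-up, biased training rows of (label, Y).
    train = [("A", "Q"), ("A", "Q"), ("B", "Q"), ("A", "R"), ("B", "R"), ("B", "R")]

    joint = Counter(train)                   # counts of (label, Y) pairs
    y_totals = Counter(y for _, y in train)  # counts of Y alone

    p_label_given_y = {(label, y): joint[(label, y)] / y_totals[y]
                       for (label, y) in joint}
    print(p_label_given_y[("A", "Q")])       # ~0.67 from this sample, vs. 0.8 from the census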

Also, what does the last line of your previous email mean ("If possible
I would like to use the best values available.")?

Best,
Nishant

On Thursday, February 25, 2016, Russ, Daniel (NIH/CIT) [E] <[email protected]> wrote:

Yes, but my training data is a small, biased sample, whereas the values of
feature “Y” are population values (actually taken from the US Census, so a
very large sample). If possible I would like to use the best values available.


Daniel Russ, Ph.D.
Staff Scientist, Division of Computational Bioscience
Center for Information Technology
National Institutes of Health
U.S. Department of Health and Human Services
12 South Drive
Bethesda,  MD 20892-5624

On Feb 25, 2016, at 11:29 AM, Nishant Kelkar <[email protected]> wrote:

Hi Dan,

Can't you call (A, Q) A', (A, R) A'', and so on, and just treat them
as separate labels altogether? Your classifier can then learn using these
"fake" labels.

You can then have an in memory map of what each fake label (A'' for
example) corresponds to in reality (A'' in this case = (A, R)).
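
A minimal sketch of that compound-label idea, with illustrative names and
rows:

    # Fold label and Y into one "fake" label, e.g. (A, Q) -> "A|Q",
    # and keep a map back to the real (label, Y) pair.
    rows = [("A", "Q", ["solvent", "dust"]),
            ("A", "R", ["insecticide"]),
            ("B", "Q", ["solvent", "diesel"])]

    def compound(label, y):
        return f"{label}|{y}"

    train = [(compound(label, y), feats) for label, y, feats in rows]
    back_map = {compound(label, y): (label, y) for label, y, _ in rows}

    # Train any multiclass classifier on `train`, then map its prediction back:
    predicted = "A|Q"                        # whatever the classifier returns
    real_label, y_value = back_map[predicted]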

Best Regards,
Nishant Kelkar

On Thursday, February 25, 2016, Russ, Daniel (NIH/CIT) [E] <[email protected]> wrote:

I am not sure I understand.  When I think of the kernel trick, I think of
converting a linear decision boundary into a higher-order decision boundary
(e.g. r <- x^2 + y^2, giving a circular decision boundary).  Maybe I am
missing something?  I’ll look into this a bit more.
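
For concreteness, a small sketch of that feature map (adding r = x^2 + y^2 so
that a linear threshold in the new space is a circle in the original one);
purely illustrative:

    import numpy as np

    def quadratic_features(points):
        """Map (x, y) -> (x, y, x^2 + y^2)."""
        x, y = points[:, 0], points[:, 1]
        return np.column_stack([x, y, x**2 + y**2])

    pts = np.array([[0.1, 0.2], [1.5, -1.0]])
    mapped = quadratic_features(pts)
    inside_unit_circle = mapped[:, 2] < 1.0   # linear cut on r gives a circular boundary
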
Dan


On Feb 25, 2016, at 11:11 AM, Alexander Wallin <[email protected]> wrote:

Can’t you make a compounded feature (or features), i.e. use the kernel
trick?
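
One concrete reading of "compounded feature" is a feature cross of each
predicate with the value of Y; a minimal sketch with illustrative names:

    # Cross each predicate with the industry code so the classifier can learn
    # weights specific to (predicate, Y) pairs.
    def cross_features(predicates, industry_code):
        return [f"{p}&code={industry_code}" for p in predicates]

    print(cross_features(["solvent", "dust"], "111222"))
    # ['solvent&code=111222', 'dust&code=111222']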

Alexander

On Feb 25, 2016, at 5:06 PM, Russ, Daniel (NIH/CIT) [E] <[email protected]> wrote:

Hi,
Is it possible to change the prior based on a feature?

For example, if I have the following data (very simplified):

Class, Predicates

A, X
A, X
B, X

You would expect class A 2/3 of the time when the feature is just
predicate X.

However, let’s say I know of another feature Y that can take values
{Q, R, S}, with P(A|Q)=0.8, P(A|R)=0.1, P(A|S)=0.3.

Is there any way to add feature Y to the classifier taking advantage of
this information?
Thanks
Dan
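
A worked sketch of one way to fold Y into this toy example: divide out the
training-set prior, multiply in p(class | Y), and renormalize. The
factorization p(class | X, Y) ∝ p(X | class) · p(class | Y) is an assumption
here, not something established in the thread:

    # From the toy data: p(A | X) = 2/3, and the training prior p_train(A) = 2/3
    # as well, since every row contains predicate X.
    p_label_given_x = {"A": 2 / 3, "B": 1 / 3}
    p_train = {"A": 2 / 3, "B": 1 / 3}
    p_a_given_y = {"Q": 0.8, "R": 0.1, "S": 0.3}   # externally known p(A | Y)

    def reweighted(y_value):
        """Replace the training prior with the Y-based prior and renormalize."""
        external = {"A": p_a_given_y[y_value], "B": 1.0 - p_a_given_y[y_value]}
        unnorm = {c: p_label_given_x[c] / p_train[c] * external[c] for c in ("A", "B")}
        total = sum(unnorm.values())
        return {c: v / total for c, v in unnorm.items()}

    print(reweighted("Q"))   # {'A': 0.8, 'B': 0.2}: X adds no extra evidence, so Y's prior decides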







