I guess I don't quite understand then. So your training data is small, but
you have a potentially high-cardinality feature Y from a separate source
(the US Census)... how are you marrying the two together? As in, how does
each row in your small training set get a Y? Is X, for example, a common
column between the two sets, where X --> Y is a one-to-many mapping?
As far as using the information provided by Y, I think any model that
estimates a joint probability P(Y, X, label) will inadvertently end up
using information about P(label | Y), no?
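For instance, here is a toy sketch in Python (a naive Bayes-style
factorization with made-up numbers, just to illustrate the point) showing
that the posterior over the label shifts with Y, i.e. P(label | Y) is
effectively being used:

  # Toy joint model: P(label, X, Y) = P(label) * P(X | label) * P(Y | label)
  p_label = {"A": 0.5, "B": 0.5}
  p_x_given_label = {"A": 0.7, "B": 0.7}  # X carries no signal here
  p_y_given_label = {"A": {"Q": 0.8, "R": 0.1, "S": 0.1},
                     "B": {"Q": 0.2, "R": 0.6, "S": 0.2}}

  def posterior(y):
      # P(label | X, Y=y), up to normalization
      scores = {c: p_label[c] * p_x_given_label[c] * p_y_given_label[c][y]
                for c in p_label}
      z = sum(scores.values())
      return {c: s / z for c, s in scores.items()}

  print(posterior("Q"))  # label A dominates
  print(posterior("R"))  # label B dominates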
Also, what does the last line of your previous email mean ("If possible
I would like to use the best values available.")?
Best,
Nishant
On Thursday, February 25, 2016, Russ, Daniel (NIH/CIT) [E] <
[email protected]> wrote:
> Yes, but my training data is a small, biased sample, whereas the feature “Y”
> values are population values (actually taken from the US Census, so a very
> large sample). If possible I would like to use the best values available.
>
>
> Daniel Russ, Ph.D.
> Staff Scientist, Division of Computational Bioscience
> Center for Information Technology
> National Institutes of Health
> U.S. Department of Health and Human Services
> 12 South Drive
> Bethesda, MD 20892-5624
>
> On Feb 25, 2016, at 11:29 AM, Nishant Kelkar <[email protected]> wrote:
>
> Hi Dan,
>
> Can't you call (A, Q) as A', (A, R) as A'', and so on... and just treat them
> as separate labels altogether? Your classifier can then learn using these
> "fake" labels.
>
> You can then have an in-memory map of what each fake label (A'', for
> example) corresponds to in reality (A'' in this case = (A, R)).
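>
> A rough sketch in Python of what I mean (the labels and values are just
> the ones from your toy example):
>
>   # Build "fake" compound labels and keep a map back to the original pair
>   rows = [("A", "Q"), ("A", "R"), ("B", "Q")]  # (label, Y) pairs
>   fake_label_of = {}   # (label, Y) -> fake label
>   real_pair_of = {}    # fake label -> (label, Y)
>   fake_labels = []
>   for pair in rows:
>       if pair not in fake_label_of:
>           name = "{}_{}".format(*pair)         # e.g. "A_Q"
>           fake_label_of[pair] = name
>           real_pair_of[name] = pair
>       fake_labels.append(fake_label_of[pair])
>
>   # Train the classifier on fake_labels, then map its predictions back to
>   # the real (label, Y) pairs via real_pair_of.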
>
> Best Regards,
> Nishant Kelkar
>
> On Thursday, February 25, 2016, Russ, Daniel (NIH/CIT) [E] <[email protected]>
> wrote:
>
> I am not sure I understand. When I think of the kernel trick, I think of
> converting a linear decision boundary into a higher-order decision
> boundary (e.g., mapping to r = x^2 + y^2, which gives a circular decision
> boundary). Maybe I am missing something? I’ll look into this a bit more.
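>
> To make my mental picture concrete, the sort of mapping I mean is
> something like this (purely illustrative Python):
>
>   # Map 2-D points to a radial feature so a circular boundary becomes
>   # a simple threshold (i.e. linear) in the new feature.
>   def to_radial(x, y):
>       return x**2 + y**2
>
>   inside = to_radial(0.1, 0.2) < 1.0   # True: inside the unit circle
>   outside = to_radial(1.5, 0.0) < 1.0  # False: outside the unit circle
>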
> Dan
>
>
> On Feb 25, 2016, at 11:11 AM, Alexander Wallin <[email protected]> wrote:
>
> Can’t you make a compound feature (or features), i.e. use the kernel
> trick?
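>
> Something along these lines (a rough sketch; the feature values are just
> the ones from your example):
>
>   # Cross the predicate X with the auxiliary feature Y into compound
>   # features the classifier can weight separately.
>   def compound(x_value, y_value):
>       return "{}&{}".format(x_value, y_value)  # e.g. "X&Q"
>
>   features = [compound("X", y) for y in ("Q", "R", "S")]
>   # -> ["X&Q", "X&R", "X&S"]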
>
> Alexander
>
> On 25 Feb 2016, at 17:06, Russ, Daniel (NIH/CIT) [E] <[email protected]> wrote:
>
> Hi,
> Is it possible to change the prior based on a feature?
>
> For example, if I have the following data (very simplified):
>
> Class, Predicates
>
> A, X
> A, X
> B, X
>
> You would expect class A 2/3 of the time when the feature is just
> predicate X.
>
> However, let's say I know another feature Y that can take the values
> {Q, R, S}, with P(A|Q) = 0.8, P(A|R) = 0.1, P(A|S) = 0.3.
>
> Is there any way to add feature Y to the classifier taking advantage of
> this information?
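>
> To make the question concrete, one combination I could imagine (just a toy
> sketch under a naive independence assumption, using the numbers above; I
> have not implemented anything like this) would be:
>
>   # Combine the training-data estimate P(A | X) with the known P(A | Y),
>   # assuming X and Y are conditionally independent given the class.
>   p_a_given_x = 2.0 / 3.0                      # from the three rows above
>   p_a_given_y = {"Q": 0.8, "R": 0.1, "S": 0.3}
>   p_a = 2.0 / 3.0                              # class prior in the sample
>
>   def p_a_given_x_and_y(y):
>       a = p_a_given_x * p_a_given_y[y] / p_a
>       b = (1 - p_a_given_x) * (1 - p_a_given_y[y]) / (1 - p_a)
>       return a / (a + b)
>
>   print(p_a_given_x_and_y("Q"))  # pulled up by P(A|Q) = 0.8
>   print(p_a_given_x_and_y("R"))  # pulled down by P(A|R) = 0.1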
> Thanks
> Dan