Re: Boundary Values for Training Data

Zach Richardson Mon, 26 Sep 2011 08:11:55 -0700

Em,

This really all depends on your goal.  Do you want them to be scored as
interesting to an individual or do you want them categorized into topics?


How you set up the problem can differ a lot depending on the end goal.
What is yours?

Thanks,

Zach


On Mon, Sep 26, 2011 at 9:55 AM, Em <mailformailingli...@yahoo.de> wrote:

> No experiences?
>
> Regards,
> Em
>
> Am 23.09.2011 12:48, schrieb Em:
> > Hello list,
> >
> > Let's say I want to classify documents and there are two possible
> > outcomes:
> > Yes, the document belongs to the topic I focus on, or No, it doesn't.
> >
> > The topic is for example: Machine Learning.
> >
> > Doc1: A sub-chapter of the book "Mahout in Action"
> > Doc2: A paper about clustering techniques
> > Doc3: A blog post by Ted Dunning, machine learning expert, giving
> > his opinion on the relationship between Google and Oracle
> > Doc4: Ted Dunning talking about how to cook tasty spaghetti (Sorry
> > Ted, you are my guinea pig in this case)
> >
> > The point is: Doc3 is not really about Machine Learning, but it
> > might be relevant to people who are interested in Machine Learning,
> > since the author is a machine learning expert and his opinion might
> > reflect some thoughts regarding that domain.
> >
> > Doc4 is completely irrelevant. It has to do with Ted Dunning, but not
> > with Machine Learning or software at all. The only exception would be
> > if Ted wrote a piece of Machine Learning software that creates a
> > recipe for cooking tasty spaghetti ;).
> >
> > If I change the topic to something like "Star Trek":
> >
> > Doc1: A review of a Star Trek movie
> > Doc2: A Star Trek computer game's description
> > Doc3: A review regarding a PlayStation 3 Star Trek game
> > Doc4: The announcement that the gaming studio of the Star Trek games is
> > going to create a new Star Wars game
> > Doc5: A Star Wars book's description
> > Doc6: The gaming studio of the Star Trek games is going to create a
> > Need for Speed clone
> >
> > Docs 1, 2 and 3 are relevant for Trekkies. Doc 4 might be as well,
> > because the studio is an authority on creating good Star Trek games and
> > they noted that their experience with Star Trek will help them build a
> > good Star Wars game. Some fans might be interested in this.
> >
> > However, Doc 5 is completely irrelevant, since it has nothing to do
> > with Star Trek.
> > Doc 6 is about an authority in the Star Trek merchandise industry, but
> > it parallels the Ted-cooks-spaghetti document from my first example:
> > Doc 6 is irrelevant.
> >
> > Doc3 of my "Machine Learning" example and Doc 4 of my "Star Trek" one
> > are boundary values for being relevant. They might interest people who
> > focus on the two named domains, but they sail very close to the wind.
> >
> > Does it generally make sense to include such examples when training
> > a model? Even humans might disagree about whether those examples
> > really belong to the domain they want to focus on.
> >
> > Thank you for your advice.
> >
> > Regards,
> > Em
>



-- 
Zach Richardson
Ravel, Co-founder
Austin, TX
z...@raveldata.com
512.825.6031
