No experiences?

Regards,
Em

On 23.09.2011 12:48, Em wrote:
> Hello list,
>
> let's say I want to classify documents and there are two possible outcomes:
> Yes, the document belongs to the topic I focus on, or No, it doesn't.
>
> The topic is, for example, Machine Learning.
>
> Doc1: A sub-chapter of the book "Mahout in Action"
> Doc2: A paper about clustering techniques
> Doc3: A blog post by Ted Dunning, a Machine Learning expert, giving his
> opinion on the relationship between Google and Oracle
> Doc4: Ted Dunning talking about how to cook tasty spaghetti (sorry
> Ted, you are my guinea pig in this case)
>
> The point is: Doc3 is not really about Machine Learning. However, it
> might be relevant for people who are interested in Machine Learning,
> since the author is a Machine Learning expert and his opinion might
> reflect some thoughts regarding that domain.
>
> Doc4 is completely irrelevant. It has to do with Ted Dunning, but not
> with Machine Learning or software at all. The only exception would be
> if Ted wrote a piece of Machine Learning software that creates
> recipes for cooking tasty spaghetti ;).
>
> If I change the topic to something like "Star Trek":
>
> Doc1: A review of a Star Trek movie
> Doc2: A description of a Star Trek computer game
> Doc3: A review of a PlayStation 3 Star Trek game
> Doc4: The announcement that the studio behind the Star Trek games is
> going to create a new Star Wars game
> Doc5: A description of a Star Wars book
> Doc6: The studio behind the Star Trek games is going to create a Need
> for Speed clone
>
> Docs 1, 2 and 3 are relevant for Trekkies. Doc4 might be as well, because
> the studio is an authority on making good Star Trek games and has
> noted that its experience with Star Trek will help it build a
> good Star Wars game. Some fans might be interested in this.
>
> However, Doc5 is completely irrelevant, since it has nothing to do with
> Star Trek.
> Doc6 is about an authority in the Star Trek merchandise industry, but it
> corresponds to the Ted-cooks-spaghetti case from my first example:
> Doc6 is irrelevant.
>
> Doc3 of my "Machine Learning" example and Doc4 of my "Star Trek" one
> are borderline cases of relevance. They might interest people who
> focus on the two named domains, but they sail very close to the wind.
>
> Does it generally make sense to take such examples into account when
> training a model? Even humans might disagree about whether those
> examples really belong to the domain in question.
>
> Thank you for your advice.
>
> Regards,
> Em