Re: Boundary Values for Training Data

2011-09-26 Thread Em
No experiences? Regards, Em On 23.09.2011 12:48, Em wrote: Hello list, let's say I want to classify documents and there are two possible outcomes: Yes, the document belongs to the topic I focus on, or No, it doesn't. The topic is, for example, Machine Learning. Doc1: A sub-chapter

Re: Boundary Values for Training Data

2011-09-26 Thread Em
Zach, thanks for your feedback! I want to categorize them into a general-purpose category (nothing individual). The goal is to get an overview of every document that has to do with the domain in some way and to throw away everything else. Regards, Em On 26.09.2011 17:11, Zach

Boundary Values for Training Data

2011-09-23 Thread Em
to focus on. Thank you for your advice. Regards, Em

Re: Plagiarism - document similarity

2011-07-12 Thread Em
for you ;). Regards, Em On 12.07.2011 09:58, Luca Natti wrote: Thanks to all, I need to start from the beginning theory, you are speaking Arabic :) to me, or in other words I need a less theoretical approach, or in other words some real code to put my hands on. Excuse this raw approach but I

Re: Plagiarism - document similarity

2011-07-11 Thread Em
the author has rewritten a part of another researcher's work. This way you are able to find phrases where the longest common subsequence is small but a human would see the similarities between both documents and the possibility of plagiarism. Regards, Em On 11.07.2011 09:15, Luca Natti wrote
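
To illustrate the point about the longest common subsequence, here is a minimal sketch in plain Python (not code from the thread, and not a Mahout API). The function names and the sample sentences are made up for illustration. It shows that a near-verbatim copy scores high on a word-level LCS ratio, while a reworded sentence scores low even though a human might still see the resemblance.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(text_a, text_b):
    """LCS length over word tokens, normalized by the shorter text."""
    a = text_a.lower().split()
    b = text_b.lower().split()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


# Hypothetical sample sentences, for illustration only.
original = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox leaps over the lazy dog"
reworded = "a fast dark fox hops across a sleepy hound"

print(lcs_ratio(original, copied))    # high: only one word differs
print(lcs_ratio(original, reworded))  # low: almost every word was replaced
```

A low ratio on its own therefore does not rule out a rewritten passage, which is why a purely LCS-based check can miss this kind of case.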

Re: Exclude by RuleSet

2011-07-04 Thread Em
Hi Marko, thank you for pointing me in this direction. Again I have to ask: What would be more efficient? Rescoring or CandidateItemStrategy? Where are the differences? Thanks! On 04.07.2011 12:39, Marko Ciric wrote: Hi Em, If I understood well what you're asking, you could implement

Re: Exclude by RuleSet

2011-07-02 Thread Em
are filtered out by a Rescorer? Regards, Em On 02.07.2011 15:22, Steven Bourke wrote: Assuming you have the technical resources, one approach could involve just containing different 'conditions' into different data models. For instance I have one setup that only has users from someone's social

Exclude by RuleSet

2011-07-01 Thread Em
that are definitely unwanted for the result set. What I want is something like a SELECT col1, col2, col3 FROM myData WHERE category = 'women' OR category = 'subcategoryOfWomen' and then do the computation on top of this dataset. Is this possible with Mahout? Regards, Em
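
The idea of filtering first and computing afterwards can be sketched in plain Python. This is only an illustration of the question, not Mahout code and not an answer given in the thread; the record layout, the helper names, and the sample rows are all assumptions.

```python
from collections import Counter

# Hypothetical rule set: only these categories may enter the computation.
ALLOWED_CATEGORIES = {"women", "subcategoryOfWomen"}


def filter_by_category(rows, allowed):
    """Keep only the rows whose category is in the allowed set."""
    return [row for row in rows if row["category"] in allowed]


def item_counts(rows):
    """Stand-in for the real computation: count interactions per item."""
    return Counter(row["item"] for row in rows)


# Hypothetical sample data, for illustration only.
rows = [
    {"user": 1, "item": "dress", "category": "women"},
    {"user": 2, "item": "tie", "category": "men"},
    {"user": 2, "item": "dress", "category": "women"},
    {"user": 3, "item": "scarf", "category": "subcategoryOfWomen"},
]

filtered = filter_by_category(rows, ALLOWED_CATEGORIES)
counts = item_counts(filtered)
print(counts)
```

Because the unwanted rows are dropped before the computation, they can never show up in the result, which is the behaviour the SELECT statement above describes.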

Re: Exclude by RuleSet

2011-07-01 Thread Em
for the problem? Could you explain more of the tradeoffs for both implementation options, please? Regards, Em On 01.07.2011 19:01, Sean Owen wrote: The short answer is that you'd have to modify the code to inject this kind of logic -- though you might get away with just using a custom

Re: Beginner's Question: What is a feature?

2011-05-22 Thread Em
formula or something like that for the algorithms that are part of Mahout. This would make understanding the different parameters easier, I think. That's what I meant. Hopefully my explanation is better now? Thank you, Em On 22.05.2011 18:15, Jeremy Lewi wrote: Em, Typically in machine

Re: Beginner's Question: What is a feature?

2011-05-22 Thread Em
Thank you Ted, your explanations really helped. Regards, Em On 22.05.2011 19:43, Ted Dunning wrote: On Sun, May 22, 2011 at 10:32 AM, Em mailformailingli...@yahoo.de wrote: So, let's say I have a descriptive text of 100-200 words (text-like). Does this mean that I have one feature
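
As a rough illustration of the question, one common way to turn a short text into features is a bag of words, where each distinct word becomes its own feature and the value is its count. The sketch below is plain Python and only an assumption about what such an encoding could look like; it is not taken from the replies in the thread, and the function name and sample text are made up.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map each distinct lowercase word to the number of times it occurs."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, for illustration only.
doc = "Mahout is a machine learning library. Machine learning needs features."
features = bag_of_words(doc)

print(len(features))        # number of distinct word features
print(features["machine"])  # count for one of those features
```

Under this encoding a text of 100-200 words yields many features (one per distinct word), not a single one.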