[algogeeks] Re: Probability of a phrase in a text document?

2007-12-03 Thread James Fang
Hi Abhishek, You need to define your metric for probability first. For example: 1. keyword occurrences / total word count 2. keyword occurrences / total number of sentences 3. number of files that contain the keyword / total number of files Best Regards, James Fang -Original Mail- From:
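
The three candidate metrics in this reply could be computed roughly as follows. This is only a sketch under stated assumptions, not what the poster actually ran: the function name `phrase_metrics`, the regex tokenizer, and the rule that a sentence ends at `.`, `!` or `?` are all hypothetical choices made for illustration.

```python
import re

def phrase_metrics(documents, phrase):
    """Compute three candidate 'probabilities' of `phrase` over a list of texts.

    Assumptions (not from the thread): words are runs of letters, digits and
    apostrophes; sentences end at '.', '!' or '?'; matching ignores case.
    """
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    occurrences = total_words = total_sentences = docs_with_phrase = 0

    for text in documents:
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Count every position where the phrase words appear consecutively.
        hits = sum(
            1 for i in range(len(words) - n + 1)
            if words[i:i + n] == phrase_words
        )
        occurrences += hits
        total_words += len(words)
        total_sentences += len([s for s in re.split(r"[.!?]+", text) if s.strip()])
        if hits:
            docs_with_phrase += 1

    return {
        # 1. occurrences / total word count
        "per_word": occurrences / total_words if total_words else 0.0,
        # 2. occurrences / total number of sentences
        "per_sentence": occurrences / total_sentences if total_sentences else 0.0,
        # 3. documents containing the phrase / total documents
        "document_fraction": docs_with_phrase / len(documents) if documents else 0.0,
    }

corpus = ["I am here. You are there.", "I am what I am!", "Nothing to see."]
print(phrase_metrics(corpus, "I am"))
```

Because the tokenizer drops punctuation, this sketch would also count a phrase that straddles a sentence boundary; a stricter version would match within each sentence separately. Counts are pooled across all documents before dividing, so a one-document corpus and a many-document corpus are handled the same way.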

[algogeeks] Probability of a phrase in a text document?

2007-12-03 Thread Abhishek
Hi, If I have a large corpus of text documents and I need to find the probability of occurrence of a phrase like "I am" in the given set of text documents, how do I go about finding the value? I can very well count how many times the phrase "I am" occurs in the whole set of text documents

[algogeeks] Re: Probability of a phrase in a text document?

2007-12-03 Thread Abhishek
Thanks James. I was thinking along the same lines too. I guess I have some homework to do in this regard :) With Regards, Abhishek S On Dec 3, 1:27 pm, James Fang [EMAIL PROTECTED] wrote: Hi Abhishek, You need to build up your metric for probability first. For e.g., 1.

[algogeeks] Re: Probability of a phrase in a text document?

2007-12-03 Thread James Fang
No problem, man. I wonder what the usage scenario for your probability is. Whether it is one document or multiple documents does not matter, because you can simply combine their statistics. Best Regards, James Fang -Original Mail- From: algogeeks@googlegroups.com [mailto:[EMAIL PROTECTED]

[algogeeks] Re: highest lcm

2007-12-03 Thread James Fang
You needn't brute force the possible combinations. 1) if the number is odd, the lcm = floor(number/2) * (floor(number/2) + 1) 2) if the number is even and number/2 is also even, the lcm = (number/2 - 1) * (number/2 + 1) 3) if the number is even and number/2 is odd, then the lcm = (number/
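
The original question is not part of this archive, so the following sketch rests on a loud assumption: that the task is to split `number` into two positive integers whose sum is `number` and whose lcm is as large as possible. Under that reading, the first two cases can be checked against a brute-force search, with number/2 rounded down in the odd case. The function names are hypothetical, and the third case is left unimplemented because the message is cut off.

```python
from math import gcd

def best_lcm_brute_force(number):
    """Largest lcm(a, b) over all a + b == number with 1 <= a <= b (number >= 2)."""
    return max(
        a * (number - a) // gcd(a, number - a)
        for a in range(1, number // 2 + 1)
    )

def best_lcm_closed_form(number):
    """Closed forms for cases 1 and 2 of the message; None for the truncated case 3."""
    half = number // 2  # number/2 rounded down
    if number % 2 == 1:
        # Case 1: odd number, the two halves are consecutive integers.
        return half * (half + 1)
    if half % 2 == 0:
        # Case 2: even number with even half, step one away from the middle.
        return (half - 1) * (half + 1)
    # Case 3 (even number, odd half) is cut off in the archived message.
    return None

# Compare the closed forms with brute force wherever a closed form is given.
for number in range(3, 50):
    expected = best_lcm_closed_form(number)
    if expected is not None:
        assert expected == best_lcm_brute_force(number)
```

Both closed forms work because the chosen pair is coprime (consecutive integers, or two odd integers differing by 2), so the lcm equals the product, and no other split of the same sum has a larger product that is also reachable as an lcm.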

[algogeeks] Re: Probability of a phrase in a text document?

2007-12-03 Thread Abhishek
Hi, I just came across an idea where they wanted to find out how frequently a particular phrase occurs in a set of documents. So they refer to the probability of that phrase occurring in a set of documents put together. I was just wondering how they find the probability of the phrase in the