Hi Abhishek,
You need to define your metric for probability first.
For example:
1. keyword occurrences / total word count
2. keyword occurrences / total number of sentences
3. number of files that contain the keyword / total number of files
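A rough sketch of how the three ratios above could be computed in Python. The function name phrase_metrics, the dictionary keys, and the naive sentence splitting on ".", "!" and "?" are assumptions made for illustration only. Note that the first ratio is a relative frequency per word rather than a strict probability when the phrase has more than one word.

```python
import re

def phrase_metrics(docs, phrase):
    """Compute three candidate metrics for `phrase` over a list of document strings.

    Hypothetical helper: tokenization and sentence splitting are deliberately naive.
    """
    # Match the phrase as whole words, ignoring case.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)

    occurrences = sum(len(pattern.findall(d)) for d in docs)
    total_words = sum(len(d.split()) for d in docs)
    # Treat '.', '!' and '?' as sentence terminators (an assumption).
    total_sentences = sum(
        len([s for s in re.split(r"[.!?]+", d) if s.strip()]) for d in docs
    )
    docs_with_phrase = sum(1 for d in docs if pattern.search(d))

    def ratio(numerator, denominator):
        # Avoid division by zero on an empty corpus.
        return numerator / denominator if denominator else 0.0

    return {
        "per_word": ratio(occurrences, total_words),          # metric 1
        "per_sentence": ratio(occurrences, total_sentences),  # metric 2
        "per_document": ratio(docs_with_phrase, len(docs)),   # metric 3
    }

if __name__ == "__main__":
    corpus = ["I am here. I am happy.", "You are there."]
    print(phrase_metrics(corpus, "I am"))
```

Which of the three to use depends on what the number is meant to express: how dense the phrase is in the text (1 and 2) or how many documents mention it at all (3).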
Best Regards,
James Fang
-Original Mail-
From:
Hi,
If I have a large corpus of text documents and I need to find the
probability of occurrence of a phrase like "I am" in the given set of
text documents, how do I go about finding the value?
I can easily count how many times the phrase "I am" occurs in
the whole set of text documents.
Thanks James. I was thinking along the same lines too.
I guess I have some homework to do in this regard :)
With Regards,
Abhishek S
On Dec 3, 1:27 pm, James Fang [EMAIL PROTECTED] wrote:
Hi Abhishek,
You need to define your metric for probability first.
For example:
1.
No problem, man.
I wonder what the use case for your probability is. Whether you have one
document or multiple documents does not matter, because you can simply
combine their statistics.
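To illustrate the point about combining statistics, here is a minimal sketch, assuming the metric is phrase occurrences per word. The names doc_stats and combined_frequency are hypothetical, and whitespace tokenization is a simplification (a trailing period stays attached to its word). Each document contributes its own counts, and the counts are summed before dividing.

```python
from collections import Counter

def doc_stats(text, phrase):
    """Count phrase occurrences and words in one document (whitespace tokens)."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    # Slide a window of len(target) words over the document.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return Counter(occurrences=hits, words=len(words))

def combined_frequency(texts, phrase):
    """Sum per-document counts, then divide once at the end."""
    total = Counter()
    for text in texts:
        total += doc_stats(text, phrase)
    # Missing keys read as 0 in a Counter, so an empty corpus yields 0.0.
    return total["occurrences"] / total["words"] if total["words"] else 0.0

if __name__ == "__main__":
    texts = ["I am here", "there I am I am"]
    print(combined_frequency(texts, "I am"))
```

Summing the raw counts is not the same as averaging the per-document ratios: the sum weights long documents more heavily, which is usually what you want for a corpus-wide figure.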
Best Regards,
James Fang
-Original Mail-
From: algogeeks@googlegroups.com [mailto:[EMAIL PROTECTED]
You needn't brute force the possible combinations.
1) If the number is odd, the lcm = floor(number/2) * (floor(number/2) + 1).
2) If the number is even and number/2 is also even, the lcm =
(number/2 - 1) * (number/2 + 1).
3) If the number is even and number/2 is odd, then the lcm = (number/
Hi,
I just came across an idea where they wanted to find out how
frequently a particular phrase occurs in a set of documents.
They refer to this as the probability of that phrase occurring in the
set of documents taken together.
I was just wondering how they find the probability of the phrase in
the