You'd better wite a program.
On 11 окт, 10:42, Vaibhav Jain [EMAIL PROTECTED] wrote:
Algo:
1. initialize final_result array with whole sequence and count number of
keywords in no_of_keys and initialize counter array for keywords with value
zero.
2. if sequence_ptr is not null then start
You'd better write a program.
On Oct 11, 10:42 am, Vaibhav Jain [EMAIL PROTECTED] wrote:
Algo:
1. initialize final_result array with whole sequence and count number of
keywords in no_of_keys and initialize counter array for keywords with value
zero.
2. if sequence_ptr is not null then
Algo:
1. initialize final_result array with whole sequence and count number of
keywords in no_of_keys and initialize counter array for keywords with value
zero.
2. if sequence_ptr is not null then start scanning the sequence if
keyword_matches() in sequence put into temp_array and update pointer
No, I am not trying to do that at all. The trie is built to contain
only keywords. It can then be used to answer the question for the
current character 'Is this character part of a candidate keyword?' and
do this O(1). Candidate keywords are identified initially by the trei
root returning a
No, I am not trying to do that at all. The trie is built to contain
only keywords. It can then be used to answer the question for the
current character 'Is this character part of a candidate keyword?' and
do this O(1). Candidate keywords are identified initially by the trei
root returning a
I just wanted to see the trei.
Try Suffix Tries ( FYI : reTRIEval )
-vEnKAt
--
Blog @ http://blizzardzblogs.blogspot.com
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups
Algorithm Geeks group.
To post to this
Hi Andrey,
On Oct 8, 7:56 pm you wrote:
... Enumerating of the words makes no sense.
Agreed.
... As Vishal suggested a trie looks more realistic. Building the
trie can be done O(m), with m - total characters in keywords.
Identifying whether a document character is part of a keyword
I have to admit that I was wrong in my previous post. I stated that if
we have all words in the enumerated we can operate with them better
(faster) but it is true. Enumeraing of the words makes no sence..
Similar objections to using a hash table to assign integers to words.
If collisions are
reposting since it didn't appear yesterday, apologies
A small variation of Vishal's algorithm to eliminate the need of the
bitmap. I use a hash table of integers index by the keyword,
initialized to all 0. I also make use of the property that in the
shortest summary the first and the last
On 1 окт, 06:20, Sticker [EMAIL PROTECTED] wrote:
En, it is the idea. You assume each keyword is a single character and
you use a map to hash key words and their counts. Each time you extend
the range to right hand side, you may increase the counts of some
found key words and each time you
Check this program:
#include map
#include string
#include iostream
#include algorithm
#include iterator
using namespace std;
class Range : public pairconst char*, const char* {
public:
size_t size() const { return second - first; }
Range(const char* f=0, const char * s=0)
En, it is the idea. You assume each keyword is a single character and
you use a map to hash key words and their counts. Each time you extend
the range to right hand side, you may increase the counts of some
found key words and each time you shrink the range from the left hand
side, you decrease
We might even use String Trie. The search time would be O(m) where m is the
length of maximum length keyword. Since mN (normally), it would be as good
as O(1).
This would of course require preprocessing. Again, I am assuming no
constraint on space.
On 9/25/07, Sticker [EMAIL PROTECTED] wrote:
To Vishal: My idea is similar to yours. I like to use hash table as
well. But I wonder which hash function can you use to insert and find
keywords with O(1) time? Keywords are not single characters. They are
normal words. That's basically what I am aftering.
On Sep 25, 2:11 pm, Mayur [EMAIL
the problem is you need a hash table to maintain all the keywords,:)
because they do not have to be a single characters, or you can store them in
array, but then you need binary search,:)
Vishal 写道:
How about keeping two pointers - startp and endp.
Keep a count of frequencies of keywords
Hash table should give you O(1) insertion and search complexity; which is
what we need, right?
There is no constraint on space complexity, I believe.
On 9/24/07, daizisheng [EMAIL PROTECTED] wrote:
the problem is you need a hash table to maintain all the keywords,:)
because they do not have
Vishal 写道:
Hash table should give you O(1) insertion and search complexity; which
is what we need, right?
There is no constraint on space complexity, I believe.
On 9/24/07, * daizisheng* [EMAIL PROTECTED]
mailto:[EMAIL PROTECTED] wrote:
the problem is you need a hash table to
Another possibility is to first pre-process the keywords into
automaton-like structure (Google for Aho Corasick string matcher), and
then use it over the document. This would probably be helpful only if
the number of keywords is much smaller than the document itself..
On 9/25/07, daizisheng
18 matches
Mail list logo