Hi there,
I was wondering whether it is possible to get the doc_id during the
indexing process, or can I simply assume that doc_id starts from 0 and
increments by one with each record added?
Basically, I need the equivalent of this SQL:
INSERT INTO tbl (name) VALUES ('John') RETURNING id
so that after each INSERT I can extend the list of document ids in which
the name John appears.
For example, I want to build a hash which maps people's names to a
list of internal doc_ids:
my %keyword_to_doc_id;
while (...) {
    my $content = ...;    # get a document
    my $keyword = ...;    # get a person's name
    $indexer->add_doc( { doc_content => $content, ... } );
    # <doc_id> is the id of the document just added (what I am asking for)
    push @{ $keyword_to_doc_id{$keyword} }, <doc_id>
        if index( $content, $keyword ) >= 0;    # $keyword is in $content
}
$indexer->commit;
...
Then I would make another index of the keywords appearing in the indexed
documents, without a time-consuming search of the previously created
index for millions of predefined keywords.
For text mining purposes, I can later analyze only the index of
predefined keywords (metadata), and extend the search to the much bigger
document index only when needed.
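
To make the bookkeeping concrete, here is a minimal, self-contained
sketch of the keyword-to-doc_id map. It does not call the indexer at
all: the sample documents, the keyword list and the $next_doc_id counter
are hypothetical stand-ins, and the counter encodes exactly the
assumption I am asking about (ids start at 0 and grow by one per added
document).

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real document source.
my @docs = (
    'John met Mary in Paris.',
    'Mary wrote to Peter.',
    'Peter and John replied.',
);
my @keywords = qw(John Mary Peter);

my %keyword_to_doc_id;

# ASSUMPTION (unverified): doc ids start at 0 and increase by one for
# every document added. This counter is a stand-in for the real doc_id.
my $next_doc_id = 0;

for my $content (@docs) {
    # In the real loop, $indexer->add_doc(...) would be called here.
    my $doc_id = $next_doc_id++;

    # Record the id under every predefined keyword found in the text.
    for my $keyword (@keywords) {
        push @{ $keyword_to_doc_id{$keyword} }, $doc_id
            if index( $content, $keyword ) >= 0;
    }
}

# Print each keyword followed by the ids of the documents containing it.
for my $keyword ( sort keys %keyword_to_doc_id ) {
    print "$keyword: @{ $keyword_to_doc_id{$keyword} }\n";
}
```

With the sample data above this prints "John: 0 2", "Mary: 0 1" and
"Peter: 1 2". If the indexer can return the real doc_id, it would simply
replace the counter.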
Alex