On 23/11/2016 16:31, Gupta, Rajiv wrote:
Thanks for your reply Nick.

I wanted to delete the old documents, which is why I was trying to get the doc_id 
and use it to delete them. However, that does not help: it deleted other 
documents and keeps changing the document. I wanted to use delete by term, but 
my docs don't have a primary key.

I add document like this:

$indexer->add_doc({
                title    => $mytitle,
                content  => substr($mybodytext,0,1024),
                url      => $onlyfilename,
                urlpath  => $filpath,
                position => $fileseektostart,
                linenum  => $filelinenumtostart,
                jobtype  => $self->{_logfile_hash}{$filetoindex}[5] ,
            });

You can use any field as primary key if the field's value is guaranteed to be unique for all your documents. But it seems that you index the contents of files line by line, so "urlpath" isn't unique. Your primary key is probably the tuple (urlpath, linenum).

If you update all the lines of a file at once, this isn't a problem. You can simply delete all documents relating to the file with

    $indexer->delete_by_term(
        field => 'urlpath',
        term  => $filepath,
    );

If you only want to update certain lines, you'll have to construct an ANDQuery for each line and use delete_by_query. For example:

    $indexer->delete_by_query(Lucy::Search::ANDQuery->new(
        children => [
            Lucy::Search::TermQuery->new(
                field => 'urlpath',
                term  => $filepath,
            ),
            Lucy::Search::TermQuery->new(
                field => 'linenum',
                term  => $linenum,
            ),
        ],
    ));

Or maybe use a RangeQuery to delete a contiguous range of lines.
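
A rough sketch of the RangeQuery variant, in case it helps. The helper names pad_linenum and delete_line_range are made up for illustration. The sketch assumes that 'urlpath' and 'linenum' are StringType fields, that 'linenum' was specced as sortable, and that line numbers were zero-padded to a fixed width when the documents were added. Range comparison on string fields is lexical, so without padding "10" would sort before "9".

```perl
use strict;
use warnings;

# Hypothetical helper: zero-pad a line number so that string order
# matches numeric order. The default width of 10 is an assumption.
sub pad_linenum {
    my ($linenum, $width) = @_;
    return sprintf('%0*d', $width // 10, $linenum);
}

# Hypothetical helper: mark lines $first..$last (inclusive) of one file
# as deleted. Assumes 'linenum' values were indexed via pad_linenum().
sub delete_line_range {
    my ($indexer, $filepath, $first, $last) = @_;

    # Loaded at call time so pad_linenum() stays usable on its own.
    require Lucy::Search::ANDQuery;
    require Lucy::Search::TermQuery;
    require Lucy::Search::RangeQuery;

    my $query = Lucy::Search::ANDQuery->new(
        children => [
            # Restrict the deletion to a single file ...
            Lucy::Search::TermQuery->new(
                field => 'urlpath',
                term  => $filepath,
            ),
            # ... and to a contiguous block of lines within it.
            Lucy::Search::RangeQuery->new(
                field         => 'linenum',
                lower_term    => pad_linenum($first),
                upper_term    => pad_linenum($last),
                include_lower => 1,
                include_upper => 1,
            ),
        ],
    );
    $indexer->delete_by_query($query);
    return;
}
```

You would call it as delete_line_range($indexer, $filepath, 100, 200) before re-adding the lines. As with delete_by_term, the deletions only become visible to searchers after $indexer->commit.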

Nick
