Hi Karl,

Thanks a lot for your reply. I haven't changed any code in the framework; the 
only code I have written is my own repository connector.

I found that there are five overloads of addDocumentReference available for 
injecting document identifiers. Could you please tell me how I should choose 
the right one?

 activities.addDocumentReference(documentIdentifier);
 activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType);
 activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues);
 activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues, originationTime);
 activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues, originationTime, prereqEventNames);

This is how I inject document identifiers:

 activities.addDocumentReference(docUri, documentIdentifier, RELATIONSHIP_CHILD);

docUri is the URL of the document that is supposed to be fetched, e.g. 
http://domino_server:80/path/dep1/database_name.nsf/api/data/documents

documentIdentifier is the parent URL, e.g. 
http://domino_server:80/path/dep1/database_name.nsf/api/data/documents/unid/B0F9484E94DEA3204825813E001034E1

I am afraid that no full stack trace is thrown. All I get is

 new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");

which comes from the code quoted further down, in WorkerThread.java 
(org.apache.manifoldcf.crawler.system). I found the document identifier in the 
"jobqueue" table, and the dochash stored there matches the hash code generated 
by the hash method.

For some document identifiers, previousDocuments.get(documentIdentifierHash) 
returns the queued document, but for several others it returns null.
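
To show the kind of lookup I mean, here is a minimal, self-contained sketch. It 
is not ManifoldCF code: the class HashLookupSketch, its methods, and the use of 
SHA-1 as the hash are all assumptions made for illustration only. It shows only 
that a map keyed by an identifier hash returns null as soon as the string 
hashed at lookup time differs from the string that was queued, even by a 
single trailing character.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class HashLookupSketch {
  // Hypothetical stand-in for the framework's hash method: SHA-1 as lowercase hex.
  static String hash(String identifier) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      byte[] digest = md.digest(identifier.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (byte b : digest) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Build a map from identifier hash to identifier (a stand-in for previousDocuments).
  static Map<String, String> index(String... identifiers) {
    Map<String, String> map = new HashMap<>();
    for (String id : identifiers) {
      map.put(hash(id), id);
    }
    return map;
  }

  // Return the queued identifier for this string, or null if its hash is not in the map.
  static String lookup(Map<String, String> map, String identifier) {
    return map.get(hash(identifier));
  }

  public static void main(String[] args) {
    String queued = "http://domino_server:80/path/dep1/database_name.nsf/api/data/documents";
    Map<String, String> previousDocuments = index(queued);

    // Same string: the hash matches and the entry is found.
    System.out.println(lookup(previousDocuments, queued) != null);
    // Slightly different string: the hash differs and the lookup returns null.
    System.out.println(lookup(previousDocuments, queued + "/") != null);
  }
}
```

In my case the dochash in "jobqueue" does match the computed hash, so a 
differing string alone may not explain what I am seeing.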

Could you please give me some pointers?

protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(String documentIdentifierHash,
      String componentIdentifierHash,
      String documentIdentifier)
    {
      // This lookup returns null. The problem is here.
      QueuedDocument qd = previousDocuments.get(documentIdentifierHash);
      if (qd == null)
        throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
      return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
    }

Best wishes,

Cheng




________________________________
From: Karl Wright <daddy...@gmail.com>
Sent: 12 November 2018 18:46
To: user@manifoldcf.apache.org
Subject: Re: Job stuck - WorkerThread functions return null

Hi,
Have you been modifying the framework code?  If so, I really cannot help you.

If you haven't -- it looks like you've got code that is injecting document 
identifiers that are incorrect.  But I will need to see a full stack trace to 
be sure of that.

Thanks,
Karl


On Mon, Nov 12, 2018 at 4:06 AM Cheng Zeng <ze...@hotmail.co.uk> wrote:
Hi Karl,

I am developing my own repository connector, for which I borrowed some code 
from the file repository connector. I use it to crawl documents from an IBM 
Domino system. I managed to retrieve all the files in Domino. However, when I 
restart my job to recrawl the Domino database, 
previousDocuments.get(documentIdentifierHash) in the following code from 
WorkerThread.java (org.apache.manifoldcf.crawler.system) returns null for some 
of the document IDs. As a result, the job gets stuck on those document IDs.

Could you please tell me how I could fix the problem?

 protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(String documentIdentifierHash,
      String componentIdentifierHash,
      String documentIdentifier)
    {
      // This lookup returns null. The problem is here.
      QueuedDocument qd = previousDocuments.get(documentIdentifierHash);
      if (qd == null)
        throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
      return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
    }


Thanks a lot.

Cheng
