Re: Job stuck - WorkerThread functions return null

Karl Wright Wed, 14 Nov 2018 07:48:02 -0800

Hi Cheng,

Unless you are using carrydown information (that is, information that is
recorded for a parent document that the child document needs access to),
this is the method you want to use:


activities.addDocumentReference(documentIdentifier);

If you DO need to pull data recorded for a parent from the child, the best
connector to look at for an example is the SharePoint connector.
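To make the choice concrete, here is a minimal sketch. `ReferenceSink`, `RecordingSink`, `ChildLinkQueuer`, the `usesCarrydown` flag and the `RELATIONSHIP_CHILD` value are hypothetical stand-ins (not ManifoldCF classes) so the example runs on its own; in a real connector the calls go to the `activities` object your connector is handed. Ordinary child links use the one-argument call; only the carrydown case names a parent, and real carrydown would also pass `dataNames`/`dataValues` through one of the longer overloads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the connector-facing activity object. It mirrors
// only two of the overload shapes discussed in this thread.
interface ReferenceSink {
  void addDocumentReference(String documentIdentifier);
  void addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType);
}

// Records each call as a string so the behaviour can be inspected.
class RecordingSink implements ReferenceSink {
  final List<String> calls = new ArrayList<>();

  @Override
  public void addDocumentReference(String documentIdentifier) {
    calls.add("ref:" + documentIdentifier);
  }

  @Override
  public void addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType) {
    calls.add("ref:" + documentIdentifier + " parent:" + parentIdentifier + " rel:" + relationshipType);
  }
}

class ChildLinkQueuer {
  // Hypothetical relationship name; a real connector declares its own.
  static final String RELATIONSHIP_CHILD = "child";

  // Queues every child link found while processing currentDocument.
  static void queueChildren(ReferenceSink activities, String currentDocument,
                            List<String> childLinks, boolean usesCarrydown) {
    for (String child : childLinks) {
      if (usesCarrydown) {
        // Carrydown case: the parent is the document being processed right now.
        // (Real carrydown would also pass dataNames/dataValues.)
        activities.addDocumentReference(child, currentDocument, RELATIONSHIP_CHILD);
      } else {
        // No carrydown: the parent does not need to be named at all.
        activities.addDocumentReference(child);
      }
    }
  }
}
```

Without carrydown there is no parent identifier to get wrong, which is why the one-argument form is the safer default.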

As far as the stack trace is concerned -- these always get written to the
log.  The reason the framework "hangs" is that the exception is a fatal
one: it basically causes the thread to restart itself, so nothing
progresses under those conditions.  Very probably the cause of this
exception is that you are including a 'parent identifier' which is not
actually a document identifier that was itself added.
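If you do pass a parent, one way to catch this in the connector itself is a small guard. The sketch below is hypothetical (`ParentIdentifierGuard` is not a ManifoldCF class) and assumes, as the `previousDocuments` lookup in WorkerThread suggests, that a valid parent is one of the identifiers handed to the current `processDocuments` call. Build it from that array and check each parent before using it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of ManifoldCF. Construct it from the
// identifiers passed to the current processDocuments call.
class ParentIdentifierGuard {
  private final Set<String> currentBatch;

  ParentIdentifierGuard(String[] documentIdentifiers) {
    this.currentBatch = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(documentIdentifiers)));
  }

  // True only if parentIdentifier matches one of the batch identifiers exactly,
  // character for character; a URL that differs in any way (for example a
  // different trailing path segment) is rejected.
  boolean isInCurrentBatch(String parentIdentifier) {
    return parentIdentifier != null && currentBatch.contains(parentIdentifier);
  }
}
```

When the check returns false, logging the offending value and falling back to the one-argument `addDocumentReference` (assuming the parent link is not actually needed) avoids handing the framework an unrecognized parent.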

Karl


On Wed, Nov 14, 2018 at 2:16 AM Cheng Zeng <ze...@hotmail.co.uk> wrote:

>
> Hi Karl,
>
> Thanks a lot for your reply. I didn't change any code in the framework;
> I only wrote my own repository connector.
>
> I found that there are five methods available for injecting document
> identifiers. Could you please tell me how to choose the right one?
>  activities.addDocumentReference(documentIdentifier);
>  activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType);
>  activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues);
>  activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues, originationTime);
>  activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues, originationTime, prereqEventNames);
>
> The way I injected document identifiers is as follows.
>
>
> activities.addDocumentReference(docUri,documentIdentifier,RELATIONSHIP_CHILD);
> docUri is the URL of the document to be fetched, e.g.
> http://domino_server:80/path/dep1/database_name.nsf/api/data/documents
> documentIdentifier is the parent URL, e.g.
> http://domino_server:80/path/dep1/database_name.nsf/api/data/documents/unid/B0F9484E94DEA3204825813E001034E1
>
> I am afraid no full stack trace is thrown. All I get is
>
> new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
>
> which is thrown by the following code in
> WorkerThread.java (org.apache.manifoldcf.crawler.system).
> I've found the document identifier in the "jobqueue" table, and its
> dochash there matches the hash generated by the hash method.
>
> For some document identifiers, previousDocuments.get(documentIdentifierHash)
> returns the queued document, but for several others it returns null.
>
> Could you please give me some pointers?
>
> protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(
>       String documentIdentifierHash,
>       String componentIdentifierHash,
>       String documentIdentifier)
>     {
>       QueuedDocument qd = previousDocuments.get(documentIdentifierHash); // return null. The problem is here.
>       if (qd == null)
>         throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
>       return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
>     }
>
> Best wishes,
>
> Cheng
>
>
>
>
> ------------------------------
> *From:* Karl Wright <daddy...@gmail.com>
> *Sent:* 12 November 2018 18:46
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job stuck - WorkerThread functions return null
>
> Hi,
> Have you been modifying the framework code?  If so, I really cannot help
> you.
>
> If you haven't -- it looks like you've got code that is injecting document
> identifiers that are incorrect.  But I will need to see a full stack trace
> to be sure of that.
>
> Thanks,
> Karl
>
>
> On Mon, Nov 12, 2018 at 4:06 AM Cheng Zeng <ze...@hotmail.co.uk> wrote:
>
> Hi Karl,
>
> I am developing my own repository connector, borrowing some code from the
> file system repository connector. I use it to crawl documents from an IBM
> Domino system. I managed to retrieve all the files in Domino. However, when
> I restart my job to recrawl the Domino database, I run into problems with
> the following code: previousDocuments.get(documentIdentifierHash) in
> WorkerThread.java (org.apache.manifoldcf.crawler.system) returns null for
> some of the document IDs. As a result, the job got stuck on those
> specific document IDs.
>
> Could you please tell me how I could fix the problem?
>
> protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(
>       String documentIdentifierHash,
>       String componentIdentifierHash,
>       String documentIdentifier)
>     {
>       QueuedDocument qd = previousDocuments.get(documentIdentifierHash); // return null. The problem is here.
>       if (qd == null)
>         throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
>       return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
>     }
>
>
> Thanks a lot.
>
> Cheng
>
>
