Hi Cheng,

Unless you are using carrydown information (that is, information recorded for a parent document that the child document needs access to), this is the method you want to use:
activities.addDocumentReference(documentIdentifier);

If you DO need to pull data recorded for a parent from the child, the best connector to look at for an example is the SharePoint connector.

As far as the stack trace is concerned, these always get written to the log. The reason the framework "hangs" is that the exception is a fatal one: it basically causes the thread to restart itself, so nothing progresses under those conditions. Very probably the cause of this exception is that you are including a 'parent identifier' which is not actually a document identifier that was itself added.

Karl

On Wed, Nov 14, 2018 at 2:16 AM Cheng Zeng <ze...@hotmail.co.uk> wrote:

> Hi Karl,
>
> Thanks a lot for your reply. I didn't change any code in the framework
> except my own repository connector.
>
> I found that there are five methods available to inject document
> identifiers. Could you please tell me how I should choose the right way to
> inject the document identifiers?
>
> activities.addDocumentReference(documentIdentifier);
> activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType);
> activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues);
> activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues, originationTime);
> activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues, originationTime, prereqEventNames);
>
> The way I injected document identifiers is as follows.
>
> activities.addDocumentReference(docUri,documentIdentifier,RELATIONSHIP_CHILD);
>
> docUri is the doc url which is supposed to be fetched, e.g.
> http://domino_server:80/path/dep1/database_name.nsf/api/data/documents
> documentIdentifier is the parent url, e.g.
> http://domino_server:80/path/dep1/database_name.nsf/api/data/documents/unid/B0F9484E94DEA3204825813E001034E1
>
> I am afraid that there is no full stack trace thrown. I have only got the
>
> new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
>
> with the following code in
> WorkerThread.java (org.apache.manifoldcf.crawler.system).
> I've found the document identifier in the "jobqueue" table, and the
> dochash in that table matches the hash code generated by the hash method.
>
> For some of the document identifiers,
> previousDocuments.get(documentIdentifierHash) returns the queued
> document, but for several document identifiers it returns null.
>
> Could you please give me some indication?
>
> protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(String documentIdentifierHash,
>   String componentIdentifierHash,
>   String documentIdentifier)
> {
>   QueuedDocument qd = previousDocuments.get(documentIdentifierHash);
>   // Returns null for some identifiers. The problem is here.
>   if (qd == null)
>     throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
>   return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
> }
>
> Best wishes,
>
> Cheng
>
> ------------------------------
> *From:* Karl Wright <daddy...@gmail.com>
> *Sent:* 12 November 2018 18:46
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job stuck - WorkerThread functions return null
>
> Hi,
> Have you been modifying the framework code? If so, I really cannot help
> you.
>
> If you haven't, it looks like you've got code that is injecting document
> identifiers that are incorrect. But I will need to see a full stack trace
> to be sure of that.
>
> Thanks,
> Karl
>
>
> On Mon, Nov 12, 2018 at 4:06 AM Cheng Zeng <ze...@hotmail.co.uk> wrote:
>
> Hi Karl,
>
> I am developing my own repository connector, for which I borrowed some code
> from the file repository connector. I use my repository connector to crawl
> documents from an IBM Domino system. I managed to retrieve all the files in
> Domino. However, when I restart my job to recrawl the Domino database, I run
> into a problem with the following code, where
> previousDocuments.get(documentIdentifierHash)
> in WorkerThread.java (org.apache.manifoldcf.crawler.system) returns null
> for some of the document ids. As a result, the job gets stuck on the
> specific document id.
>
> Could you please tell me how I could fix the problem?
>
> protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(String documentIdentifierHash,
>   String componentIdentifierHash,
>   String documentIdentifier)
> {
>   QueuedDocument qd = previousDocuments.get(documentIdentifierHash);
>   // Returns null for some identifiers. The problem is here.
>   if (qd == null)
>     throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
>   return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
> }
>
> Thanks a lot.
>
> Cheng
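
For anyone reading this thread later, here is a rough sketch of the point made in the reply at the top: the one-argument addDocumentReference call queues a document on its own, while the three-argument form names a parent, and that parent should be an identifier that was itself added. QueueSketch is a hypothetical toy class, not ManifoldCF code; only the two method signatures are taken from the thread. The identifiers, the "child" relationship string, and the choice to reject a bad parent immediately are assumptions made for illustration (in the thread the exception surfaced later, in WorkerThread).

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical toy stand-in for the framework's queue; NOT ManifoldCF code.
class QueueSketch {
    // Identifiers that have been added so far, in insertion order.
    private final Set<String> queued = new LinkedHashSet<>();

    // Mirrors the one-argument form: queue a document with no parent link.
    void addDocumentReference(String documentIdentifier) {
        queued.add(documentIdentifier);
    }

    // Mirrors the three-argument form. Assumption for illustration: a parent
    // that was never added is rejected right away, so the mistake is visible.
    void addDocumentReference(String documentIdentifier,
                              String parentIdentifier,
                              String relationshipType) {
        if (!queued.contains(parentIdentifier)) {
            throw new IllegalArgumentException(
                "Unrecognized document identifier: '" + parentIdentifier + "'");
        }
        queued.add(documentIdentifier);
    }

    // Reports whether an identifier has been added.
    boolean isQueued(String documentIdentifier) {
        return queued.contains(documentIdentifier);
    }

    public static void main(String[] args) {
        QueueSketch activities = new QueueSketch();

        // No carrydown needed: the plain one-argument call is enough.
        activities.addDocumentReference("doc-a");

        // Parent "doc-a" was itself added, so the child is accepted.
        activities.addDocumentReference("doc-b", "doc-a", "child");

        // Parent "never-added" was not added, so this call is rejected.
        try {
            activities.addDocumentReference("doc-c", "never-added", "child");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In a real connector the parent passed to the three-argument form would normally be the identifier of the document currently being processed, which is worth checking against the docUri/documentIdentifier arguments in the call quoted above.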