Hi Karl,

The patch provided is not working since the error is thrown from org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification

    return new DocumentumObjectImpl(objIDfSession,objIDfSession.getObjectByQualification(dql));

Error log as follows:

DfException:: THREAD: RMI TCP Connection(1083)-127.0.0.1; MSG: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error: "Error loading object: invalid string length 0 found in input stream"; ERRORCODE: 100; NEXT: null
    at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
    at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
    at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
    at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1370)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1129)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1362)
    at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
    at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
    at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
    at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
    at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
    at com.documentum.fc.client.impl.session.Session.getObject(Session.java:958)
    at com.documentum.fc.client.impl.session.Session.getObjectByQualificationEx(Session.java:1139)
    at com.documentum.fc.client.impl.session.Session.getObjectByQualification(Session.java:1117)
    at com.documentum.fc.client.impl.session.SessionHandle.getObjectByQualification(SessionHandle.java:755)
    at org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification(DocumentumImpl.java:334)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
    at sun.rmi.transport.Transport$1.run(Transport.java:200)
    at sun.rmi.transport.Transport$1.run(Transport.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Regards,
Tamizh Kumaran Thamizharasan

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Friday, July 14, 2017 4:32 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error

I have created a ticket (CONNECTORS-1444) to track this issue, and attached a fix. I've also committed the fix to trunk.

The fix is not the code change you have done, but instead introduces a new kind of DocumentumException: CORRUPTEDDOCUMENT.
This will be thrown whenever permanent document corruption is detected, and will cause the document to be skipped and not indexed.

The "DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED" error should cause the connector to retry the document at a later time, so if indeed this is not a permanent error, no special fix should be required.

Please let me know if the fix I have committed works for you.

Karl

On Fri, Jul 14, 2017 at 5:41 AM, Tamizh Kumaran Thamizharasan <tthamizhara...@worldbankgroup.org> wrote:

Hi Karl,

Sorry for not explaining the issue in detail.

(1) Is it likely to go away or not on a retry;
The DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and DM_OBJECT_E_LOAD_INVALID_STRING_LEN errors are not likely to go away on an immediate retry.

(2) Does it substantially impact the ability of ManifoldCF to properly process the document;
The impact is that someone needs to monitor the indexing and, if it stops on these errors, use restart-minimal to start the indexing again.

(3) Is it generally acceptable to skip ALL documents where the error occurs.
Yes. These errors occur for a large number of documents, and it is tough for the user to keep restarting the indexing.

Total document count: 700000+
DM_OBJECT_E_LOAD_INVALID_STRING_LEN: 11147
DM_PLATFORM_E_INTEGER_CONVERSION_ERROR: 21708

I'm not sure whether these errors are common in Documentum or are due to improper Documentum configuration/maintenance. We have encountered them on a couple of Documentum instances in lower environments (not validated on production).

The Documentum repository errors DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and DM_OBJECT_E_LOAD_INVALID_STRING_LEN are of type DfException and are thrown from the getObjectByQualification method in org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl. We made a fix to print the error to the log (Documentum server process) and return null.
    catch (DfException e) {
      e.printStackTrace();
      return null;
      //throw new DocumentumException("Documentum error: "+e.getMessage());
    }

In the run() method of the ProcessDocumentThread inner class in org.apache.manifoldcf.crawler.connectors.DCTM.DCTM, we added a null check so that document processing continues.

    try {
      IDocumentumObject object = session.getObjectByQualification("dm_document where i_chronicle_id='" + documentIdentifier + "' and any r_version_label='CURRENT'");
      if(object!=null) {
        …
      }
    } catch (Throwable e) {
      this.exception = e;
    }

The [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED] error occurs very rarely. It happens when an uploaded document is parked in an interim BOCS and is moved to the repository a short time later. If indexing happens in that gap, the properties are accessible but the document content is not yet available, which causes the error. The fix for this is not yet complete. The code that raises this error is shared below.

The run() method of the ProcessDocumentThread inner class in org.apache.manifoldcf.crawler.connectors.DCTM.DCTM:

    try {
      strFilePath = object.getFile(objFileTemp.getCanonicalPath());
    } catch (DocumentumException dfe) {
      // Fetch failed, so log it
      activityStatus = "NOCONTENT";
      activityMessage = dfe.getMessage();
      if (dfe.getType() != DocumentumException.TYPE_NOTALLOWED)
        throw dfe;
      return;
    }

The getFile method in org.apache.manifoldcf.crawler.common.DCTM.DocumentumObjectImpl:

    catch (DfException dfe) {
      // Can't decide what to do without looking at the exception text.
      // This is crappy but it's the best we can manage, apparently.
      String errorMessage = dfe.getMessage();
      if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1)
        // Treat it as transient, and retry
        throw new DocumentumException(dfe.getMessage(),DocumentumException.TYPE_SERVICEINTERRUPTION);
      // It's probably not a transient error. Report it as an access violation, even though it
      // may well not be. We don't have much info as to what's happening.
      throw new DocumentumException(dfe.getMessage(),DocumentumException.TYPE_NOTALLOWED);
    }

Discarding uncrawlable documents and continuing with the indexing process seems more useful than stalling the job. If you feel it is good to include, kindly add the required coding exception.

Regards,
Tamizh Kumaran Thamizharasan

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Friday, July 14, 2017 12:36 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error

Hi Tamizh,

For any repository errors, ManifoldCF needs to know the following:

(1) Is it likely to go away or not on a retry;
(2) Does it substantially impact the ability of ManifoldCF to properly process the document;
(3) Is it generally acceptable to skip ALL documents where the error occurs.

In this case your underlying error seems quite worrying:

[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is temporarily parked on a BOCS server host. It will be available when it is moved to a permanent storage area."

I could imagine that many or most documents are in fact in that state, in which case nothing can really be crawled?

I'm happy to make coding exceptions in the Documentum connector for discarding uncrawlable documents, but only if it makes sense to do that. Here it is not clear at all that we'd want to change MCF to throw away all documents with this problem. It sounds instead like there's some significant Documentum configuration issue to me.

Thanks,
Karl

On Fri, Jul 14, 2017 at 2:39 AM, Tamizh Kumaran Thamizharasan <tthamizhara...@worldbankgroup.org> wrote:

Hi Team,

The behavior below is observed when using the ManifoldCF Documentum connector.

• On any Documentum-specific error, the application throws the error and the job stops abruptly. Is there any specific reason for this approach? Can we handle these errors by logging them, ignoring the document, and continuing the indexing?

Please find below sample errors that cause the job to fail.

Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error: "The server was unable to convert the following string (String Unavailable) to an integer or long."

Caused by: org.apache.manifoldcf.crawler.common.DCTM.DocumentumException: Documentum error: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error: "Error loading object: invalid string length 0 found in input stream"

Error: Repeated service interruptions - failure processing document: [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is temporarily parked on a BOCS server host. It will be available when it is moved to a permanent storage area."

Kindly provide your suggestion on this.

Regards,
Tamizh Kumaran Thamizharasan
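
The three errors reported in this thread call for different handling: the parked-content error is expected to clear on a later retry, while the two object-load errors were reported as not going away on retry. A minimal sketch of that triage, keyed on the bracketed DM_ code in the message text, might look like the following. The class DctmErrorClassifier, the Action enum and its values are hypothetical names used only for illustration and are not part of the ManifoldCF connector; treating unrecognized codes as fatal is likewise an assumption, not the connector's behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF or DFC.
final class DctmErrorClassifier {

    // Hypothetical outcomes; these are NOT the connector's DocumentumException types.
    enum Action { RETRY_LATER, SKIP_DOCUMENT, ABORT_JOB }

    // The messages quoted in this thread carry a bracketed code such as [DM_OBJECT_E_...].
    private static final Pattern CODE = Pattern.compile("\\[(DM_[A-Z0-9_]+)\\]");

    // Returns the first bracketed DM_ code in the message, or null if there is none.
    static String extractCode(String message) {
        if (message == null) {
            return null;
        }
        Matcher m = CODE.matcher(message);
        return m.find() ? m.group(1) : null;
    }

    // Maps a Documentum error message to a handling decision.
    static Action classify(String message) {
        String code = extractCode(message);
        if (code == null) {
            return Action.ABORT_JOB; // assumption: no recognizable code is treated as fatal
        }
        switch (code) {
            case "DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED":
                // Content is parked on BOCS and should become available later.
                return Action.RETRY_LATER;
            case "DM_OBJECT_E_LOAD_INVALID_STRING_LEN":
            case "DM_PLATFORM_E_INTEGER_CONVERSION_ERROR":
                // Reported in this thread as not going away on retry.
                return Action.SKIP_DOCUMENT;
            default:
                return Action.ABORT_JOB; // assumption: unknown codes stop the job
        }
    }
}
```

For example, classify applied to the DM_OBJECT_E_LOAD_INVALID_STRING_LEN message quoted above returns SKIP_DOCUMENT. Matching on message text is fragile, as the comment in getFile already notes, so this is only a starting point.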