Hi Karl,

The patch provided does not work, because the error is thrown from 
org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification, 
at this statement:

return new DocumentumObjectImpl(objIDfSession,objIDfSession.getObjectByQualification(dql));

Error log as follows:

DfException:: THREAD: RMI TCP Connection(1083)-127.0.0.1; MSG: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error:  "Error loading object: invalid string length 0 found in input stream"; ERRORCODE: 100; NEXT: null
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1370)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1129)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1362)
        at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
        at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
        at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
        at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
        at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
        at com.documentum.fc.client.impl.session.Session.getObject(Session.java:958)
        at com.documentum.fc.client.impl.session.Session.getObjectByQualificationEx(Session.java:1139)
        at com.documentum.fc.client.impl.session.Session.getObjectByQualification(Session.java:1117)
        at com.documentum.fc.client.impl.session.SessionHandle.getObjectByQualification(SessionHandle.java:755)
        at org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification(DocumentumImpl.java:334)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Regards,
Tamizh Kumaran Thamizharasan

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Friday, July 14, 2017 4:32 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error

I have created a ticket (CONNECTORS-1444) to track this issue, and attached a 
fix.  I've also committed the fix to trunk.

The fix is not the code change you have done, but instead introduces a new kind 
of DocumentumException: CORRUPTEDDOCUMENT.  This will be thrown whenever 
permanent document corruption is detected, and will cause the document to be 
skipped and not indexed.
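
As a rough illustration of that behavior (this is not the committed code): the sketch below uses a hypothetical SketchException with a CORRUPTED_DOCUMENT type and a hypothetical Fetcher interface, so it runs without DFC or ManifoldCF. A corrupted document is skipped, a service interruption is set aside for a later retry, and anything else stops the run.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a typed connector exception; not the real ManifoldCF class. */
class SketchException extends Exception {
  enum Type { GENERAL, SERVICE_INTERRUPTION, CORRUPTED_DOCUMENT }

  final Type type;

  SketchException(String message, Type type) {
    super(message);
    this.type = type;
  }
}

/** Hypothetical fetch step, so the sketch does not depend on a repository. */
interface Fetcher {
  void fetch(String documentId) throws SketchException;
}

final class SkipCorruptedSketch {
  final List<String> indexed = new ArrayList<>();
  final List<String> skipped = new ArrayList<>();
  final List<String> retryLater = new ArrayList<>();

  /** Processes every id; only an unclassified error aborts the whole run. */
  void processAll(List<String> ids, Fetcher fetcher) {
    for (String id : ids) {
      try {
        fetcher.fetch(id);
        indexed.add(id);
      } catch (SketchException e) {
        switch (e.type) {
          case CORRUPTED_DOCUMENT:
            skipped.add(id);      // permanent corruption: skip, do not index
            break;
          case SERVICE_INTERRUPTION:
            retryLater.add(id);   // possibly transient: try again later
            break;
          default:
            throw new IllegalStateException("Aborting on " + id, e);
        }
      }
    }
  }
}
```

The point of the design is that the decision is made from the exception type rather than from a null return value, so the caller can tell a skipped document from one that should be retried.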

The "DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED" error should cause the 
connector to retry the document at a later time, so if indeed this is not a 
permanent error, no special fix should be required.

Please let me know if the fix I have committed works for you.

Karl



On Fri, Jul 14, 2017 at 5:41 AM, Tamizh Kumaran Thamizharasan 
<tthamizhara...@worldbankgroup.org> wrote:
Hi Karl,

Sorry for not explaining the issue in detail.

(1)   Is it likely to go away or not on a retry;

The DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and 
DM_OBJECT_E_LOAD_INVALID_STRING_LEN errors are not likely to go away on an 
immediate retry.

(2)   Does it substantially impact the ability of ManifoldCF to properly 
process the document;

The impact is that someone needs to monitor the indexing and, whenever it 
stops on one of these errors, use restart-minimal to start the indexing again.

(3) Is it generally acceptable to skip ALL documents where the error occurs.

Yes. These errors occur for a large number of documents, and it is burdensome 
for the user to keep restarting the indexing.

Total document count: 700000+
DM_OBJECT_E_LOAD_INVALID_STRING_LEN: 11147
DM_PLATFORM_E_INTEGER_CONVERSION_ERROR: 21708

I'm not sure whether these errors are common in Documentum or are due to 
improper Documentum configuration/maintenance. We have encountered them on a 
couple of Documentum instances in lower environments (not validated on 
production).
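
For anyone who wants to gather similar per-error counts from their own logs, here is a minimal sketch. It assumes each log line carries the bracketed code in the form shown in this thread, for example [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]; the class name ErrorCodeTally is made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ErrorCodeTally {
  // Matches bracketed Documentum-style codes such as [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR].
  private static final Pattern CODE = Pattern.compile("\\[(DM_[A-Z0-9_]+)\\]");

  /** Counts how often each bracketed DM_ code appears in the given lines. */
  static Map<String, Integer> tally(Iterable<String> logLines) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String line : logLines) {
      Matcher m = CODE.matcher(line);
      while (m.find()) {
        counts.merge(m.group(1), 1, Integer::sum);
      }
    }
    return counts;
  }
}
```

Calling ErrorCodeTally.tally on the lines of a log file returns a sorted map from error code to the number of times it appears.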

The Documentum repository errors DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and 
DM_OBJECT_E_LOAD_INVALID_STRING_LEN are of type DfException and are thrown from 
the getObjectByQualification method in 
org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.

We made a fix that prints the error to the log (of the Documentum server 
process) and returns null.
    catch (DfException e)
    {

      e.printStackTrace();
      return null;
      //throw new DocumentumException("Documentum error: "+e.getMessage());
    }


In the run() method of the ProcessDocumentThread inner class in 
org.apache.manifoldcf.crawler.connectors.DCTM.DCTM, we added a null check 
before continuing with the document processing.
      try
      {
        IDocumentumObject object = session.getObjectByQualification("dm_document where i_chronicle_id='" + documentIdentifier +
          "' and any r_version_label='CURRENT'");
        if(object!=null) {
          …
        }
      }
      catch (Throwable e)
      {
        this.exception = e;
      }

The [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED] error occurs very rarely. It 
arises because an uploaded document is parked in an interim BOCS and moved to 
the repository a short time later. If indexing happens during that gap, the 
properties are accessible but the document content is not yet available, which 
causes the error. The fix for this is not yet completed.
The code snippet that causes this error is shared below, from the run() method 
of the ProcessDocumentThread inner class in 
org.apache.manifoldcf.crawler.connectors.DCTM.DCTM:
          try
          {
            strFilePath = object.getFile(objFileTemp.getCanonicalPath());
          }
          catch (DocumentumException dfe)
          {
            // Fetch failed, so log it
            activityStatus = "NOCONTENT";
            activityMessage = dfe.getMessage();
            if (dfe.getType() != DocumentumException.TYPE_NOTALLOWED)
              throw dfe;
            return;
          }

The getFile method in 
org.apache.manifoldcf.crawler.common.DCTM.DocumentumObjectImpl:

    catch (DfException dfe)
    {
      // Can't decide what to do without looking at the exception text.
      // This is crappy but it's the best we can manage, apparently.
      String errorMessage = dfe.getMessage();
      if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1)
        // Treat it as transient, and retry
        throw new DocumentumException(dfe.getMessage(),DocumentumException.TYPE_SERVICEINTERRUPTION);
      // It's probably not a transient error.  Report it as an access violation, even though it
      // may well not be.  We don't have much info as to what's happening.
      throw new DocumentumException(dfe.getMessage(),DocumentumException.TYPE_NOTALLOWED);
    }
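
Since the decision above is driven by the message text, the same idea could be stretched to cover the two errors discussed in this thread. The following is only a sketch under assumptions: the class DfMessageClassifier and its Kind enum are hypothetical, and treating the two codes as permanent corruption reflects what we observed, not any documented Documentum guarantee.

```java
final class DfMessageClassifier {
  enum Kind { TRANSIENT, NOT_ALLOWED, CORRUPTED }

  /** Classifies an error by the bracketed code found in its message text. */
  static Kind classify(String message) {
    if (message == null) {
      // No text to inspect; assume it may clear on retry.
      return Kind.TRANSIENT;
    }
    // Assumption from this thread: these two do not go away on an immediate retry.
    if (message.contains("[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]")
        || message.contains("[DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]")) {
      return Kind.CORRUPTED;
    }
    // Same test as the existing catch block: this code is reported as an access violation.
    if (message.contains("[DM_CONTENT_E_CANT_START_PULL]")) {
      return Kind.NOT_ALLOWED;
    }
    // Everything else is treated as transient, as in the existing code.
    return Kind.TRANSIENT;
  }
}
```

Unlike the snippet above, the sketch also tolerates a null message, which would otherwise raise a NullPointerException on the indexOf call.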

Discarding uncrawlable documents and continuing with the indexing process 
makes more sense than stalling it. If you agree that this is worth including, 
kindly add the required coding exception.

Regards,
Tamizh Kumaran Thamizharasan

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Friday, July 14, 2017 12:36 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error

Hi Tamizh,

For any repository  errors, ManifoldCF needs to know the following:
(1) Is it likely to go away or not on a retry;
(2) Does it substantially impact the ability of ManifoldCF to properly process 
the document;
(3) Is it generally acceptable to skip ALL documents where the error occurs.
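
One possible reading of these three questions as a decision rule is sketched below. The class RepositoryErrorPolicy, the Action names, and the order in which the answers are checked are all assumptions made for illustration; this is not how ManifoldCF itself is structured.

```java
final class RepositoryErrorPolicy {
  enum Action { RETRY_LATER, PROCESS_ANYWAY, SKIP_DOCUMENT, STOP_JOB }

  /**
   * Maps the answers to questions (1), (2) and (3) onto an action.
   * The precedence used here (retry first, then skip) is an assumption.
   */
  static Action decide(boolean likelyToClearOnRetry,
                       boolean preventsProcessing,
                       boolean acceptableToSkipAll) {
    if (likelyToClearOnRetry) {
      return Action.RETRY_LATER;      // (1) transient: come back to the document
    }
    if (!preventsProcessing) {
      return Action.PROCESS_ANYWAY;   // (2) harmless: keep going with the document
    }
    // (3) permanent and blocking: skip only if skipping every such document is acceptable
    return acceptableToSkipAll ? Action.SKIP_DOCUMENT : Action.STOP_JOB;
  }
}
```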

In this case your underlying error seems quite worrying:

[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is temporarily 
parked on a BOCS server host. It will be available when it is moved to a 
permanent storage area."

I could imagine that many or most documents are in fact in that state, in which 
case nothing can really be crawled?

I'm happy to make coding exceptions in the Documentum connector for discarding 
uncrawlable documents, but only if it makes sense to do that.  Here it is not 
clear at all that we'd want to change MCF to throw away all documents with this 
problem.  It sounds instead like there's some significant Documentum 
configuration issue to me.

Thanks,
Karl


On Fri, Jul 14, 2017 at 2:39 AM, Tamizh Kumaran Thamizharasan 
<tthamizhara...@worldbankgroup.org> wrote:
Hi Team,

The behavior below is observed when using the ManifoldCF Documentum connector.


•         On any Documentum-specific error, the application throws the error 
and the job stops abruptly. Is there any specific reason for this approach?

Can we handle these errors by logging them, ignoring the document, and 
continuing the indexing?


Please find below sample errors that cause the job to fail.


Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error:  "The server 
was unable to convert the following string (String Unavailable) to an integer 
or long."

Caused by: org.apache.manifoldcf.crawler.common.DCTM.DocumentumException: 
Documentum error: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error:  "Error loading 
object: invalid string length 0 found in input stream"

Error: Repeated service interruptions - failure processing document: 
[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is temporarily 
parked on a BOCS server host. It will be available when it is moved to a 
permanent storage area."


Kindly provide your suggestion on this.

Regards,
Tamizh Kumaran Thamizharasan


