[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795722#comment-16795722 ] Karl Wright commented on CONNECTORS-880:

[~SubasiniR], your issue has nothing whatsoever to do with this ticket. It really belongs first on the user list.

The issue is that your database is going offline for roughly 2700 seconds (about 45 minutes) while your crawl is taking place. Queries that would normally be instantaneous are therefore not completing at all for that period of time. The plans look fine, so that isn't it.

If this is using HSQLDB (which is the default database for the single-process example), then you have probably exceeded its capacity. It stores all of its tables in memory. You will want to upgrade to a real database instead. I would prefer PostgreSQL over MySQL, because MySQL has been having transactional integrity issues for a couple of versions now, and that will be fatal when used with ManifoldCF.

By the way, "Illegal seed URL" is a warning and does not affect behavior, other than to notify you that one of the seeds you are using in your crawl is not valid according to the W3C spec. That seed will not be used.

> Under the right conditions, job aborts do not update "last checked" time
>
> Key: CONNECTORS-880
> URL: https://issues.apache.org/jira/browse/CONNECTORS-880
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework crawler agent
> Affects Versions: ManifoldCF 1.4.1
> Reporter: Karl Wright
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 1.6
>
> When a scheduled job is being considered to be started, MCF updates the
> last-check field ONLY if the job didn't start. It relies on the job's
> completion to set the last-check field in the case where the job does start.
> But if the job aborts, in at least one case the last-check field is NOT
> updated. This leads to the job being run over and over again within the
> schedule window.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
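The failure mode in the ticket description can be modeled in a few lines. The sketch below is a simplified, hypothetical model, not ManifoldCF's scheduler code: `Job`, `isDue`, `schedulerPass` and the window semantics are assumptions made for illustration. A job counts as due once its schedule window has opened and its last-check time predates the window; because the start path leaves last-check to the end-of-job path, an abort that skips that update leaves the job due on every pass.

```java
// Simplified, hypothetical model of the CONNECTORS-880 failure mode.
// None of these names come from ManifoldCF; they only mirror the ticket text.
public class ScheduleWindowSketch {
    enum Outcome { COMPLETED, ABORTED }

    static final class Job {
        long lastCheck;              // hypothetical "last checked" time, in ms
        final boolean updateOnAbort; // true = the abort path also records last-check
        int runs = 0;                // how many times the job was started

        Job(long lastCheck, boolean updateOnAbort) {
            this.lastCheck = lastCheck;
            this.updateOnAbort = updateOnAbort;
        }
    }

    // Due once the window has opened and nothing was recorded since it opened.
    static boolean isDue(Job job, long windowStart, long now) {
        return now >= windowStart && job.lastCheck < windowStart;
    }

    // One scheduler pass over a single job.
    static void schedulerPass(Job job, long windowStart, long now, Outcome outcome) {
        if (!isDue(job, windowStart, now)) {
            job.lastCheck = now;     // not started: last-check is updated right away
            return;
        }
        job.runs++;                  // started: last-check is left to the end-of-job path
        if (outcome == Outcome.COMPLETED || job.updateOnAbort) {
            job.lastCheck = now;
        }
    }

    // Runs several passes inside one window, with every run aborting.
    static int simulate(boolean updateOnAbort, int passes) {
        Job job = new Job(0L, updateOnAbort);
        long windowStart = 1_000L;
        for (int i = 0; i < passes; i++) {
            schedulerPass(job, windowStart, windowStart + i * 10L, Outcome.ABORTED);
        }
        return job.runs;
    }

    public static void main(String[] args) {
        System.out.println("runs, abort skips last-check:   " + simulate(false, 5));
        System.out.println("runs, abort records last-check: " + simulate(true, 5));
    }
}
```

In this model, five passes with aborting runs start the job five times when the abort path skips the update, and once when it records last-check.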
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795657#comment-16795657 ] Subasini Rath commented on CONNECTORS-880: --

Hi Karl,

I am also facing the above-mentioned issue (similar to CONNECTORS-880). I am using the ManifoldCF 2.12 binary distribution with a Solr output connector and a Web repository connection. ManifoldCF is using the default configuration.

When I run the jobs manually, they run fine. The same jobs have been scheduled to run every day, and then I get the exceptions below and the job hangs / goes into a waiting state. Could you please help me resolve this?

I am getting the error below:

Scenario-1
WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query (2706114 ms): [SELECT t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs t0 ORDER BY description ASC]
WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query (2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]
WARN 2019-03-08T23:58:20,386 (Document delete stuffer thread) - Parameter 0: 'e'
WARN 2019-03-08T23:58:20,337 (Set priority thread) - Found a long-running query (2732379 ms): [SELECT id,dochash,docid,jobid FROM jobqueue WHERE needpriority=? LIMIT 1000]
WARN 2019-03-08T23:58:20,386 (Set priority thread) - Parameter 0: 'T'
WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'
WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 1: 'i'
WARN 2019-03-08T23:58:20,372 (Seeding thread) - Parameter 2: '1552047176062'
WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Found a long-running query (2737524 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Parameter 0: 'S'
WARN 2019-03-08T23:58:20,474 (Finisher thread) - Found a long-running query (2752034 ms): [SELECT id FROM jobs WHERE status IN (?,?,?) FOR UPDATE]
WARN 2019-03-08T23:58:20,474 (Finisher thread) - Parameter 0: 'A'
WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 1: 'W'
WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 2: 'R'
WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Found a long-running query (2752036 ms): [SELECT id FROM jobs WHERE status=? FOR UPDATE]
WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Parameter 0: 'E'
WARN 2019-03-08T23:58:20,483 (qtp550147359-4339) - Found a long-running query (2496641 ms): [SELECT t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs t0 ORDER BY description ASC]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isDistinctSelect=[false]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isGrouped=[false]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isAggregated=[false]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: columns=[ COLUMN: PUBLIC.JOBS.ID not nullable
WARN 2019-03-08T23:58:20,492 (qtp550147359-4346) - Found a long-running query (2435908 ms): [SELECT t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs t0 ORDER BY description ASC]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan:
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: ]
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: [range variable 1
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: join type=INNER
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: table=SYSTEM_SUBQUERY
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: cardinality=0
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: access=FULL SCAN
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = [index=SYS_IDX_13329
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ][range variable 2
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join type=INNER
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: table=JOBS
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: cardinality=3
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: access=INDEX PRED
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = [index=I1549955498033
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: start conditions=[
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: EQUAL arg_left=[ COLUMN: PUBLIC.JOBS.STATUS
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ] arg_right=[ COLUMN: C1
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]]
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: end condition=[
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: EQUAL arg_left=[ COLUMN: PUBLIC.JOBS.STATUS
WARN 2019-03-08T23:58:20,501 (Finisher thread) - Plan: ] arg_right=[ COLUMN: C1
WARN
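The durations in the "Found a long-running query" warnings are what Karl's diagnosis of a stalled database rests on: 2706114 ms is about 45.1 minutes, and 2770133 ms about 46.2. When a log is long, a small scanner makes the pattern easier to see. This is a hypothetical helper, not part of ManifoldCF; its regex is fitted to the log lines in this comment rather than to a documented log format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pull thread names and durations out of
// "Found a long-running query (N ms)" warnings. The regex is fitted to the
// log lines shown in this thread, not to a documented format.
public class LongQueryScan {
    private static final Pattern LONG_QUERY =
        Pattern.compile("\\(([^)]+)\\) - Found a long-running query \\((\\d+) ms\\)");

    static final class LongQuery {
        final String thread;
        final long millis;

        LongQuery(String thread, long millis) {
            this.thread = thread;
            this.millis = millis;
        }

        double minutes() {
            return millis / 60_000.0;
        }
    }

    // Returns one entry per matching line; parameter and plan lines are skipped.
    static List<LongQuery> scan(List<String> lines) {
        List<LongQuery> found = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LONG_QUERY.matcher(line);
            if (m.find()) {
                found.add(new LongQuery(m.group(1), Long.parseLong(m.group(2))));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query (2706114 ms): [SELECT ...]",
            "WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query (2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]",
            "WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'");
        for (LongQuery q : scan(sample)) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
            System.out.println(String.format(Locale.ROOT, "%-20s %8d ms = %.1f min",
                q.thread, q.millis, q.minutes()));
        }
    }
}
```

Many unrelated threads (web UI, finisher, seeding, reset) all reporting waits of similar length at the same timestamp points at the database rather than at any one query, which is consistent with the plans looking fine.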
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899260#comment-13899260 ] Florian Schmedding commented on CONNECTORS-880: ---

I believe that in my case the job repetition was caused by the wrong collation. When a job runs with a case-insensitive collation in MySQL, it gets started again without the previous run having ended. The same job runs as expected with a correctly configured database. However, I think your fix is not intended to remedy the completely inconsistent status values that result from the wrong collation, so my setup isn't a test case for it.
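Florian's diagnosis fits how the job status column is queried: the 2019 log above passes single-letter codes that differ only in case, for example `status IN (?,?)` with parameters 'I' and 'i', and separate queries with 'e' and 'E'. Under a case-insensitive collation those codes compare equal, so a query for one status also matches rows in the other. The sketch below uses `java.text.Collator` strengths as an analogy for database collations; it is not MySQL's collation implementation, and `sameStatus` is a hypothetical helper.

```java
import java.text.Collator;
import java.util.Locale;

// Analogy only: java.text.Collator stands in for a database collation.
// PRIMARY strength ignores case (and accents), roughly like a *_ci collation;
// TERTIARY strength distinguishes case, roughly like a case-sensitive one.
public class CollationSketch {
    static boolean sameStatus(String stored, String queried, int strength) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(strength);
        return collator.compare(stored, queried) == 0;
    }

    public static void main(String[] args) {
        // Pairs of status letters that appear in this thread's logs.
        String[][] pairs = { {"I", "i"}, {"E", "e"} };
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1]
                + ": case-sensitive equal=" + sameStatus(p[0], p[1], Collator.TERTIARY)
                + ", case-insensitive equal=" + sameStatus(p[0], p[1], Collator.PRIMARY));
        }
    }
}
```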
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897667#comment-13897667 ] Florian Schmedding commented on CONNECTORS-880: ---

There are some errors in the ManifoldCF log:

DEBUG 2014-02-11 10:11:11,989 (Thread-20602) - Actual query: [SELECT status,connectionname,outputname FROM jobs WHERE id=? FOR UPDATE]
DEBUG 2014-02-11 10:11:11,989 (Thread-20602) - Parameter 0: '1392051994515'
DEBUG 2014-02-11 10:11:11,989 (Thread-20602) - Done actual query (0ms): [SELECT status,connectionname,outputname FROM jobs WHERE id=? FOR UPDATE]
DEBUG 2014-02-11 10:11:11,989 (Job reset thread) - Ending transaction
DEBUG 2014-02-11 10:11:11,989 (Job reset thread) - Rolling transaction back!
DEBUG 2014-02-11 10:11:11,992 (Thread-20603) - Actual query: [ROLLBACK]
DEBUG 2014-02-11 10:11:11,992 (Thread-20603) - Done actual query (0ms): [ROLLBACK]
ERROR 2014-02-11 10:11:11,992 (Job reset thread) - Exception tossed: Unexpected job status encountered: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job status encountered: 33
    at org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:1726)
    at org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:7427)
    at org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:91)

There is a similar exception with "Unexpected job status encountered: 34". When I look in the database, the status field of all jobs keeps switching between 's' and 'n'.
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897672#comment-13897672 ] Florian Schmedding commented on CONNECTORS-880: ---

I'm using a Solr output connection. Manually sending a document to its update handler works without problems; however, ManifoldCF seems to receive only service interruptions, and no documents get indexed.

WARN 2014-02-11 10:17:36,592 (Job notification thread) - IO exception during commit: The target server failed to respond
org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1219)
WARN 2014-02-11 10:17:36,592 (Job notification thread) - Service interruption notifying connection - retrying: IO exception during commit: The target server failed to respond
org.apache.manifoldcf.agents.interfaces.ServiceInterruption: IO exception during commit: The target server failed to respond
    at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleIOException(HttpPoster.java:477)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrServerException(HttpPoster.java:357)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster.commitPost(HttpPoster.java:304)
    at org.apache.manifoldcf.agents.output.solr.SolrConnector.noteJobComplete(SolrConnector.java:744)
    at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:121)
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897703#comment-13897703 ] Florian Schmedding commented on CONNECTORS-880: ---

A job with a null output connection works fine (same repository).
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897717#comment-13897717 ] Karl Wright commented on CONNECTORS-880:

Hi Florian,

What version of MCF is this? It does not appear to be trunk (line numbers don't match).
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897725#comment-13897725 ] Karl Wright commented on CONNECTORS-880:

Hi Florian,

In order to make progress here, we need to get in sync.

First, if it is possible for you to use trunk, that would help enormously; that is where this weekend's work was committed. If you can, please check out and build:

    svn co https://svn.apache.org/repos/asf/manifoldcf/trunk
    cd trunk
    ant make-core-deps make-deps
    ant build

If you are already using trunk, please sync up, because you are out of date:

    cd trunk
    svn update
    ant build

Second, the unexpected value errors make no sense to me at the moment. The only thing I can think of is that you may have more than one agents process running, and this case has somehow not been handled properly in that context. Can you confirm this? If that's not what you think you are doing, can you confirm that multiple running MCF instances are not inadvertently pointing to the same database instance?

Finally, the Solr connection refusals represent HTTP socket connections that fail. That suggests your Solr connection parameters are wrong. When you view the connection in the UI, what do you see?
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897868#comment-13897868 ] Florian Schmedding commented on CONNECTORS-880: ---

Unfortunately, the same errors still occur with the new revision. I'm using manifoldcf-combined-service.war under Tomcat 7, but I don't know whether there are multiple agents processes running (according to the build documentation I think there should be only one), nor how to check that.

About the Solr connection:
Connection status: Connection working (View Output Connection Status)

I cannot find any wrong parameter; Solr admin is working fine. The ping request from ManifoldCF is visible in the access log:
127.0.0.1 - - [11/Feb/2014:15:08:52 +0100] GET /solr/default/admin/ping?wt=xmlversion=2.2 HTTP/1.1 200 1329
Other manually executed requests work as well:
0:0:0:0:0:0:0:1 - - [11/Feb/2014:15:12:21 +0100] GET /solr/default/update?commit=true HTTP/1.1 200 160
However, no further requests from ManifoldCF are logged. Did the Solr connection handler change? I'm using Solr 4.3.1.

* DEBUG 2014-02-11 14:54:17,774 (Job reset thread) - Job 1385456433981 now completed
ERROR 2014-02-11 14:54:17,801 (Job reset thread) - Exception tossed: Unexpected job status encountered: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job status encountered: 33
    at org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:1901)
    at org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:7726)
    at org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:91)
DEBUG 2014-02-11 14:54:17,857 (Job notification thread) - Found job 1385456433981 in need of notification
DEBUG 2014-02-11 14:54:17,862 (Job notification thread) - Found job 1392051994515 in need of notification
DEBUG 2014-02-11 14:54:17,867 (Job notification thread) - Found job 1392109738731 in need of notification
DEBUG 2014-02-11 14:54:17,871 (Job notification thread) - Found job 1392112746052 in need of notification
DEBUG 2014-02-11 14:54:17,891 (Job reset thread) - Job 1385456433981 now completed
ERROR 2014-02-11 14:54:17,928 (Job reset thread) - Exception tossed: Unexpected job status encountered: 34
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job status encountered: 34
    at org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:1901)
    at org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:7726)
    at org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:91)

** WARN 2014-02-11 15:02:28,180 (Job notification thread) - Notification service interruption reported for job 1392112746052 output connection 'solr localhost': IO exception during commit: The target server failed to respond
org.apache.manifoldcf.agents.interfaces.ServiceInterruption: IO exception during commit: The target server failed to respond
    at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleIOException(HttpPoster.java:477)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrServerException(HttpPoster.java:357)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster.commitPost(HttpPoster.java:304)
    at org.apache.manifoldcf.agents.output.solr.SolrConnector.noteJobComplete(SolrConnector.java:744)
    at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:118)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897876#comment-13897876 ] Karl Wright commented on CONNECTORS-880:

Hi Florian,

The version of SolrJ changed; it's now 4.6.0. SolrJ is usually pretty backwards compatible, so I'm surprised that you are having difficulty with it, but at the same time I can believe that they messed something up over there. If you want to rule out that cause, I think you can just replace the solr-solrj.jar in the lib directory with the 4.3.1 equivalent and rebuild.

I'll look at the trace over the next hour or so and see if it tells me anything about the state issues. I did try this, FWIW, against localhost when Solr wasn't even running, and saw nothing like this in the log.
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897904#comment-13897904 ] Karl Wright commented on CONNECTORS-880:

It occurs to me that your issues may be due to MySQL transactional integrity. There were versions of MySQL that failed to properly serialize database activity; anything earlier than 5.5, for instance, is definitely suspect. What version of MySQL are you using?
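Karl's rule of thumb here can be applied directly to the string returned by MySQL's `SELECT VERSION()`. The helper below is a hypothetical sketch that only encodes the "earlier than 5.5 is suspect" cutoff from this comment. As Karl's 2019 comment notes, later MySQL versions have also had transactional integrity problems, so a version that passes this check is not thereby known to be safe with ManifoldCF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flag MySQL server versions below 5.5, per the rule of thumb
// in this comment. Passing the check does NOT mean a version is safe for ManifoldCF.
public class MySqlVersionCheck {
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    // Accepts version strings such as "5.1.73" or "5.5.35-log".
    static boolean olderThan55(String version) {
        Matcher m = MAJOR_MINOR.matcher(version.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized version string: " + version);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major < 5 || (major == 5 && minor < 5);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"5.1.73", "5.5.35-log", "5.6.14"}) {
            System.out.println(v + " suspect=" + olderThan55(v));
        }
    }
}
```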
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898001#comment-13898001 ] Florian Schmedding commented on CONNECTORS-880: ---

Replacing the solj-connector with an older version does not seem to solve the problem. During ingestion the job gets aborted.
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898011#comment-13898011 ] Karl Wright commented on CONNECTORS-880:

Hi Florian,

There is no magic here. Something must not be configured correctly. The Solr connector is used extensively, and there are mock tests for it as well.
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898058#comment-13898058 ] Karl Wright commented on CONNECTORS-880:

It also occurs to me that perhaps you have restricted Solr in some way so that it does not accept POST requests. I bet your testing uses GET requests, but MCF uses multipart POSTs for everything.
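A GET such as the `/solr/default/update?commit=true` request in Florian's access log does not exercise the path MCF uses. One way to test whether POSTs reach Solr at all is to send a multipart POST by hand. The sketch below builds one with the JDK's `java.net.http` client (Java 11+). The host, port 8983, the part name `stream` and the `<commit/>` body are assumptions for illustration; this is not what the Solr connector actually sends, and whether a particular Solr handler accepts this exact body is also an assumption.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: build a minimal multipart/form-data POST to exercise Solr's POST path.
// URL, part name and body are illustrative assumptions, not the connector's payload.
public class MultipartPostSketch {
    // One file-like part; multipart lines are separated by CRLF.
    static String multipartBody(String boundary, String partName, String contentType, String content) {
        String crlf = "\r\n";
        return "--" + boundary + crlf
            + "Content-Disposition: form-data; name=\"" + partName + "\"; filename=\"" + partName + "\"" + crlf
            + "Content-Type: " + contentType + crlf
            + crlf
            + content + crlf
            + "--" + boundary + "--" + crlf;
    }

    // Building the request does not send it; nothing touches the network here.
    static HttpRequest buildPost(URI uri, String boundary, String body) {
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        String boundary = "mcfTestBoundary42";
        String body = multipartBody(boundary, "stream", "text/xml", "<commit/>");
        HttpRequest req = buildPost(URI.create("http://localhost:8983/solr/default/update"), boundary, body);
        System.out.println(req.method() + " " + req.uri());
    }
}
```

To actually send it against a running Solr, pass the request to `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())` and check whether the POST shows up in Solr's access log next to the GETs.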
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898194#comment-13898194 ] Karl Wright commented on CONNECTORS-880:

Hi Florian,

I just realized that you said:

bq. Replacing the solj-connector with an older version does not seem to solve the problem. During ingestion the job gets aborted.

But that's not what I asked you to try. I asked you to replace lib/solr-solrj.jar with solr-solrj.jar version 4.3.1, and then run:

    ant clean
    ant build

Replacing the connector jar will have no effect other than possibly breaking things further.

Thanks!
[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898413#comment-13898413 ] Florian Schmedding commented on CONNECTORS-880:

I just replaced the httpclient library (without rebuilding), and now the HTTP POST works fine. I noticed that all connections except those from Manifold appeared with the address 0:0:0:0:0:0:0:1 instead of 127.0.0.1. There is a resolved issue about IPv6 addresses: https://issues.apache.org/jira/browse/HTTPCLIENT-1317. I don't know whether this was really the cause of the trouble, but in any case the new version works:

mcf-combined-service.war:
  httpclient.jar -> httpclient-4.3.2.jar
  httpcore.jar -> httpcore-4.3.1.jar
connector-lib:
  httpmime.jar -> httpmime-4.3.2.jar (perhaps not important)

(binary from http://ftp.fau.de/apache//httpcomponents/httpclient/binary/httpcomponents-client-4.3.2-bin.zip at http://hc.apache.org/downloads.cgi)

After restarting Tomcat, all documents get indexed by Solr. I have now configured a schedule to check the fix you provided. Thanks for your help!
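The address 0:0:0:0:0:0:0:1 mentioned above is the IPv6 loopback address (::1) written in Java's uncompressed notation. The hypothetical helper below uses only `java.net.InetAddress` to show that it and 127.0.0.1 are both loopback addresses yet are distinct addresses, so anything that compares client addresses as strings will treat them differently. It is an illustration of the two notations only and does not reproduce HTTPCLIENT-1317 itself.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for comparing client addresses as they appear in logs. */
final class LoopbackCheck {
    private LoopbackCheck() {}

    /** Parses an address; for IP literals no DNS lookup is performed. */
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse address: " + literal, e);
        }
    }

    /** True for 127.0.0.0/8 and for the IPv6 loopback ::1. */
    static boolean isLoopback(String literal) {
        return parse(literal).isLoopbackAddress();
    }

    /** Java prints IPv6 addresses without "::" compression, e.g. 0:0:0:0:0:0:0:1. */
    static String canonical(String literal) {
        return parse(literal).getHostAddress();
    }
}
```

For example, `LoopbackCheck.canonical("::1")` yields the same string Florian saw in the logs, while `parse("127.0.0.1").equals(parse("::1"))` is false.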
[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898419#comment-13898419 ] Karl Wright commented on CONNECTORS-880:

Hi Florian,

We include the current 4.2.x version of httpclient (4.2.6) because that's what SolrJ uses. It looks like they did not backport the fix to the 4.2.x stream, so we'll have to wait until SolrJ goes to 4.3.x. I'm told that won't happen until Solr 4.7. Given that, I'm glad that you found a working solution.
[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896778#comment-13896778 ] Florian Schmedding commented on CONNECTORS-880:

I installed mcf-combined-service.war version 1.6 on a Tomcat server. However, the test job is not working correctly. I used an existing database from Manifold 1.3 with one existing output connection, one existing repository connection and one existing job. On the job status page there were no buttons to control the job; its status was "End notification". Therefore I copied the job. Then there were buttons to start the new job, but it only got errors from the output connection when ingesting documents. After I aborted the job, it got stuck in the "End notification" status and does not leave it. Would it be better to create new connections and jobs, or are there other problems in version 1.6?
[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896788#comment-13896788 ] Karl Wright commented on CONNECTORS-880:

Hi Florian,

There are no other known problems in 1.6, but I have been working on the job state transitions. Specifically, the various kinds of ServiceInterruption from output connectors are now honored. So if you are using an output connector that responds to a request with a ServiceInterruption, then unless that connector is sending the right variety of ServiceInterruption, it is conceivable that it could retry indefinitely, or at least for much longer than you would like. Could you be more specific about which output connector you are using?
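Why the "right variety" of interruption matters can be shown with a toy retry policy. The sketch below is hypothetical: `Interruption`, `RetryPolicy` and their fields are invented for illustration and are not ManifoldCF's `ServiceInterruption` API. An interruption with neither a fail time nor a retry limit keeps being rescheduled, while one bounded by either gives up after a finite number of attempts.

```java
/** Hypothetical stand-in for a service interruption; not ManifoldCF's class. */
final class Interruption {
    final long retryDelayMs;   // wait before the next attempt
    final long failTime;       // absolute time after which to give up; -1 = never
    final int maxRetries;      // total attempts allowed; -1 = unlimited

    Interruption(long retryDelayMs, long failTime, int maxRetries) {
        this.retryDelayMs = retryDelayMs;
        this.failTime = failTime;
        this.maxRetries = maxRetries;
    }
}

/** Decides whether a failed operation is retried or the job gives up (e.g. aborts). */
final class RetryPolicy {
    private RetryPolicy() {}

    static boolean shouldRetry(Interruption i, long now, int attempts) {
        boolean pastDeadline = i.failTime >= 0 && now + i.retryDelayMs > i.failTime;
        boolean outOfRetries = i.maxRetries >= 0 && attempts >= i.maxRetries;
        return !pastDeadline && !outOfRetries;
    }

    /** Counts attempts until giving up; stops at 'limit' so unbounded cases terminate. */
    static int simulate(Interruption i, long start, int limit) {
        long now = start;
        int attempts = 0;
        while (attempts < limit) {
            attempts++;                      // the operation is tried and fails again
            if (!shouldRetry(i, now, attempts)) {
                return attempts;             // bounded interruption: give up
            }
            now += i.retryDelayMs;           // wait before the next attempt
        }
        return attempts;                     // reached the cap: effectively indefinite
    }
}
```

With a 1000 ms delay, an unbounded interruption runs until the simulation cap, a limit of 3 attempts stops after 3, and a fail time of 2500 ms (starting at 0) also stops after 3 attempts.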
[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13892232#comment-13892232 ] Karl Wright commented on CONNECTORS-880:

Merged the branch to trunk after it passed all tests: r1564804. It is not yet clear whether it definitively works; it's pretty hard to reproduce without building a test output connector, and I haven't had time for that yet.
[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13891073#comment-13891073 ] Karl Wright commented on CONNECTORS-880:

The problem is happening during the cleanup phase of the crawl, most likely because of an error from the output connector.

Normally: JobResetThread calls JobManager.resetJobs(), which in turn calls Jobs.finishJob(time).

Error condition: DocumentCleanupThread calls JobManager.errorAbort(), which is clearly the wrong method to use, since it just throws an exception for any job in the STATUS_SHUTTINGDOWN state:

throw new ManifoldCFException("Job "+jobID+" is not active");

So the SHUTTINGDOWN state needs its own abort method, e.g. cleanupAbort(), and possibly its own state (ABORTING_CLEANINGUP).
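The proposed fix can be sketched as a small state machine. Everything below is hypothetical and simplified: `JobStatus`, `JobTable`, `errorAbort()`, `cleanupAbort()` and `finishAbort()` echo the names in the comment, but the bodies are illustrations rather than ManifoldCF's JobManager code, and `IllegalStateException` stands in for `ManifoldCFException`. The point is that `errorAbort()` rejects a job already in the shutting-down (cleanup) phase, whereas a dedicated `cleanupAbort()` accepts it, moves it to an aborting-while-cleaning-up state, and the abort then records the last-check time when it finishes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical subset of job states; names follow the comment, not MCF source. */
enum JobStatus { ACTIVE, SHUTTINGDOWN, ABORTING, ABORTING_CLEANINGUP, INACTIVE }

final class JobTable {
    private final Map<Long, JobStatus> status = new HashMap<>();
    private final Map<Long, Long> lastCheck = new HashMap<>();

    void put(long jobID, JobStatus s) { status.put(jobID, s); }
    JobStatus status(long jobID) { return status.get(jobID); }
    Long lastCheck(long jobID) { return lastCheck.get(jobID); }

    /** Abort for a job that is still crawling; a job in cleanup is rejected. */
    void errorAbort(long jobID) {
        if (status.get(jobID) != JobStatus.ACTIVE) {
            throw new IllegalStateException("Job " + jobID + " is not active");
        }
        status.put(jobID, JobStatus.ABORTING);
    }

    /** Separate abort path for the cleanup (SHUTTINGDOWN) phase. */
    void cleanupAbort(long jobID) {
        if (status.get(jobID) != JobStatus.SHUTTINGDOWN) {
            throw new IllegalStateException("Job " + jobID + " is not cleaning up");
        }
        status.put(jobID, JobStatus.ABORTING_CLEANINGUP);
    }

    /** Completes a pending abort and records the last-check time. */
    void finishAbort(long jobID, long now) {
        JobStatus s = status.get(jobID);
        if (s == JobStatus.ABORTING || s == JobStatus.ABORTING_CLEANINGUP) {
            status.put(jobID, JobStatus.INACTIVE);
            lastCheck.put(jobID, now);
        }
    }
}
```

Keeping the cleanup abort as its own method and state means the existing checks in `errorAbort()` stay strict, while the cleanup path gets a transition that is explicitly allowed.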
[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
[ https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13891261#comment-13891261 ] Karl Wright commented on CONNECTORS-880:

Created a branch, CONNECTORS-880, with the proposed fix. I'm hoping that it works in the field. It passes tests here, although I don't have a good test that throws exceptions on output-connection document deletion. I may need to come up with a test output connector that I can use to exercise this condition.
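A test output connector of the kind mentioned above is essentially a fault-injecting test double. The sketch below is hypothetical: `DeletionTarget`, `FailingDeleteConnector` and `CleanupRun` are invented names, and the one-method interface is a stand-in rather than ManifoldCF's real output connector API. The double lets a configurable number of deletions succeed and then fails every call, which is enough to push a simulated cleanup phase into its error path.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal stand-in for an output connector's delete operation. */
interface DeletionTarget {
    void removeDocument(String documentURI) throws Exception;
}

/** Test double: the first 'succeedCount' deletions succeed, every later call fails. */
final class FailingDeleteConnector implements DeletionTarget {
    private final int succeedCount;
    private final List<String> removed = new ArrayList<>();

    FailingDeleteConnector(int succeedCount) { this.succeedCount = succeedCount; }

    @Override
    public void removeDocument(String documentURI) throws Exception {
        if (removed.size() >= succeedCount) {
            throw new Exception("Injected failure deleting " + documentURI);
        }
        removed.add(documentURI);
    }

    /** Copy of the URIs deleted so far. */
    List<String> removed() { return new ArrayList<>(removed); }
}

/** Hypothetical cleanup loop: stops at the first failure and reports success or failure. */
final class CleanupRun {
    private CleanupRun() {}

    static boolean runCleanup(DeletionTarget target, List<String> uris) {
        for (String uri : uris) {
            try {
                target.removeDocument(uri);
            } catch (Exception e) {
                // In this ticket's scenario, this is the point where a cleanup-phase abort is needed.
                return false;
            }
        }
        return true;
    }
}
```

A test would then assert that, after a failed cleanup, the job ends in an aborted state with its last-check time set, rather than being stuck or restarted within the schedule window.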