[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2019-03-19 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795722#comment-16795722
 ] 

Karl Wright commented on CONNECTORS-880:


[~SubasiniR], your issue has nothing whatsoever to do with this ticket.  It 
really belongs first on the user list.

The issue is that your database is going offline for about 2700 seconds, or 
roughly 45 minutes, while your crawl is taking place.  Queries that normally 
would be instantaneous are therefore just not being completed at all for that 
period of time.  The plans look fine, so that isn't it.
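
As a rough check of the 45-minute figure, the durations reported in the "Found a long-running query" warnings can be converted from milliseconds to minutes. The following is a minimal Python sketch, assuming the warning format shown in the log excerpt quoted in this thread; the function name is made up for illustration and is not part of ManifoldCF.

```python
import re

# Matches the duration in warnings such as
# "Found a long-running query (2737370 ms): [SELECT ...]".
# \s+ tolerates the line wrapping seen in the mailed log excerpt.
LONG_QUERY = re.compile(r"long-running query\s+\((\d+) ms\)")


def long_query_minutes(log_text):
    """Return each reported long-running query duration, in minutes."""
    return [int(ms) / 60000.0 for ms in LONG_QUERY.findall(log_text)]


sample = (
    "WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a "
    "long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]"
)

for minutes in long_query_minutes(sample):
    print(f"{minutes:.1f} min")
```

Running this over a full log gives one duration per warning, which makes it easy to see whether all stalled queries cover the same period.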

If this is using HSQLDB (which is the default database for the single-process 
example), then you probably have exceeded its capacity.  It stores all of its 
tables in memory.  You will want to upgrade to a real database instead.  I 
would prefer PostgreSQL over MySQL because MySQL has been having transactional 
integrity issues for a couple of versions now, and that will be fatal to use 
with ManifoldCF.

By the way, "Illegal seed URL" is a warning and does not impact behavior other 
than to notify you that one of the seeds you are using in your crawl is not 
valid according to the W3C spec.  The seed will not be used.





> Under the right conditions, job aborts do not update "last checked" time
> 
>
> Key: CONNECTORS-880
> URL: https://issues.apache.org/jira/browse/CONNECTORS-880
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Affects Versions: ManifoldCF 1.4.1
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 1.6
>
>
> When a scheduled job is being considered to be started, MCF updates the 
> last-check field ONLY if the job didn't start.  It relies on the job's 
> completion to set the last-check field in the case where the job does start.  
> But if the job aborts, in at least one case the last-check field is NOT 
> updated.  This leads to the job being run over and over again within the 
> schedule window.
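
The failure mode in the issue description can be illustrated with a toy model. This is not ManifoldCF code: the class, field, and function names are hypothetical, and the scheduler is reduced to a single "last checked" timestamp per job and an integer clock.

```python
class Job:
    def __init__(self):
        self.last_checked = None  # time of the last schedule check (hypothetical field)
        self.runs = 0             # how many times the job was started


def should_start(job, now, window):
    """Start only inside the window, and only if not yet checked in this window."""
    start, end = window
    if not (start <= now < end):
        return False
    return job.last_checked is None or job.last_checked < start


def tick(job, now, window, aborts, fix_abort_path):
    """One scheduler pass; returns True if the job was started."""
    if not should_start(job, now, window):
        job.last_checked = now      # updated ONLY when the job did not start
        return False
    job.runs += 1
    if aborts:
        if fix_abort_path:
            job.last_checked = now  # the update that is missing on the abort path
    else:
        job.last_checked = now      # normal completion records the check
    return True


def simulate(fix_abort_path):
    """Run a job that always aborts through every tick of one schedule window."""
    job = Job()
    window = (10, 20)
    for now in range(10, 20):
        tick(job, now, window, aborts=True, fix_abort_path=fix_abort_path)
    return job.runs


print(simulate(fix_abort_path=False))  # job restarts on every tick in the window
print(simulate(fix_abort_path=True))   # job starts once per window
```

In this model the only difference between the two runs is whether the abort path records the check, which is the behavior the ticket describes.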



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2019-03-18 Thread Subasini Rath (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795657#comment-16795657
 ] 

Subasini Rath commented on CONNECTORS-880:
--

Hi Karl,

I am also facing the above-mentioned issue (similar to CONNECTORS-880).

I am using the ManifoldCF 2.12 binary version, with a Solr output connector and 
a Web repository connection. ManifoldCF is using the default configuration 
throughout.

When I run the jobs manually, they run fine. The same jobs have been scheduled 
to run every day.

I am getting the exceptions below, and the job hangs or goes into a waiting 
state.

Could you please help me resolve this?

The error is below:

Scenario-1

WARN 2019-03-08T23:58:20,338 (qtp550147359-413) - Found a long-running query 
(2706114 ms): [SELECT 
t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
t0 ORDER BY description ASC]
WARN 2019-03-08T23:58:20,337 (Document delete stuffer thread) - Found a 
long-running query (2737370 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
WARN 2019-03-08T23:58:20,339 (Job reset thread) - Found a long-running query 
(2770133 ms): [SELECT id FROM jobs WHERE status IN (?,?)]
WARN 2019-03-08T23:58:20,386 (Document delete stuffer thread) - Parameter 0: 'e'
WARN 2019-03-08T23:58:20,337 (Set priority thread) - Found a long-running query 
(2732379 ms): [SELECT id,dochash,docid,jobid FROM jobqueue WHERE needpriority=? 
LIMIT 1000]
WARN 2019-03-08T23:58:20,386 (Set priority thread) - Parameter 0: 'T'
WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 0: 'I'
WARN 2019-03-08T23:58:20,386 (Job reset thread) - Parameter 1: 'i'
WARN 2019-03-08T23:58:20,372 (Seeding thread) - Parameter 2: '1552047176062'
WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Found a 
long-running query (2737524 ms): [SELECT id FROM jobs WHERE status=? LIMIT 1]
WARN 2019-03-08T23:58:20,474 (Document cleanup stuffer thread) - Parameter 0: 
'S'
WARN 2019-03-08T23:58:20,474 (Finisher thread) - Found a long-running query 
(2752034 ms): [SELECT id FROM jobs WHERE status IN (?,?,?) FOR UPDATE]
WARN 2019-03-08T23:58:20,474 (Finisher thread) - Parameter 0: 'A'
WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 1: 'W'
WARN 2019-03-08T23:58:20,475 (Finisher thread) - Parameter 2: 'R'
WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Found a long-running 
query (2752036 ms): [SELECT id FROM jobs WHERE status=? FOR UPDATE]
WARN 2019-03-08T23:58:20,475 (Delete startup thread) - Parameter 0: 'E'
WARN 2019-03-08T23:58:20,483 (qtp550147359-4339) - Found a long-running query 
(2496641 ms): [SELECT 
t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
t0 ORDER BY description ASC]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isDistinctSelect=[false]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isGrouped=[false]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: isAggregated=[false]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: columns=[ COLUMN: 
PUBLIC.JOBS.ID not nullable
WARN 2019-03-08T23:58:20,492 (qtp550147359-4346) - Found a long-running query 
(2435908 ms): [SELECT 
t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs 
t0 ORDER BY description ASC]
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: 
WARN 2019-03-08T23:58:20,492 (Finisher thread) - Plan: ]
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: [range variable 1
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: join type=INNER
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: table=SYSTEM_SUBQUERY
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: cardinality=0
WARN 2019-03-08T23:58:20,499 (Finisher thread) - Plan: access=FULL SCAN
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = 
[index=SYS_IDX_13329
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ][range variable 2
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join type=INNER
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: table=JOBS
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: cardinality=3
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: access=INDEX PRED
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: join condition = 
[index=I1549955498033
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: start conditions=[
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: EQUAL arg_left=[ COLUMN: 
PUBLIC.JOBS.STATUS
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ] arg_right=[ COLUMN: C1
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: ]]
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: end condition=[
WARN 2019-03-08T23:58:20,500 (Finisher thread) - Plan: EQUAL arg_left=[ COLUMN: 
PUBLIC.JOBS.STATUS
WARN 2019-03-08T23:58:20,501 (Finisher thread) - Plan: ] arg_right=[ COLUMN: C1
WARN 

[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-12 Thread Florian Schmedding (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899260#comment-13899260
 ] 

Florian Schmedding commented on CONNECTORS-880:
---

I believe that in my case the job repetition was caused by the wrong 
collation. When running a job with a case-insensitive collation in MySQL, it 
gets started again without a previous job end. The same job runs as expected 
with a correctly configured database. However, I think your fix is not intended 
to remedy completely inconsistent status values resulting from the wrong 
collation. So my setup isn't a test case for it.
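
To illustrate why a case-insensitive collation can produce inconsistent job state: the logs in this thread show single-character status values that differ only by case (for example 'I' and 'i', or 'e' and 'E') being passed as distinct query parameters. The Python sketch below only simulates the comparison. It is not MySQL or ManifoldCF code, the rows are hypothetical, and no meaning is assumed for any status letter.

```python
def matching_ids(rows, wanted, case_sensitive):
    """Return ids of rows whose status equals `wanted` under the given comparison."""
    if case_sensitive:
        return [job_id for job_id, status in rows if status == wanted]
    # A case-insensitive collation treats 'e' and 'E' as the same value.
    return [job_id for job_id, status in rows if status.lower() == wanted.lower()]


# Hypothetical rows: (job id, status letter). Only the letters come from the logs.
rows = [(1, "e"), (2, "E"), (3, "I"), (4, "i")]

print(matching_ids(rows, "E", case_sensitive=True))   # only the row stored as 'E'
print(matching_ids(rows, "E", case_sensitive=False))  # rows stored as 'e' and 'E'
```

With the case-insensitive comparison, a query meant to select jobs in one state also returns jobs in a different state, so any thread acting on the result would be working from the wrong set of jobs.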




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Florian Schmedding (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897667#comment-13897667
 ] 

Florian Schmedding commented on CONNECTORS-880:
---

There are some errors in the ManifoldCF log:

DEBUG 2014-02-11 10:11:11,989 (Thread-20602) - Actual query: [SELECT 
status,connectionname,outputname FROM jobs WHERE id=? FOR UPDATE]
DEBUG 2014-02-11 10:11:11,989 (Thread-20602) -   Parameter 0: '1392051994515'
DEBUG 2014-02-11 10:11:11,989 (Thread-20602) - Done actual query (0ms): [SELECT 
status,connectionname,outputname FROM jobs WHERE id=? FOR UPDATE]
DEBUG 2014-02-11 10:11:11,989 (Job reset thread) - Ending transaction
DEBUG 2014-02-11 10:11:11,989 (Job reset thread) - Rolling transaction back!
DEBUG 2014-02-11 10:11:11,992 (Thread-20603) - Actual query: [ROLLBACK]
DEBUG 2014-02-11 10:11:11,992 (Thread-20603) - Done actual query (0ms): 
[ROLLBACK]
ERROR 2014-02-11 10:11:11,992 (Job reset thread) - Exception tossed: Unexpected 
job status encountered: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job 
status encountered: 33
at 
org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:1726)
at 
org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:7427)
at 
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:91)

There is a similar exception with Unexpected job status encountered: 34. When 
looking into the database, the status field of all jobs is constantly changing 
between 's' and 'n'.



[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Florian Schmedding (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897672#comment-13897672
 ] 

Florian Schmedding commented on CONNECTORS-880:
---

I'm using a Solr output connection. Manually sending a document to its update 
handler does not raise any problems; however, Manifold seems to receive only 
service interruptions. No document gets indexed.

 WARN 2014-02-11 10:17:36,592 (Job notification thread) - IO exception during 
commit: The target server failed to respond
org.apache.http.NoHttpResponseException: The target server failed to respond
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1219)
 WARN 2014-02-11 10:17:36,592 (Job notification thread) - Service interruption 
notifying connection - retrying: IO exception during commit: The target server 
failed to respond
org.apache.manifoldcf.agents.interfaces.ServiceInterruption: IO exception 
during commit: The target server failed to respond
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster.handleIOException(HttpPoster.java:477)
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrServerException(HttpPoster.java:357)
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster.commitPost(HttpPoster.java:304)
at 
org.apache.manifoldcf.agents.output.solr.SolrConnector.noteJobComplete(SolrConnector.java:744)
at 
org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:121)



[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Florian Schmedding (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897703#comment-13897703
 ] 

Florian Schmedding commented on CONNECTORS-880:
---

A job with a null output connection works fine (same repository).



[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897717#comment-13897717
 ] 

Karl Wright commented on CONNECTORS-880:


Hi Florian,

What version of MCF is this?  It does not appear to be trunk (line numbers 
don't match).




[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897725#comment-13897725
 ] 

Karl Wright commented on CONNECTORS-880:


Hi Florian,

In order to make progress here, we need to get in synch.  First, if it is 
possible for you to use trunk, that would help enormously; that is where this 
weekend's work was committed.  If it is, please check out and build:

svn co https://svn.apache.org/repos/asf/manifoldcf/trunk
cd trunk
ant make-core-deps make-deps
ant build

If you already are using trunk, please synch up, because you are out of synch:

cd trunk
svn update
ant build

Second, the unexpected value errors make no sense to me at all for the 
moment.  The only thing I can think of is that perhaps you have more than one 
agents process running, and this case has somehow not been handled properly in 
that context.  Can you confirm this?  If that's not what you think you are 
doing, can you confirm that multiple running MCF instances are not 
inadvertently pointing to the same database instance?

Finally, the Solr connection refusals represent HTTP socket connections that 
fail.  That argues that your Solr connection parameters are wrong.  When you 
view the connection in the UI, what do you see?





[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Florian Schmedding (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897868#comment-13897868
 ] 

Florian Schmedding commented on CONNECTORS-880:
---

Unfortunately, there are still the same errors with the new revision. I'm using 
the manifoldcf-combined-service.war under Tomcat 7 but I don't know if there 
are multiple agents processes running (according to the build doc I think there 
should be only one) nor how to check that. 

About the Solr connection: 
Connection status:  Connection working (View Output Connection Status)
I cannot find any wrong parameter, Solr admin is working fine. The ping request 
from manifold is visible in the access log:
127.0.0.1 - - [11/Feb/2014:15:08:52 +0100] GET 
/solr/default/admin/ping?wt=xml&version=2.2 HTTP/1.1 200 1329

Other manually executed requests work as well:
0:0:0:0:0:0:0:1 - - [11/Feb/2014:15:12:21 +0100] GET 
/solr/default/update?commit=true HTTP/1.1 200 160

However, no further requests from manifold are logged. Did the Solr connection 
handler change? I'm using Solr 4.3.1.

*

DEBUG 2014-02-11 14:54:17,774 (Job reset thread) - Job 1385456433981 now 
completed
ERROR 2014-02-11 14:54:17,801 (Job reset thread) - Exception tossed: Unexpected 
job status encountered: 33
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job 
status encountered: 33
at 
org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:1901)
at 
org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:7726)
at 
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:91)
DEBUG 2014-02-11 14:54:17,857 (Job notification thread) - Found job 
1385456433981 in need of notification
DEBUG 2014-02-11 14:54:17,862 (Job notification thread) - Found job 
1392051994515 in need of notification
DEBUG 2014-02-11 14:54:17,867 (Job notification thread) - Found job 
1392109738731 in need of notification
DEBUG 2014-02-11 14:54:17,871 (Job notification thread) - Found job 
1392112746052 in need of notification
DEBUG 2014-02-11 14:54:17,891 (Job reset thread) - Job 1385456433981 now 
completed
ERROR 2014-02-11 14:54:17,928 (Job reset thread) - Exception tossed: Unexpected 
job status encountered: 34
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job 
status encountered: 34
at 
org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:1901)
at 
org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:7726)
at 
org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:91)

**

 WARN 2014-02-11 15:02:28,180 (Job notification thread) - Notification service 
interruption reported for job 1392112746052 output connection 'solr localhost': 
IO exception during commit: The target server failed to respond
org.apache.manifoldcf.agents.interfaces.ServiceInterruption: IO exception 
during commit: The target server failed to respond
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster.handleIOException(HttpPoster.java:477)
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrServerException(HttpPoster.java:357)
at 
org.apache.manifoldcf.agents.output.solr.HttpPoster.commitPost(HttpPoster.java:304)
at 
org.apache.manifoldcf.agents.output.solr.SolrConnector.noteJobComplete(SolrConnector.java:744)
at 
org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:118)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to 
respond
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 

[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897876#comment-13897876
 ] 

Karl Wright commented on CONNECTORS-880:


Hi Florian,

The version of SolrJ changed; it's now 4.6.0.  Usually, though, SolrJ is pretty 
backwards compatible, so I'm surprised that you are having difficulty with it, 
but at the same time I can believe that they messed up over there.  If you want 
to rule out that cause, I think you can just replace the solr-solrj.jar in the 
lib directory with the 4.3.1 equivalent, and rebuild.

I'll look at the trace over the next hour or so and see if it tells me anything 
about the state issues.  I did try this, FWIW, against localhost when Solr 
wasn't even running, and saw nothing like this in the log.



[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897904#comment-13897904
 ] 

Karl Wright commented on CONNECTORS-880:


It occurs to me that your issues may be due to MySQL transactional integrity.  
There were versions of MySQL that failed to properly serialize database 
activity; anything earlier than 5.5 for instance is definitely suspect.  What 
version of MySQL are you using?



[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Florian Schmedding (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898001#comment-13898001
 ] 

Florian Schmedding commented on CONNECTORS-880:
---

Replacing the solj-connector with an older version does not seem to solve the 
problem. During ingestion the job gets aborted.



[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898011#comment-13898011
 ] 

Karl Wright commented on CONNECTORS-880:


Hi Florian,

There is no magic here.  Something must be incorrectly configured.  The Solr 
connector is used extensively, and there are mock tests for it as well.




[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update last checked time

2014-02-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898058#comment-13898058
 ] 

Karl Wright commented on CONNECTORS-880:


It also occurs to me that perhaps you have restricted Solr in some way so that 
it does not receive POST requests.  I bet your testing uses GET requests -- but 
MCF uses multi-part posts for everything.
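
One way to check whether POST requests reach Solr at all is to send one by hand instead of relying on GET. The following is a minimal Python sketch under stated assumptions: the host and port (localhost:8983) are guesses, since the thread gives only the path /solr/default/update, and the helper names are made up for illustration. It sends a plain XML commit, not the multi-part POST that MCF uses, so it can only show whether POST is blocked outright.

```python
import urllib.request

# Host and port are assumptions; only the path appears in the access log
# quoted elsewhere in this thread.
SOLR_UPDATE_URL = "http://localhost:8983/solr/default/update"


def build_commit_request(url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying a Solr XML commit."""
    return urllib.request.Request(
        url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def send(req, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Example (requires a reachable Solr instance):
#   print(send(build_commit_request()))
```

If the POST shows up in the Solr access log while MCF's requests do not, the restriction is more likely specific to multi-part requests or to the connection parameters than to POST in general.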




[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2014-02-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898194#comment-13898194
 ] 

Karl Wright commented on CONNECTORS-880:


Hi Florian,

I just realized that you said:

bq. Replacing the solj-connector with an older version does not seem to solve 
the problem. During ingestion the job gets aborted.

But, that's not what I asked you to try.  I asked you to replace 
lib/solr-solrj.jar with solr-solrj.jar version 4.3.1, and then:

ant clean
ant build

Replacing the connector jar will have no effect other than to possibly break 
things further.

Thanks!




[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2014-02-11 Thread Florian Schmedding (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898413#comment-13898413
 ] 

Florian Schmedding commented on CONNECTORS-880:
---

I just replaced the httpclient library (without rebuilding) and now the HTTP 
POST works fine. I noticed that all connections except those from Manifold 
appeared with the address 0:0:0:0:0:0:0:1 instead of 127.0.0.1. There is a 
resolved issue about IPv6 addresses: 
https://issues.apache.org/jira/browse/HTTPCLIENT-1317. I don't know whether 
this was really the cause of the trouble, but in any case the new version works.
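
As a side note on the two addresses mentioned above: 0:0:0:0:0:0:0:1 is the IPv6 loopback address and 127.0.0.1 is the IPv4 one, so both refer to the local machine but belong to different address families. A minimal sketch using only the standard java.net.InetAddress API follows; the class name LoopbackSketch and the helper isLoopback are hypothetical and have nothing to do with httpclient or ManifoldCF.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Checks whether a literal IP address is a loopback address; illustration only. */
final class LoopbackSketch {
  static boolean isLoopback(String literal) {
    try {
      // For a literal IP address, getByName only parses the text.
      InetAddress addr = InetAddress.getByName(literal);
      return addr.isLoopbackAddress();
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("Not a valid address: " + literal, e);
    }
  }

  public static void main(String[] args) {
    for (String literal : new String[] {"127.0.0.1", "0:0:0:0:0:0:0:1", "::1"}) {
      System.out.println(literal + " loopback=" + isLoopback(literal));
    }
  }
}
```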

mcf-combined-service.war: 
httpclient.jar -> httpclient-4.3.2.jar
httpcore.jar -> httpcore-4.3.1.jar

connector-lib:
httpmime.jar -> httpmime-4.3.2.jar (perhaps not important)

(binary from 
http://ftp.fau.de/apache//httpcomponents/httpclient/binary/httpcomponents-client-4.3.2-bin.zip
 at http://hc.apache.org/downloads.cgi)

After restarting Tomcat, all documents get indexed by Solr. I have now 
configured a schedule to check the fix you provided. Thanks for your help!





[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2014-02-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898419#comment-13898419
 ] 

Karl Wright commented on CONNECTORS-880:


Hi Florian,

We include the current 4.2.x version of httpclient (4.2.6) because that's what 
SolrJ uses.  It looks like they did not backport the fix to the 4.2.x stream, 
so we'll have to wait until SolrJ goes to 4.3.x.  I'm told that won't happen 
until Solr 4.7.  Given that, I'm glad that you found a working solution.




[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2014-02-10 Thread Florian Schmedding (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896778#comment-13896778
 ] 

Florian Schmedding commented on CONNECTORS-880:
---

I installed mcf-combined-service.war version 1.6 on a Tomcat server. However, 
the test job is not working correctly. I used an existing database from 
Manifold 1.3 with one existing output connection, one existing repository 
connection and one existing job. On the job status page there were no buttons 
to control the job; its status was "End notification". I therefore copied the 
job. The copy did have buttons to start it, but it got only errors from 
the output connection when ingesting documents. After I aborted the job, it 
got stuck in the "End notification" status and has not left it. Should I 
create new connections and jobs instead, or are there other problems in 
version 1.6?



[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2014-02-10 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896788#comment-13896788
 ] 

Karl Wright commented on CONNECTORS-880:


Hi Florian,

There are no other known problems in 1.6, but I have been working on the job 
state transitions.  Specifically, the various kinds of ServiceInterruption from 
output connectors are now honored.  So if you are using an output connector 
that is responding to a request with a ServiceInterruption, then unless that 
connector is sending the right variety of ServiceInterruption it is conceivable 
that it could retry indefinitely, or at least for much longer than you would 
like.

Could you be more specific about which output connector you are using?





[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2014-02-05 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892232#comment-13892232
 ] 

Karl Wright commented on CONNECTORS-880:


Merged the branch to trunk after it passed all tests: r1564804.  It is not yet 
clear whether it definitively works; the problem is pretty hard to reproduce 
without building a test output connector, and I haven't had time for that yet.




[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2014-02-04 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891073#comment-13891073
 ] 

Karl Wright commented on CONNECTORS-880:


The problem is happening during the cleanup phase of the crawl, most likely 
because of an error from the output connector.

Normally:
JobResetThread calls JobManager.resetJobs() which normally calls 
Jobs.finishJob(time).

Error condition:
DocumentCleanupThread calls JobManager.errorAbort(), which is clearly the wrong 
method to use, since it just throws an exception for any jobs in the 
STATUS_SHUTTINGDOWN state:
throw new ManifoldCFException("Job "+jobID+" is not active");

So, the SHUTTINGDOWN state needs its own abort method, e.g. cleanupAbort(), and 
possibly its own state (ABORTING_CLEANINGUP).
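
The proposal above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual JobManager code or the committed fix: only errorAbort(), cleanupAbort(), SHUTTINGDOWN and ABORTING_CLEANINGUP come from the comment above. The class CleanupAbortSketch, the other states, the method finishAbort() and the use of IllegalStateException in place of ManifoldCFException are hypothetical.

```java
/** Toy model of the abort paths discussed above; not the actual ManifoldCF classes. */
final class CleanupAbortSketch {
  // Simplified states; only SHUTTINGDOWN and ABORTING_CLEANINGUP are named in the comment.
  enum Status { ACTIVE, SHUTTINGDOWN, ABORTING, ABORTING_CLEANINGUP, INACTIVE }

  static final class Job {
    final String id;
    Status status;
    long lastCheckTime = -1L; // -1 means "never recorded" in this model

    Job(String id, Status status) {
      this.id = id;
      this.status = status;
    }
  }

  /** Abort for a crawling job; rejects jobs that are already in the cleanup phase. */
  static void errorAbort(Job job) {
    if (job.status != Status.ACTIVE) {
      throw new IllegalStateException("Job " + job.id + " is not active");
    }
    job.status = Status.ABORTING;
  }

  /** Proposed separate abort for the cleanup phase, with its own state. */
  static void cleanupAbort(Job job) {
    if (job.status != Status.SHUTTINGDOWN) {
      throw new IllegalStateException("Job " + job.id + " is not shutting down");
    }
    job.status = Status.ABORTING_CLEANINGUP;
  }

  /** Hypothetical final step: every abort path ends here and records the check time. */
  static void finishAbort(Job job, long now) {
    job.lastCheckTime = now;
    job.status = Status.INACTIVE;
  }

  public static void main(String[] args) {
    Job job = new Job("job-1", Status.SHUTTINGDOWN);
    try {
      errorAbort(job); // wrong method for this state: throws
    } catch (IllegalStateException e) {
      System.out.println("errorAbort rejected: " + e.getMessage());
    }
    cleanupAbort(job);
    finishAbort(job, System.currentTimeMillis());
    System.out.println(job.status + " lastCheckTime=" + job.lastCheckTime);
  }
}
```

The point of the sketch is that an abort raised during cleanup needs a transition that is valid from the shutting-down state, so that the job still reaches a terminal step where the last-check time can be set.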





[jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time

2014-02-04 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891261#comment-13891261
 ] 

Karl Wright commented on CONNECTORS-880:


Created a branch, CONNECTORS-880, with the proposed fix.  Hoping that it works 
in the field.  Passes tests here -- although I don't have a good test that 
tosses exceptions on output connection document deletion.  May need to come up 
with a test output connector that I can use to exercise this condition.
