Hello Karl,

 

OK, I will build and test this version.

 

Thanks 

Maxence,

 

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Thursday, June 21, 2018 02:43
To: user@manifoldcf.apache.org
Subject: Re: Documents blocked sometimes without errors

 

Patch attached, and fix committed to trunk.

Karl

 

 

On Wed, Jun 20, 2018 at 8:32 PM Karl Wright <daddy...@gmail.com> wrote:

I've had time to look at this further.  I believe that under some conditions, 
when errors occur during processing a document, it might be possible to wind up 
in this state.  I'm in the process of working out a solution now.

 

Karl

 

 

On Mon, Jun 18, 2018 at 8:44 AM msaunier <msaun...@citya.com> wrote:

Okay. I will try to reproduce the problem again and see whether the same 
documents are affected, or whether there is a pattern or other similarity.

 

Maxence,

 

 

 

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, June 18, 2018 14:42
To: user@manifoldcf.apache.org
Subject: Re: Documents blocked sometimes without errors

 

If you are certain these are new documents, then there is no need to repeat 
yourself.

But we do need to get some idea what action yields documents in this state.  As 
I said before, it did not look possible to get there through any mechanism I 
can find.  But I won't be able to look in full depth for a few days.

 

Karl

 

 

On Mon, Jun 18, 2018 at 8:38 AM msaunier <msaun...@citya.com> wrote:

I switched about ten days ago, and the jobs have been running correctly. I was 
able to complete 2 passes without problems since deploying the trunk version. I 
doubt that these are old documents. I will restart indexing and, if it happens 
again, I'll tell you.

 

Maxence,

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, June 18, 2018 14:25
To: user@manifoldcf.apache.org
Subject: Re: Documents blocked sometimes without errors

 

My concern is that you upgraded the code but DID NOT do the pause/resume after 
you did that.  If that was the sequence, you were left with old, un-updated 
records.

 

 

 

On Mon, Jun 18, 2018 at 8:18 AM msaunier <msaun...@citya.com> wrote:

Yes, my solution is to pause the job and resume it.

 

With the trunk version, it seems less common, but the problem is still there.

 

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, June 18, 2018 12:14
To: user@manifoldcf.apache.org
Subject: Re: Documents blocked sometimes without errors

 

Just so it is clear, the fix will only address documents that are in the 
"ACTIVE" state.  Documents that are already blocked will not be fixed.  The way 
you fix the blocked documents is by pausing and resuming the job that the 
documents are part of -- and then, if you are running the patched version of 
MCF, you should not see blocked documents again.

 

Thanks,

Karl

 

 

On Mon, Jun 18, 2018 at 5:15 AM msaunier <msaun...@citya.com> wrote:


Forget that. My ln -s link is correct on this server; I confused the servers. 
So I do have a similar problem with trunk. I will continue testing.

 

 

 

From: msaunier [mailto:msaun...@citya.com]
Sent: Monday, June 18, 2018 11:13
To: 'user@manifoldcf.apache.org' <user@manifoldcf.apache.org>
Subject: RE: Documents blocked sometimes without errors

 

OK, I got my ln -s wrong, so my link points to 2.9.1. Sorry for this error. 
Your corrections are okay.

 

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, June 18, 2018 10:43
To: user@manifoldcf.apache.org
Subject: Re: Documents blocked sometimes without errors

 

If there's any chance these were leftover from before the patch was applied, we 
should try to eliminate that.  To do that:

 

- pause the job

- restart the job


Then, either wait for the script-based agents process shutdown, or shut down 
the agents process manually and restart.  Do this a number of times and see if 
any documents become stuck.

 

Thanks,

Karl

 

 

On Mon, Jun 18, 2018 at 4:35 AM Karl Wright <daddy...@gmail.com> wrote:

These are still indeed blocked.

 

Unfortunately I don't see any pathway for documents to wind up in such a state. 
 I'll have to look in more depth and get back to you later.

 

Karl

 

On Mon, Jun 18, 2018 at 4:07 AM msaunier <msaun...@citya.com> wrote:

CSV attached.

 

Thanks,

Maxence,

 

 

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, June 18, 2018 10:02
To: user@manifoldcf.apache.org
Subject: Re: Documents blocked sometimes without errors

 

The only way to know if these are truly blocked is to find the document records 
in the database and include them here.

 

Thanks,

Karl

 

 

On Mon, Jun 18, 2018 at 3:55 AM msaunier <msaun...@citya.com> wrote:

Hello Karl,

 

Today, I have 2 blocked documents on the new trunk version (I think). Is there 
a way to verify my trunk version after the build?

 

Thanks,

Maxence ,

 

 

From: msaunier [mailto:msaun...@citya.com]
Sent: Tuesday, June 5, 2018 14:54
To: 'user@manifoldcf.apache.org' <user@manifoldcf.apache.org>
Subject: RE: Documents blocked sometimes without errors

 

OK. I have built and deployed it.

 

The tests are in progress.

 

Thanks,

Maxence

 

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, June 4, 2018 19:55
To: user@manifoldcf.apache.org
Subject: Re: Documents blocked sometimes without errors

 

I attached a patch to the ticket that is a tentative fix.  Please let me know 
if you still see this problem after applying it.  Thanks!

 

Karl

 

 

On Mon, Jun 4, 2018 at 12:56 PM Karl Wright <daddy...@gmail.com> wrote:

CONNECTORS-1507 created.

Karl

 

 

On Mon, Jun 4, 2018 at 12:51 PM Karl Wright <daddy...@gmail.com> wrote:

I think I found the issue.

Basically, when the agents process is restarted, it doesn't reprioritize the 
documents that were active when the service was brought down, but it should 
because when the documents became active they lost their document priority.  
This should be trivial to fix.
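
To make the failure mode concrete, here is a toy model in Java. It is only a sketch under stated assumptions and is not ManifoldCF code: the class RestartReprioritizeSketch, its Doc type, the restart and countStuck methods, and the sentinel value are all hypothetical names chosen for illustration. It assumes that a document loses its priority when it becomes active, and that a restart returns active documents to a pending state.

```java
import java.util.ArrayList;
import java.util.List;

public class RestartReprioritizeSketch {
  // Sentinel meaning "no priority assigned" (value chosen for illustration).
  public static final double NULL_PRIORITY = 1e9 + 1.0;

  public enum Status { PENDING, ACTIVE }

  // Hypothetical stand-in for a queued document record.
  public static final class Doc {
    public final String id;
    public Status status;
    public double priority;

    public Doc(String id, Status status, double priority) {
      this.id = id;
      this.status = status;
      this.priority = priority;
    }
  }

  // Simulates an agents-process restart: every active document goes back to pending.
  // With reprioritize == false the document keeps the sentinel priority;
  // with reprioritize == true it receives a fresh priority.
  public static void restart(List<Doc> docs, boolean reprioritize) {
    double next = 0.0;
    for (Doc d : docs) {
      if (d.status == Status.ACTIVE) {
        d.status = Status.PENDING;
        if (reprioritize) {
          d.priority = next;
          next += 1.0;
        }
      }
    }
  }

  // Counts pending documents that still carry the sentinel priority.
  public static long countStuck(List<Doc> docs) {
    return docs.stream()
        .filter(d -> d.status == Status.PENDING && d.priority == NULL_PRIORITY)
        .count();
  }

  public static void main(String[] args) {
    List<Doc> docs = new ArrayList<>();
    docs.add(new Doc("doc-1", Status.ACTIVE, NULL_PRIORITY));
    docs.add(new Doc("doc-2", Status.PENDING, 5.0));
    restart(docs, false);
    System.out.println("stuck without reprioritizing: " + countStuck(docs));
  }
}
```

In this model, calling restart with reprioritize set to true leaves no pending document carrying the sentinel, which is the behavior the fix is meant to restore.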

 

Karl

 

 

On Mon, Jun 4, 2018 at 12:35 PM Karl Wright <daddy...@gmail.com> wrote:

Hi Maxence,

 

The docpriority values for these stuck documents show that they are "null":

 

  public static final double noDocPriorityValue = 1e9;

  public static final Double nullDocPriority = new Double(noDocPriorityValue + 1.0);

 

The document status is "G", which is STATUS_PENDINGPURGATORY, so the documents 
are awaiting being queued, which they will never be with a docpriority set to 
nullDocPriority.
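
As a rough illustration of that condition, here is a minimal Java sketch. The two constants are the ones quoted above (with Double.valueOf used in place of the deprecated constructor); the class name StuckDocumentCheck, the isStuck helper, and the idea of passing the status code and docpriority as plain values read from an exported row are assumptions for illustration, not ManifoldCF APIs.

```java
public class StuckDocumentCheck {
  // Constants as quoted above.
  public static final double noDocPriorityValue = 1e9;
  public static final Double nullDocPriority = Double.valueOf(noDocPriorityValue + 1.0);

  // Status code "G" corresponds to STATUS_PENDINGPURGATORY, per the message above.
  public static final String PENDING_PURGATORY_CODE = "G";

  // Hypothetical helper: true when a row is waiting to be queued but carries
  // the "null" priority, so it would never be picked up.
  public static boolean isStuck(String status, double docPriority) {
    return PENDING_PURGATORY_CODE.equals(status)
        && Double.compare(docPriority, nullDocPriority) == 0;
  }

  public static void main(String[] args) {
    // Values as they might appear in an exported row (illustrative only).
    System.out.println(isStuck("G", 1e9 + 1.0));
    System.out.println(isStuck("G", 42.5));
  }
}
```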

 

It isn't supposed to be possible for a document to wind up in this state.  
Documents that are pending are always supposed to set a document priority.  I 
will need to review the code to see how this could happen.

It is also possible that you're seeing a database bug.  I presume that you are 
running Postgresql?

 

Karl

 

 

On Mon, Jun 4, 2018 at 8:43 AM msaunier <msaun...@citya.com> wrote:

Thanks for your answers.

 

So, I am attaching to this email a screenshot of the interface and the CSV result.

 

Thanks,

Maxence

 

 

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, June 4, 2018 11:36
To: user@manifoldcf.apache.org
Subject: Re: Documents blocked sometimes without errors

 

Oh, and it should be unnecessary to pause/resume jobs when you bring down 
ManifoldCF for database maintenance.  Stop the agents service, and start it 
again, and you should pick up exactly where you left off.

 

Karl

 

 

On Mon, Jun 4, 2018 at 5:33 AM Karl Wright <daddy...@gmail.com> wrote:

Hi Maxence,

 

Pausing and restarting a job causes all of its documents to have their 
docpriority field recalculated.  It should not be necessary to do this in 
order to have the job complete, though.

 

All documents that are queued have their docpriority set at the time they are 
added to the queue, but the docpriority they are given depends on how many 
documents in the same document bin have already been given docpriority 
values.  This is done to make sure documents from all bins are given an equal 
chance of being crawled.  But since documents are given a docpriority when 
queued, there may well have been plenty of other documents "in front" of them 
that are already queued and must be processed before there's any chance of 
getting crawled.  So it is possible that documents from one job may appear to 
block documents from another -- but this will eventually correct itself and 
those documents will be crawled.
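
The bin-based idea can be pictured with a deliberately simplified Java sketch. This is not the actual ManifoldCF priority calculation: the class BinPrioritySketch, the assignPriority method, and the rule "the n-th document queued in a bin gets priority n, lower values are crawled first" are all assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class BinPrioritySketch {
  // How many documents in each bin have already been given a priority
  // (hypothetical bookkeeping for this sketch).
  private final Map<String, Integer> assignedPerBin = new HashMap<>();

  // Assumption for this sketch: a lower value is crawled sooner, and the
  // priority equals the number of documents already prioritized in the bin.
  public double assignPriority(String bin) {
    int alreadyAssigned = assignedPerBin.merge(bin, 1, Integer::sum) - 1;
    return (double) alreadyAssigned;
  }

  public static void main(String[] args) {
    BinPrioritySketch sketch = new BinPrioritySketch();
    // A busy bin queues three documents, then a quiet bin queues one.
    System.out.println("busy-1  -> " + sketch.assignPriority("busy"));
    System.out.println("busy-2  -> " + sketch.assignPriority("busy"));
    System.out.println("busy-3  -> " + sketch.assignPriority("busy"));
    System.out.println("quiet-1 -> " + sketch.assignPriority("quiet"));
  }
}
```

Under these assumptions the first document from the quiet bin sorts ahead of the later documents from the busy bin, while those later documents simply wait their turn rather than being blocked forever.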

If you see *no* activity at all, however, then I wonder if somehow documents 
have been queued with a null docpriority.  You can test this by looking at the 
Document Status report and verifying that there is no reason the documents 
should not be crawlable, and then looking in the database to see what they have 
for their docpriority field.  Please let me know what you find.

 

Thanks,

Karl

 

 

On Mon, Jun 4, 2018 at 4:20 AM msaunier <msaun...@citya.com> wrote:

Hello Karl,

 

Sometimes jobs are blocked by many documents, and I don't know why because I 
see no errors. To unblock them, I pause and resume the job, and it works again. 
This does not happen every time, and they are never the same documents.

 

We have a script that runs at 8:55 PM, and it is possibly the cause of this 
problem. We created this script to avoid errors, because the SCO servers are 
rebooted at 9:00 PM and ManifoldCF reports an error if those servers are stopped.

 

Script explanation:

 

1.       Pause the current job at 8:55 PM

2.       Stop ManifoldCF and wait for it to shut down

3.       Run VACUUM FULL on Postgres

4.       Run REINDEX on Postgres

5.       (Wait until 9:05 PM)

6.       Start ManifoldCF

7.       Wait for ManifoldCF to come up

8.       Resume the job

 

Do you have any idea how to resolve this problem? Is the REINDEX or the VACUUM 
FULL the cause?

 

Thanks,

Maxence

 
