I've had time to look at this further. I believe that under some conditions, when errors occur while a document is being processed, it might be possible to wind up in this state. I'm in the process of working out a solution now.
Karl

On Mon, Jun 18, 2018 at 8:44 AM msaunier <msaun...@citya.com> wrote:

Okay. I will try to reproduce the problem again and see whether they are the same documents, or whether there is a pattern or other similarities.

Maxence

*From:* Karl Wright [mailto:daddy...@gmail.com]
*Sent:* Monday, June 18, 2018 14:42
*To:* user@manifoldcf.apache.org
*Subject:* Re: Documents blocked sometimes without errors

If you are certain these are new documents, then there is no need to repeat yourself.

But we do need to get some idea what action yields documents in this state. As I said before, it did not look possible to get there through any mechanism I can find. But I won't be able to look in full depth for a few days.

Karl

On Mon, Jun 18, 2018 at 8:38 AM msaunier <msaun...@citya.com> wrote:

I switched about ten days ago and the jobs were running correctly. I have been able to do two passes without problems since moving to the trunk version. I doubt that they are old documents. I am restarting indexing and, if it happens again, I'll tell you.

Maxence

*From:* Karl Wright [mailto:daddy...@gmail.com]
*Sent:* Monday, June 18, 2018 14:25
*To:* user@manifoldcf.apache.org
*Subject:* Re: Documents blocked sometimes without errors

My concern is that you upgraded the code but DID NOT do the pause/resume after you did that. If that was the sequence, you were left with old, un-updated records.

On Mon, Jun 18, 2018 at 8:18 AM msaunier <msaun...@citya.com> wrote:

Yes, my workaround is to pause the job and resume it.

With the trunk version, I feel it's less common, but the problem is still here.

*From:* Karl Wright [mailto:daddy...@gmail.com]
*Sent:* Monday, June 18, 2018 12:14
*To:* user@manifoldcf.apache.org
*Subject:* Re: Documents blocked sometimes without errors

Just so it is clear, the fix will only address documents that are in the "ACTIVE" state. Documents that are already blocked will not be fixed. The way you fix the blocked documents is by pausing and resuming the job that the documents are part of. After that, if you are running the patched version of MCF, you should not see blocked documents again.

Thanks,
Karl

On Mon, Jun 18, 2018 at 5:15 AM msaunier <msaun...@citya.com> wrote:

Forget that. My ln -s is correct on this server; I confused the servers. So I do have a similar problem with trunk. I am continuing the tests.

*From:* msaunier [mailto:msaun...@citya.com]
*Sent:* Monday, June 18, 2018 11:13
*To:* 'user@manifoldcf.apache.org' <user@manifoldcf.apache.org>
*Subject:* RE: Documents blocked sometimes without errors

OK, I got my ln -s wrong, so my link points to 2.9.1. Sorry for the mistake. Your fixes are fine.

*From:* Karl Wright [mailto:daddy...@gmail.com <daddy...@gmail.com>]
*Sent:* Monday, June 18, 2018 10:43
*To:* user@manifoldcf.apache.org
*Subject:* Re: Documents blocked sometimes without errors

If there's any chance these were left over from before the patch was applied, we should try to eliminate that. To do that:

- pause the job
- restart the job

Then, either wait for the script-based agents process shutdown, or shut down the agents process manually and restart. Do this a number of times and see if any documents become stuck.

Thanks,
Karl

On Mon, Jun 18, 2018 at 4:35 AM Karl Wright <daddy...@gmail.com> wrote:

These are indeed still blocked.

Unfortunately I don't see any pathway for documents to wind up in such a state.
I'll have to look in more depth and get back to you later.

Karl

On Mon, Jun 18, 2018 at 4:07 AM msaunier <msaun...@citya.com> wrote:

CSV attached.

Thanks,
Maxence

*From:* Karl Wright [mailto:daddy...@gmail.com]
*Sent:* Monday, June 18, 2018 10:02
*To:* user@manifoldcf.apache.org
*Subject:* Re: Documents blocked sometimes without errors

The only way to know if these are truly blocked is to find the document records in the database and include them here.

Thanks,
Karl

On Mon, Jun 18, 2018 at 3:55 AM msaunier <msaun...@citya.com> wrote:

Hello Karl,

Today, I have 2 documents blocked on the new trunk version (I think). Is there a way to verify my trunk version after the build?

Thanks,
Maxence

*From:* msaunier [mailto:msaun...@citya.com]
*Sent:* Tuesday, June 5, 2018 14:54
*To:* 'user@manifoldcf.apache.org' <user@manifoldcf.apache.org>
*Subject:* RE: Documents blocked sometimes without errors

OK. I have built and deployed it.

The tests are in progress.

Thanks,
Maxence

*From:* Karl Wright [mailto:daddy...@gmail.com <daddy...@gmail.com>]
*Sent:* Monday, June 4, 2018 19:55
*To:* user@manifoldcf.apache.org
*Subject:* Re: Documents blocked sometimes without errors

I attached a patch to the ticket that is a tentative fix. Please let me know if you still see this problem after applying it. Thanks!

Karl

On Mon, Jun 4, 2018 at 12:56 PM Karl Wright <daddy...@gmail.com> wrote:

CONNECTORS-1507 created.

Karl

On Mon, Jun 4, 2018 at 12:51 PM Karl Wright <daddy...@gmail.com> wrote:

I think I found the issue.

Basically, when the agents process is restarted, it doesn't reprioritize the documents that were active when the service was brought down, but it should, because when the documents became active they lost their document priority. This should be trivial to fix.

Karl

On Mon, Jun 4, 2018 at 12:35 PM Karl Wright <daddy...@gmail.com> wrote:

Hi Maxence,

The docpriority values for these stuck documents show that they are "null":

    public static final double noDocPriorityValue = 1e9;
    public static final Double nullDocPriority = new Double(noDocPriorityValue + 1.0);

The document status is "G", which is STATUS_PENDINGPURGATORY, so the documents are waiting to be queued, which they never will be with a docpriority set to nullDocPriority.

It isn't supposed to be possible for a document to wind up in this state. Documents that are pending are always supposed to set a document priority. I will need to review the code to see how this could happen.

It is also possible that you're seeing a database bug. I presume that you are running PostgreSQL?

Karl

On Mon, Jun 4, 2018 at 8:43 AM msaunier <msaun...@citya.com> wrote:

Thanks for your answers.

I have attached a screenshot of the interface and the CSV result to this email.

Thanks,
Maxence

*From:* Karl Wright [mailto:daddy...@gmail.com]
*Sent:* Monday, June 4, 2018 11:36
*To:* user@manifoldcf.apache.org
*Subject:* Re: Documents blocked sometimes without errors

Oh, and it should be unnecessary to pause/resume jobs when you bring down ManifoldCF for database maintenance. Stop the agents service, and start it again, and you should pick up exactly where you left off.

Karl

On Mon, Jun 4, 2018 at 5:33 AM Karl Wright <daddy...@gmail.com> wrote:

Hi Maxence,

Pausing and restarting a job causes all of its documents to have their docpriority field recalculated. It should not be necessary to do this in order to have the job complete, though.

All documents that are queued have their docpriority set at the time they are added to the queue, but the docpriority they are given depends on how many documents in the same document bin have already been given docpriority values. This is done to make sure documents from all bins are given an equal chance of being crawled. But since documents are given a docpriority when queued, there may well have been plenty of other documents "in front" of them that are already queued and must be processed before there's any chance of getting crawled. So it is possible that documents from one job may appear to block documents from another -- but this will eventually correct itself and those documents will be crawled.

If you see *no* activity at all, however, then I wonder if somehow documents have been queued with a null docpriority. You can test this by looking at the Document Status report and verifying that there is no reason the documents should not be crawlable, and then looking in the database to see what they have for their docpriority field. Please let me know what you find.

Thanks,
Karl

On Mon, Jun 4, 2018 at 4:20 AM msaunier <msaun...@citya.com> wrote:

Hello Karl,

Sometimes jobs are blocked by many documents and I don't know why, because I don't get any errors. To unblock them, I pause and resume the job and it works again. This does not happen every time, and they are never the same documents.

We have a script that runs at 8:55 PM, and it is possibly the cause of this problem. We created this script to avoid errors, because the SCO servers are rebooted at 9:00 PM and ManifoldCF reports an error if those servers are stopped.

Script explanation:

1. Pause the current job at 8:55 PM
2. Stop ManifoldCF and wait
3. VACUUM FULL Postgres
4. REINDEX Postgres
5. (Wait until 9:05 PM)
6. Start ManifoldCF
7. Wait for ManifoldCF
8. Resume the job

Do you have any idea how to resolve this problem? Is the REINDEX or the VACUUM FULL the cause?

Thanks,
Maxence
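
The check discussed in this thread (a document with status "G" whose docpriority equals nullDocPriority will never be queued) can be sketched in Java as follows. This is only an illustration under stated assumptions: the two constant values and the "G" status code are taken from the messages above, while the class name, the `looksStuck` helper, and the sample rows are hypothetical and not part of ManifoldCF. The sketch also stores nullDocPriority as a primitive double for simplicity.

```java
class StuckDocumentCheck {
    // Values quoted in the thread from the ManifoldCF source.
    static final double noDocPriorityValue = 1e9;
    static final double nullDocPriority = noDocPriorityValue + 1.0;

    // Status code quoted in the thread: "G" is STATUS_PENDINGPURGATORY.
    static final String STATUS_PENDINGPURGATORY = "G";

    // Hypothetical helper: true when a row matches the stuck pattern
    // described in the thread (pending status plus the "null" priority).
    static boolean looksStuck(String status, double docPriority) {
        return STATUS_PENDINGPURGATORY.equals(status)
            && docPriority == nullDocPriority;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for values read from the database.
        String[] statuses = {"G", "G", "A"};
        double[] priorities = {1000000001.0, 12.5, 1000000001.0};
        for (int i = 0; i < statuses.length; i++) {
            System.out.println("row " + i + " stuck? "
                + looksStuck(statuses[i], priorities[i]));
        }
    }
}
```

Rows flagged this way are the ones that, per the thread, need a pause and resume of their job so that docpriority is recalculated.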