Hi Karl,

The only error message that seems to be thrown continuously in the ManifoldCF log is:

FATAL 2015-05-08 18:42:47,043 (Worker thread '40') - Error tossed: null
java.lang.NullPointerException

I do notice that the file that needs to be deleted is shown under the Queue Status report and keeps jumping between “Processing” and “About to Process” statuses every 30 seconds.

Timo

> On May 8, 2015, at 1:40 PM, Karl Wright <[email protected]> wrote:
>
> Hi Timo,
>
> As I said, I don't think your configuration is the source of the delete
> issue. I suspect the searchblox connector.
>
> In the absence of a thread dump, can you look for exceptions in the
> manifoldcf log?
>
> Karl
>
> Sent from my Windows Phone
> From: Timo Selvaraj
> Sent: 5/8/2015 10:06 AM
> To: [email protected]
> Subject: Re: File system continuous crawl settings
>
> When I change the settings to the following, updated or modified documents
> are now indexed, but deleting the documents that are removed is still an issue:
>
> Schedule type: Rescan documents dynamically
> Minimum recrawl interval: 5 minutes    Maximum recrawl interval: 10 minutes
> Expiration interval: Infinity    Reseed interval: 60 minutes
> No scheduled run times
> Maximum hop count for link type 'child': Unlimited
> Hop count mode: Delete unreachable documents
>
> Do I need to set the reseed interval to Infinity?
>
> Any thoughts?
>
>> On May 8, 2015, at 6:18 AM, Karl Wright <[email protected]> wrote:
>>
>> I just tried your configuration here. A deleted document in the file system
>> was indeed picked up as expected.
>>
>> I did notice that your "expiration" setting is, essentially, cleaning out
>> documents at a rapid clip. With this setting, documents will be expired
>> before they are recrawled. You probably want one strategy or the other but
>> not both.
>>
>> As for why a deleted document is "stuck" in Processing: the only thing I can
>> think of is that the output connection you've chosen is having trouble
>> deleting the document from the index.
>> What output connector are you using?
>>
>> Karl
>>
>>
>> On Fri, May 8, 2015 at 4:36 AM, Timo Selvaraj <[email protected]> wrote:
>> Hi,
>>
>> We are testing the continuous crawl feature of the file system connector on a
>> small folder, to check whether documents added to the folder, documents
>> removed from it, and documents modified in it are all handled by the
>> continuous crawl job.
>>
>> Here are the settings we use:
>>
>> Schedule type: Rescan documents dynamically
>> Minimum recrawl interval: 5 minutes    Maximum recrawl interval: 10 minutes
>> Expiration interval: 5 minutes    Reseed interval: 10 minutes
>> No scheduled run times
>> Maximum hop count for link type 'child': Unlimited
>> Hop count mode: Delete unreachable documents
>>
>>
>> New documents seem to be picked up by the job; however, removal of a
>> document or an update to a document is not being picked up.
>>
>> Am I missing any settings for the deletions or updates? I do see that the
>> document that has been removed is showing as Processing under Queue Status
>> and the others are showing as Waiting for Processing.
>>
>> Any idea what setting is missing for the deletes/updates to be recognized
>> and re-indexed?
>>
>> Thanks,
>> Timo
>>
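
Karl's request to look for exceptions in the log could be done with a small script along the following lines. This is only a minimal sketch: it assumes every log entry starts with a level and a timestamp in the same shape as the FATAL line quoted at the top of this message, and that the stack trace (if any) follows on the lines after it. The function name `fatal_entries` is made up for illustration and is not part of ManifoldCF.

```python
import re
from typing import Iterable, List

# Assumed entry header format, modeled on the one line quoted in this thread:
# "FATAL 2015-05-08 18:42:47,043 (Worker thread '40') - Error tossed: null"
ENTRY_START = re.compile(r"^(FATAL|ERROR|WARN|INFO|DEBUG)\s+\d{4}-\d{2}-\d{2} ")


def fatal_entries(lines: Iterable[str]) -> List[str]:
    """Return each FATAL entry plus the lines that follow it
    (typically the Java stack trace) up to the next log entry."""
    entries: List[str] = []
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            # A new entry begins: flush the FATAL entry being collected, if any.
            if current:
                entries.append("\n".join(current))
                current = []
            if match.group(1) == "FATAL":
                current = [line]
        elif current:
            # Continuation line (e.g. a stack frame) of the current FATAL entry.
            current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries
```

To use it, open the log file (its name and location depend on the installation) and pass the file object to `fatal_entries`, then print each returned entry. The stack frames under the NullPointerException should show which class is throwing, which would help confirm or rule out the output connector.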
