[ https://issues.apache.org/jira/browse/CONNECTORS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-1602.
-------------------------------------
    Resolution: Not A Problem

> Continuous crawling doesn't recrawl everything
> ----------------------------------------------
>
>                 Key: CONNECTORS-1602
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1602
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>            Reporter: Donald Van den Driessche
>            Priority: Major
>
> When crawling a website in continuous crawling mode, we saw that not all
> documents are recrawled.
> The site is quite extensive. We figured out that, after crawling, a
> document/page gets a recrawl timestamp between the recrawl interval and
> the max recrawl interval.
> But if these timestamps fall within the first crawl, Manifold starts
> recrawling those documents but seems to ignore the rest of the website.
> Also, some documents get recrawled 5 times while others don't get
> recrawled at all, apparently due to the same issue.
>
> Is it possible to shed a bit more light on continuous crawling?
> Is it a good system to use for crawling an (extensive) website?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)