Re: Job error during WindowsShare repository connector indexation

Olivier Tavard Wed, 11 Oct 2017 08:16:53 -0700

Hi,

Thanks for your answers.
OK, I will definitely use Zookeeper rather than file-based synchronization and 
let you know.


For information, the syncharea folder was not accessed by any other process 
during our crawl. The server is dedicated to MCF. The OS is Debian 8 and the 
files are on a standard Linux filesystem (ext3). We did not increase the max 
open files limit on this server (only on the Solr servers); that is a good 
thing to investigate, thanks.
Regardless of the change to ZK, would it be possible to change this behavior 
in MCF, for example by automatically stopping the job when this exception 
occurs?
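
For the open-files check, here is a minimal sketch in Python of what could be 
run on the MCF server. It assumes a Linux host with /proc mounted; the function 
names are illustrative and not part of MCF. It reports the limits of the 
process that runs the script, so it should be run as the MCF user, and the PID 
of the MCF agent (not shown here) can be passed to count_open_fds.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide file handle limit, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def count_open_fds(pid):
    """Count open file descriptors of a process via /proc (Linux only).

    Returns None when /proc/<pid>/fd cannot be read, e.g. because the
    process belongs to another user or /proc is not mounted.
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def fd_usage_ratio(open_count, soft_limit):
    """Fraction of the soft limit in use; 0.0 when the limit is unlimited."""
    if soft_limit == resource.RLIM_INFINITY or soft_limit <= 0:
        return 0.0
    return open_count / soft_limit


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("system file-max:", system_file_max())
    print("open fds of this process:", count_open_fds(os.getpid()))
```

A ratio close to 1.0 for the MCF process during a long crawl would point 
towards the file handle explanation rather than a missing directory.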

Thanks,

Olivier TAVARD


> On 11 Oct 2017, at 14:15, Karl Wright <daddy...@gmail.com> wrote:
> 
> In this case it's the *directory* that it doesn't find, so it can't create 
> the file.  If the syncharea is in an NFS-mounted filesystem, then you can get 
> problems of this kind, which is why we strongly advise using Zookeeper 
> instead of playing those kinds of games.
> 
> Karl
> 
> 
> On Wed, Oct 11, 2017 at 7:20 AM, Luis Cabaceira <cabace...@gmail.com> wrote:
> I've seen similar errors (where it actually seems like the file is not there 
> or has been deleted, while in fact it exists) for the reasons I wrote about 
> before.
> 
> On 11 October 2017 at 15:12, Karl Wright <daddy...@gmail.com> wrote:
> This error:
> 
> >>>>>>
> WARN 2017-10-09 08:23:56,284 (Idle cleanup thread) - MCF|MCF-agent|apache.manifoldcf.lock|Attempt to set file lock 'mcf/mcf_home/./syncharea/551/442/lock-_POOLTARGET__REPOSITORYCONNECTORPOOL_SmbFileShare.lock' failed: No such file or directory
> java.io.IOException: No such file or directory
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:1012)
> at org.apache.manifoldcf.core.lockmanager.FileLockObject.grabFileLock(FileLockObject.java:223)
> at org.apache.manifoldcf.core.lockmanager.FileLockObject.obtainGlobalWriteLockNoWait(FileLockObject.java:78)
> at org.apache.manifoldcf.core.lockmanager.LockObject.obtainGlobalWriteLock(LockObject.java:121)
> at org.apache.manifoldcf.core.lockmanager.LockObject.enterWriteLock(LockObject.java:74)
> at org.apache.manifoldcf.core.lockmanager.LockGate.enterWriteLock(LockGate.java:177)
> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWrite(BaseLockManager.java:1120)
> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWriteLock(BaseLockManager.java:757)
> at org.apache.manifoldcf.core.lockmanager.LockManager.enterWriteLock(LockManager.java:302)
> at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.pollAll(ConnectorPool.java:585)
> at org.apache.manifoldcf.core.connectorpool.ConnectorPool.pollAllConnectors(ConnectorPool.java:338)
> at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.pollAllConnectors(RepositoryConnectorPool.java:124)
> at org.apache.manifoldcf.crawlerui.IdleCleanupThread.run(IdleCleanupThread.java:69)
> And the error was repeated indefinitely in the log.
> <<<<<<
> 
> is due to somebody erasing the file-based syncharea while ManifoldCF 
> processes were active.  We strongly suggest using Zookeeper rather than 
> file-based synch, in any case.
> 
> Thanks,
> 
> Karl
> 
> 
> On Wed, Oct 11, 2017 at 6:05 AM, Luis Cabaceira <cabace...@gmail.com> wrote:
> From the look of it, this could be coming from a limit on the number of file 
> handles. Your process may be creating too many file handles and not closing 
> them in time, eventually preventing further file operations. 
> 
> I suggest you check this; on Linux, run: cat /proc/sys/fs/file-max
> 
> 
> To see the hard and soft limits: 
> 
> # ulimit -Hn
> # ulimit -Sn
> 
> P.S. Switch to the user that is running ManifoldCF first
> 
> 
> On 11 October 2017 at 13:54, Olivier Tavard <olivier.tav...@francelabs.com> wrote:
> Hi,
> 
> Thanks for your answer.
> Yes, I could reach the Samba server from the MCF server. Indeed, in the first 
> hours after the MCF job was launched, thousands of documents were correctly 
> accessed and processed by MCF. The errors mentioned appeared only after a few 
> hours. Before that, indexing worked correctly.
> 
> Best regards,
> Olivier TAVARD
> 
> 
>> On 11 Oct 2017, at 11:21, Cihad Guzel <cguz...@gmail.com> wrote:
>> 
>> Hi Olivier,
>> 
>> Did you try to connect to the Samba server with a Samba client app? Check 
>> iptables on your server. Can you stop iptables on the Ubuntu server? 
>> Alternatively, you can configure iptables.
>> 
>> Regards,
>> Cihad Guzel
>> 
>> 
>> 2017-10-11 12:02 GMT+03:00 Olivier Tavard <olivier.tav...@francelabs.com>:
>> Hi,
>> 
>> I had this error while crawling a Samba share hosted on Ubuntu Server:
>> ERROR 2017-10-05 00:00:14,109 (Idle cleanup thread) - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Exception tossed: Service '_ANON_0' of type '_REPOSITORYCONNECTORPOOL_SmbFileShare' is not active
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '_ANON_0' of type '_REPOSITORYCONNECTORPOOL_SmbFileShare' is not active
>> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.updateServiceData(BaseLockManager.java:273)
>> at org.apache.manifoldcf.core.lockmanager.LockManager.updateServiceData(LockManager.java:108)
>> at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.pollAll(ConnectorPool.java:654)
>> at org.apache.manifoldcf.core.connectorpool.ConnectorPool.pollAllConnectors(ConnectorPool.java:338)
>> at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.pollAllConnectors(RepositoryConnectorPool.java:124)
>> at org.apache.manifoldcf.crawler.system.IdleCleanupThread.run(IdleCleanupThread.java:68)
>> 
>> I used MCF 2.8.1 on Debian 8 with PostgreSQL 9.5.3 and the Windows Share 
>> repository connector. The job was configured to process about 2 million 
>> files (600 GB). 
>> For text extraction I used a Tika server (on the same server as MCF) and 
>> added the Tika external content extractor transformation connector to the 
>> job configuration.
>> The error appeared 9 hours after the job was launched. The job status still 
>> indicated that the job was running, but there was only 1 document in the 
>> active column and the error above was repeated in the MCF log.
>> 
>> Then I tried to launch the clean-lock.sh script and got this error:
>> WARN 2017-10-09 08:23:56,284 (Idle cleanup thread) - MCF|MCF-agent|apache.manifoldcf.lock|Attempt to set file lock 'mcf/mcf_home/./syncharea/551/442/lock-_POOLTARGET__REPOSITORYCONNECTORPOOL_SmbFileShare.lock' failed: No such file or directory
>> java.io.IOException: No such file or directory
>> at java.io.UnixFileSystem.createFileExclusively(Native Method)
>> at java.io.File.createNewFile(File.java:1012)
>> at org.apache.manifoldcf.core.lockmanager.FileLockObject.grabFileLock(FileLockObject.java:223)
>> at org.apache.manifoldcf.core.lockmanager.FileLockObject.obtainGlobalWriteLockNoWait(FileLockObject.java:78)
>> at org.apache.manifoldcf.core.lockmanager.LockObject.obtainGlobalWriteLock(LockObject.java:121)
>> at org.apache.manifoldcf.core.lockmanager.LockObject.enterWriteLock(LockObject.java:74)
>> at org.apache.manifoldcf.core.lockmanager.LockGate.enterWriteLock(LockGate.java:177)
>> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWrite(BaseLockManager.java:1120)
>> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWriteLock(BaseLockManager.java:757)
>> at org.apache.manifoldcf.core.lockmanager.LockManager.enterWriteLock(LockManager.java:302)
>> at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.pollAll(ConnectorPool.java:585)
>> at org.apache.manifoldcf.core.connectorpool.ConnectorPool.pollAllConnectors(ConnectorPool.java:338)
>> at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.pollAllConnectors(RepositoryConnectorPool.java:124)
>> at org.apache.manifoldcf.crawlerui.IdleCleanupThread.run(IdleCleanupThread.java:69)
>> And the error was repeated indefinitely in the log.
>> 
>> Does this mean that there was a problem with the syncharea folder at some 
>> point?
>> 
>> Thanks,
>> Best regards,
>> 
>> Olivier TAVARD
>> 
>> 
>> 
>> -- 
>> Cihad Güzel
> 
> 
> 
> 
> -- 
> Luis Cabaceira
> 
> 
> 
> 
> -- 
> Luis Cabaceira
>
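
Karl's point in the thread, that the lock file cannot be created because its 
parent directory is missing, can be reproduced outside MCF. The following is a 
minimal sketch in Python, assuming a POSIX system; try_create_lock is a 
hypothetical helper, not MCF code. It performs an exclusive create comparable 
to what java.io.File.createNewFile does on Unix and returns the errno on 
failure.

```python
import errno
import os
import tempfile


def try_create_lock(lock_path):
    """Create lock_path exclusively; return None on success, else the errno."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as e:
        return e.errno
    os.close(fd)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        # Stand-in for a syncharea subdirectory that does not exist (yet).
        missing_dir = os.path.join(base, "551", "442")
        lock = os.path.join(missing_dir, "example.lock")

        # Parent directory missing: "No such file or directory".
        print(try_create_lock(lock) == errno.ENOENT)

        # Once the directory exists, the exclusive create succeeds,
        # and a second attempt fails because the lock file is present.
        os.makedirs(missing_dir)
        print(try_create_lock(lock) is None)
        print(try_create_lock(lock) == errno.EEXIST)
```

ENOENT here refers to the directory, not the lock file itself, which matches 
the "No such file or directory" message in the WARN log entry.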
