I'm not currently using ZooKeeper. How is ZooKeeper different from file-based sync? I would also like some guidance on managing my PC's memory. How many GB should I allocate to the ManifoldCF start-agent? Is 4 GB enough to crawl 35K files?
Othman.

On Wed, 30 Aug 2017 at 16:11, Karl Wright <daddy...@gmail.com> wrote:
> Your disk is not writable for some reason, and that's interfering with
> ManifoldCF 2.8 locking.
>
> I would suggest two things:
>
> (1) Use ZooKeeper for sync instead of file-based sync.
> (2) Check whether you still get failures after that.
>
> Thanks,
> Karl
>
> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <i93oth...@gmail.com>
> wrote:
>> Hi Mr Karl,
>>
>> Thank you for your quick response. I have looked into the
>> ManifoldCF log file and extracted the following warnings:
>>
>> - Attempt to set file lock
>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
>> Synapses.lock' failed : Access is denied.
>>
>> - Couldn't write to lock file; disk may be full. Shutting down process;
>> locks may be left dangling. You must cleanup before restarting.
>>
>> "ES (lowercase) synapses" is the Elasticsearch output connection.
>> The job uses Tika to extract metadata and a file system repository
>> connection. During the job, I don't extract the content of the
>> documents. I was wondering whether the issue comes from Elasticsearch.
>>
>> Othman.
>>
>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> Hi Othman,
>>>
>>> ManifoldCF aborts a job if there's an error that looks like it might go
>>> away on retry, but does not. It can be either on the repository side or on
>>> the output side. If you look at the Simple History in the UI, or at the
>>> manifoldcf.log file, you should be able to get a better sense of what went
>>> wrong. Without further information, I can't say any more.
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <i93oth...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm Othman Belhaj, a software engineer at Société Générale in France.
>>>> I'm currently using ManifoldCF 2.8, your latest release. I'm working on
>>>> an internal search engine, for which I use ManifoldCF to index
>>>> documents on Windows shares. I encountered a serious problem while
>>>> crawling 35K documents. Most of the time, when ManifoldCF starts crawling
>>>> large documents (19 MB, for example), it ends the job with the following
>>>> error: repeated service interruptions - failure processing document :
>>>> software caused connection abort: socket write error.
>>>> Can you give me some tips on how to solve this problem, please?
>>>>
>>>> I use PostgreSQL 9.3.x and Elasticsearch 2.1.0.
>>>> I look forward to your response.
>>>>
>>>> Best regards,
>>>>
>>>> Othman BELHAJ
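A minimal Python sketch for checking the diagnosis that the disk is not writable. `is_writable` probes a directory (for example the synch area under the multiprocess-file-example folder) by creating and then deleting a temporary file. `try_acquire` and `release` show the general exclusive-create lock-file pattern that file-based locking relies on. This is a generic illustration, not ManifoldCF's actual lock implementation, and all function names are hypothetical.

```python
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Return True if a file can be created and removed in `directory`.

    Hypothetical helper: a missing directory or a permission problem
    (e.g. "Access is denied" on Windows) both raise OSError and yield False.
    """
    try:
        fd, probe_path = tempfile.mkstemp(dir=directory, prefix="write-probe-")
    except OSError:
        return False
    os.close(fd)
    os.remove(probe_path)
    return True


def try_acquire(lock_path: str) -> bool:
    """Generic lock-file pattern (not ManifoldCF's code): atomically create
    the lock file, failing if it already exists.

    Returns False if another holder already owns the lock. A PermissionError
    (the "Access is denied" case) is deliberately left to propagate.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(lock_path: str) -> None:
    """Release the lock by deleting the lock file."""
    os.remove(lock_path)
```

A lock file created this way stays on disk if the process dies before `release` runs, which is the kind of dangling lock the log warns about ("You must cleanup before restarting"). ZooKeeper, by contrast, offers ephemeral nodes that are removed automatically when the owning client's session ends.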