Could this be a problem with the Elasticsearch version? I'm currently using
2.1.0, which is pretty old for this new version of ManifoldCF.

Othman.
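One way to confirm which Elasticsearch version the output connection is talking to is the Elasticsearch root endpoint (GET /). It returns a JSON document whose version.number field holds the server version. The sketch below assumes you pass the server URL from your own output connection on the command line (http://localhost:9200 in the comment is only a placeholder). It only reports the version and makes no claim about which versions ManifoldCF supports.

```python
import json
import sys
from urllib.request import urlopen


def es_version(body):
    """Return version.number from the JSON body of Elasticsearch's root endpoint."""
    return json.loads(body)["version"]["number"]


def fetch_es_version(base_url):
    # base_url is an assumption: use the server URL configured in your ES output connection.
    with urlopen(base_url.rstrip("/") + "/", timeout=10) as resp:
        return es_version(resp.read().decode("utf-8"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python es_version.py http://localhost:9200
    print(fetch_es_version(sys.argv[1]))
```

Parsing is kept separate from the HTTP call so the parsing step can be checked without a running server.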

On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <i93oth...@gmail.com> wrote:

> I moved both of the jars you mentioned back, and now a different error is
> showing. You will find the stack trace attached.
>
> Thanks,
> Othman
>
> On Thu, 31 Aug 2017 at 17:09, Karl Wright <daddy...@gmail.com> wrote:
>
>> I've looked at the dependencies; you should not have moved poi-3.15.jar.
>> Please move that back, and commons-collections4-4.1.jar too.
>>
>> You *will* need to move curvesapi-1.04.jar though.
>>
>> Thanks,
>> Karl
>>
>>
>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> If you include poi.jar, then all dependencies of poi.jar must also be
>>> included.  This would mean that curvesapi-1.04.jar and
>>> commons-collections4-4.1.jar should also be included.
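Karl's rule amounts to a transitive closure: every jar reachable from the moved jar through its dependencies has to move with it. Below is a minimal sketch of that computation. The DEPS map is hypothetical. It only restates what this thread says about the POI 3.15 jars and has not been checked against POI's own POM files.

```python
"""Sketch: compute which jars must move together with a given jar."""


def jars_to_move(start, deps):
    """Return `start` plus every jar reachable from it through `deps`."""
    seen = set()
    stack = [start]
    while stack:
        jar = stack.pop()
        if jar in seen:
            continue
        seen.add(jar)
        # Queue the direct dependencies; already-visited ones are skipped above.
        stack.extend(deps.get(jar, ()))
    return seen


# HYPOTHETICAL map assembled from statements in this thread only.
DEPS = {
    "poi-3.15.jar": ["curvesapi-1.04.jar", "commons-collections4-4.1.jar"],
    "poi-ooxml-schemas-3.15.jar": ["xmlbeans-2.6.0.jar", "poi-ooxml-3.15.jar"],
    "poi-ooxml-3.15.jar": ["dom4j-1.6.1.jar"],
}

if __name__ == "__main__":
    print(sorted(jars_to_move("poi-ooxml-schemas-3.15.jar", DEPS)))
```

A real dependency list would come from the libraries' published metadata, not from a hand-written map.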
>>>
>>> Karl
>>>
>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <i93oth...@gmail.com>
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> I added the two jars that you mentioned, plus another one:
>>>> poi-3.15.jar. Unfortunately, another error is showing; this time it
>>>> concerns Excel files. You will find the stack trace attached.
>>>>
>>>> Othman.
>>>>
>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <daddy...@gmail.com> wrote:
>>>>
>>>>> Hi Othman,
>>>>>
>>>>> Yes, this shows that the jar we moved calls back into another jar,
>>>>> which will also need to be moved.  *That* jar has yet another dependency
>>>>> too.
>>>>>
>>>>> The list of jars is thus extended to include:
>>>>>
>>>>> poi-ooxml-3.15.jar
>>>>> dom4j-1.6.1.jar
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <i93oth...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> You will find the stack trace attached. My apologies for the poor
>>>>>> quality of the image; I'm doing my best to send you the stack trace, as
>>>>>> I'm not allowed to send documents outside the company.
>>>>>>
>>>>>> Thank you for your time,
>>>>>>
>>>>>> Othman
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <daddy...@gmail.com> wrote:
>>>>>>
>>>>>>> Once again, I need a stack trace to diagnose what the problem is.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <i93oth...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Oh, actually it didn't solve the problem. I looked into the log
>>>>>>>> file and saw the following error:
>>>>>>>>
>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
>>>>>>>>
>>>>>>>> Maybe another jar is missing?
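A NoClassDefFoundError names the missing class in slash form. Inside a jar, that class is stored as an entry at the same path plus ".class". The sketch below finds which jar in a directory provides the class. It assumes you point it at your own lib or connector-common-lib directory; the script name in the comment is illustrative.

```python
import sys
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_dir):
    """List the jars in jar_dir that contain class_name (slashed or dotted form)."""
    # "org/apache/poi/POIXMLTypeLoader" -> "org/apache/poi/POIXMLTypeLoader.class"
    entry = class_name.strip().rstrip(".").replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Not a valid zip/jar archive: skip it rather than abort the scan.
            continue
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python find_class.py org/apache/poi/POIXMLTypeLoader lib
    for name in jars_containing(sys.argv[1], sys.argv[2]):
        print(name)
```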
>>>>>>>>
>>>>>>>> Othman.
>>>>>>>>
>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <i93oth...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I tried what you told me to do and, as you expected, the crawl
>>>>>>>>> resumed. What about regular expressions? How can I write complex
>>>>>>>>> regular expressions in the job's Paths tab?
>>>>>>>>>
>>>>>>>>> Thank you very much for your help.
>>>>>>>>>
>>>>>>>>> Othman.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <i93oth...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>
>>>>>>>>>> Othman.
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <daddy...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Oh, and you also may need to edit your options.env files to
>>>>>>>>>>> include them in the classpath for startup.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <daddy...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> If you are amenable, there is another workaround you could
>>>>>>>>>>>> try.  Specifically:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>> (2) Move the following two files from connector-common-lib to
>>>>>>>>>>>> lib:
>>>>>>>>>>>>
>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>
>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
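Step (2) can be scripted. Below is a minimal Python sketch. It assumes it is run against the ManifoldCF distribution root (the directory that contains both lib and connector-common-lib), and only after step (1) has stopped every MCF process. The jar names are the two listed in step (2).

```python
import shutil
import sys
from pathlib import Path

JARS = ("xmlbeans-2.6.0.jar", "poi-ooxml-schemas-3.15.jar")


def move_jars(root, jars=JARS, src="connector-common-lib", dst="lib"):
    """Move each jar from root/src to root/dst; return the names actually moved."""
    src_dir, dst_dir = Path(root) / src, Path(root) / dst
    if not dst_dir.is_dir():
        raise FileNotFoundError(f"target directory not found: {dst_dir}")
    moved = []
    for name in jars:
        source = src_dir / name
        if not source.exists():
            # Already moved, or this release ships a different version.
            print(f"skipping {name}: not found in {src_dir}", file=sys.stderr)
            continue
        shutil.move(str(source), str(dst_dir / name))
        moved.append(name)
    return moved


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python move_jars.py D:\path\to\apache-manifoldcf-2.8
    print(move_jars(sys.argv[1]))
```

Running it a second time is harmless: jars that were already moved are reported and skipped.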
>>>>>>>>>>>>
>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>> daddy...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>
>>>>>>>>>>>>> One simple workaround is to use the external Tika server
>>>>>>>>>>>>> transformer rather than the embedded Tika Extractor. I'm still looking
>>>>>>>>>>>>> into why the jar is not being found.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>> i93oth...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I'm using the latest binary version, and my job got stuck
>>>>>>>>>>>>>> on that specific file.
>>>>>>>>>>>>>> The job status is still Running. You can see it in the
>>>>>>>>>>>>>> attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <daddy...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you are
>>>>>>>>>>>>>>> indeed using the binary distribution.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>> i93oth...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm using the binary version. For security reasons, I can't send
>>>>>>>>>>>>>>>> any files from my computer, so I copied the stack trace and scanned
>>>>>>>>>>>>>>>> it with my cellphone. I hope it will be helpful.
>>>>>>>>>>>>>>>> Meanwhile, I have read the documentation about how to restrict the
>>>>>>>>>>>>>>>> crawl, and I don't think '|' works there. For instance, I would
>>>>>>>>>>>>>>>> like to limit the crawl to documents whose names contain the word
>>>>>>>>>>>>>>>> 'SON' ('sound'). I proceeded as follows: *(SON)*. The document's
>>>>>>>>>>>>>>>> name is in capital letters, and I noticed that the rule didn't
>>>>>>>>>>>>>>>> take it into account.
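This sketch assumes that the Paths tab rules are shell-style wildcard patterns ('*' and '?') rather than regular expressions; the end-user documentation Karl points to is the authority on this. Under that assumption, a pattern like *(SON)* only matches names that contain the literal text "(SON)", and '|' is also literal. The code uses Python's fnmatch module as a stand-in to illustrate those semantics. It is not ManifoldCF's matcher, and it does not settle whether ManifoldCF compares names case-sensitively.

```python
"""Illustrates wildcard ('*', '?') matching versus regular expressions.

Uses Python's fnmatch as a stand-in; this is NOT ManifoldCF's own matcher.
"""
import re
from fnmatch import fnmatchcase


def wildcard_match(name, pattern, ignore_case=False):
    """Match name against a shell-style pattern; '(', ')' and '|' are literal."""
    if ignore_case:
        # fnmatchcase never folds case itself, so lower-case both sides.
        name, pattern = name.lower(), pattern.lower()
    return fnmatchcase(name, pattern)


if __name__ == "__main__":
    for name in ["RAPPORT SON 2017.docx", "rapport (SON).docx", "notes.txt"]:
        print(
            name,
            wildcard_match(name, "*(SON)*"),  # requires the literal text "(SON)"
            wildcard_match(name, "*son*", ignore_case=True),
            # Python regex with a case-insensitive flag, the analogue of /i:
            bool(re.search(r"\bson\b", name, re.IGNORECASE)),
        )
```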
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>> daddy...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The way you restrict documents with the Windows share
>>>>>>>>>>>>>>>>> connector is by specifying information on the "Paths" tab in jobs
>>>>>>>>>>>>>>>>> that crawl Windows shares. There is end-user documentation, both
>>>>>>>>>>>>>>>>> online and distributed with all binary distributions, that
>>>>>>>>>>>>>>>>> describes how to do this. Have you found it?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> i93oth...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you for your response. I will start using ZooKeeper and
>>>>>>>>>>>>>>>>>> will let you know if it works. I have another question: I need to
>>>>>>>>>>>>>>>>>> apply some filters while crawling, because there are some files
>>>>>>>>>>>>>>>>>> and folders I don't want to crawl. Could you give me an example of
>>>>>>>>>>>>>>>>>> how to use the regex? Does the regex allow /i to ignore case?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>> daddy...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people often have
>>>>>>>>>>>>>>>>>>> problems getting file permissions right and do not understand
>>>>>>>>>>>>>>>>>>> how to shut processes down cleanly; ZooKeeper is resilient
>>>>>>>>>>>>>>>>>>> against those problems. I highly recommend using ZooKeeper sync.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered not to load files into memory, so
>>>>>>>>>>>>>>>>>>> you do not need huge amounts of memory. The default values are
>>>>>>>>>>>>>>>>>>> more than enough for 35,000 files, which is a pretty small job
>>>>>>>>>>>>>>>>>>> for ManifoldCF.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> i93oth...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm currently not using ZooKeeper. How is ZooKeeper different
>>>>>>>>>>>>>>>>>>>> from file-based sync? I also need some guidance on how to manage
>>>>>>>>>>>>>>>>>>>> my PC's memory. How many GB should I allocate to the ManifoldCF
>>>>>>>>>>>>>>>>>>>> start-agent? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>> daddy...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and that's
>>>>>>>>>>>>>>>>>>>>> interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (1) Use ZooKeeper for sync instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>> (2) Check whether you still get failures after that.
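Before or alongside switching to ZooKeeper, it can help to confirm whether the process can create files in the synch directory at all. This minimal sketch assumes you pass the synch directory configured in your own properties.xml. Python's documentation warns that os.access() can report success even when real I/O would fail, so the sketch actually creates a temporary file, which is deleted automatically. A permission problem such as "Access is denied" surfaces here as a PermissionError.

```python
import sys
import tempfile
from pathlib import Path


def can_write(directory):
    """Return (True, None) if a file can be created in directory, else (False, reason)."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="mcf-write-test-") as tmp:
            tmp.write(b"x")
            tmp.flush()
        return True, None
    except OSError as exc:
        # e.g. PermissionError ("Access is denied"), FileNotFoundError, disk full.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    ok, reason = can_write(target)
    print(f"{target}: writable" if ok else f"{target}: NOT writable ({reason})")
```

Run it as the same user account that runs the ManifoldCF processes; otherwise the result says nothing about their permissions.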
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>> i93oth...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you for your quick response. I looked into the ManifoldCF
>>>>>>>>>>>>>>>>>>>>>> log file and extracted the following warnings:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES 
>>>>>>>>>>>>>>>>>>>>>> (Lowercase)
>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be full.
>>>>>>>>>>>>>>>>>>>>>> Shutting down process; locks may be left dangling. You 
>>>>>>>>>>>>>>>>>>>>>> must cleanup before
>>>>>>>>>>>>>>>>>>>>>> restarting.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> "ES (lowercase) synapses" is the Elasticsearch output
>>>>>>>>>>>>>>>>>>>>>> connection. The job uses a file system repository connection
>>>>>>>>>>>>>>>>>>>>>> and Tika to extract metadata. During the job, I don't extract
>>>>>>>>>>>>>>>>>>>>>> the content of the documents. I was wondering whether the issue
>>>>>>>>>>>>>>>>>>>>>> comes from Elasticsearch.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> daddy...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error that looks like it
>>>>>>>>>>>>>>>>>>>>>>> might go away on retry, but does not. It can be either on the
>>>>>>>>>>>>>>>>>>>>>>> repository side or on the output side. If you look at the Simple
>>>>>>>>>>>>>>>>>>>>>>> History in the UI, or at the manifoldcf.log file, you should be
>>>>>>>>>>>>>>>>>>>>>>> able to get a better sense of what went wrong. Without further
>>>>>>>>>>>>>>>>>>>>>>> information, I can't say any more.
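The behaviour Karl describes is a "retry, then give up" policy. The sketch below illustrates only that general pattern. It is not ManifoldCF's implementation: TransientError, JobAborted, the attempt limit and the delay are all hypothetical.

```python
import time


class TransientError(Exception):
    """Hypothetical error that looks like it might clear up on retry."""


class JobAborted(Exception):
    """Hypothetical signal that processing gave up on a document."""


def process_with_retries(process, document, max_attempts=3, delay_s=1.0):
    """Call process(document), retrying TransientError up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return process(document)
        except TransientError as exc:
            if attempt == max_attempts:
                # The error never went away: abort instead of retrying forever.
                raise JobAborted(
                    f"repeated failures processing {document!r}: {exc}"
                ) from exc
            # Pause before the next attempt.
            time.sleep(delay_s)
```

The key design point is that a bounded number of retries separates a genuinely transient fault from a persistent one, such as a socket write error that recurs on every attempt.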
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>> i93oth...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer at Société Générale in
>>>>>>>>>>>>>>>>>>>>>>>> France. I'm currently using your recent release, ManifoldCF 2.8.
>>>>>>>>>>>>>>>>>>>>>>>> I'm working on an internal search engine, and for this reason I'm
>>>>>>>>>>>>>>>>>>>>>>>> using ManifoldCF to index documents on Windows shares. I
>>>>>>>>>>>>>>>>>>>>>>>> encountered a serious problem while crawling 35K documents. Most
>>>>>>>>>>>>>>>>>>>>>>>> of the time, when ManifoldCF starts crawling a large document
>>>>>>>>>>>>>>>>>>>>>>>> (19 MB, for example), it ends the job with the following error:
>>>>>>>>>>>>>>>>>>>>>>>> repeated service interruptions - failure processing document :
>>>>>>>>>>>>>>>>>>>>>>>> software caused connection abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and Elasticsearch 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>> I look forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
