Re: Question about ManifoldCF 2.8

2017-09-18 Thread Karl Wright
Unfortunately, I'll be offline all day today, starting in about 20 minutes, so I don't have enough time to clarify the manual enough to help you out here. The UI, though, is pretty self-explanatory I would think. Karl On Mon, Sep 18, 2017 at 6:22 AM, Beelz Ryuzaki wrote:

Re: Question about ManifoldCF 2.8

2017-09-18 Thread Beelz Ryuzaki
Hi Karl, Thank you for this brief explanation. I will eventually use the adjuster to tag the documents. Can you give an example of how to use the adjuster, say to tag documents that contain the word 'delivery'? I didn't quite understand the example included in the documentation. Many

Re: Question about ManifoldCF 2.8

2017-09-18 Thread Beelz Ryuzaki
I saw the documentation about "Metadata Adjuster" and I believe it doesn't add a new field "Tag". I actually have some documents that have the word "delivery" in their title, subtitle, or content, and I want to tag them as "delivery" in Elasticsearch. Is it possible to do it with

Re: Question about ManifoldCF 2.8

2017-09-18 Thread Karl Wright
Hi Othman, You can add an attribute to all documents from any specific MCF job or jobs by including a "Metadata Adjuster" in the pipeline for the job. Hope that answers your question? Karl On Mon, Sep 18, 2017 at 5:28 AM, Beelz Ryuzaki wrote: > Hello Karl, > > I'm

Re: Question about ManifoldCF 2.8

2017-09-18 Thread Beelz Ryuzaki
Hello Karl, I'd like to know if there is a way to tag the indexed documents with ManifoldCF? Many thanks, Othman BELHAJ On Fri, 8 Sep 2017 at 21:43, Karl Wright wrote: > Hi Othman, > > There are two properties files for zookeeper: the global properties, and >

Re: Question about ManifoldCF 2.8

2017-09-08 Thread Karl Wright
Hi Othman, There are two properties files for zookeeper: the global properties, and the local (zookeeper managed) properties. The database configuration is in the zookeeper managed properties. Please examine the following page for setting up Postgresql properties:

Re: Question about ManifoldCF 2.8

2017-09-08 Thread Beelz Ryuzaki
Sorry to bother you again, but what is the difference between indexable files and files in the Paths tab of a job? Thanks, Othman BELHAJ On Fri, 8 Sep 2017 at 19:27, Beelz Ryuzaki wrote: > Hi Karl, > > My zookeeper is still pointing to the HSQL database. What should I do

Re: Question about ManifoldCF 2.8

2017-09-06 Thread Beelz Ryuzaki
Thank you, Karl. I will try to combine Postgresql with zookeeper and let you know. Othman. On Wed, 6 Sep 2017 at 13:18, Karl Wright wrote: > No, you can use whatever supported database you like. > > Karl > > > On Wed, Sep 6, 2017 at 6:58 AM, Beelz Ryuzaki

Re: Question about ManifoldCF 2.8

2017-09-06 Thread Karl Wright
No, you can use whatever supported database you like. Karl On Wed, Sep 6, 2017 at 6:58 AM, Beelz Ryuzaki wrote: > As far as I know, when you use zookeeper, you are required to use > HSQLDB with it, right? > > Thanks, > Othman > > On Wed, 6 Sep 2017 at 12:56,

Re: Question about ManifoldCF 2.8

2017-09-06 Thread Beelz Ryuzaki
As far as I know, when you use zookeeper, you are required to use HSQLDB with it, right? Thanks, Othman On Wed, 6 Sep 2017 at 12:56, Karl Wright wrote: > Hi Othman, > > HSQLDB stores all tables in memory so you need to size it accordingly. > That is one reason we

Re: Question about ManifoldCF 2.8

2017-09-06 Thread Beelz Ryuzaki
Hi Karl, I resolved the Elasticsearch problem; however, the application doesn't seem to work after I have run a job to crawl over 500k documents. I get a "GC overhead limit exceeded" error in the HSQL database. How much memory should I allocate for it? Best regards, Othman On Tue, 5 Sep 2017 at 12:43, Karl

Re: Question about ManifoldCF 2.8

2017-09-05 Thread Karl Wright
Hi Othman, Thanks for doing the evaluation of the problem. Generally, the ManifoldCF project does not have the expertise to diagnose problems with external systems like Solr or Elasticsearch. So going to another newsgroup for those kinds of issues would be a good idea. Thanks! Karl On Tue,

Re: Question about ManifoldCF 2.8

2017-09-04 Thread Beelz Ryuzaki
Hi Karl, I'm sorry to bother you on your holiday. I will try to analyze it today and let you know what I have found. Enjoy your day! Best regards, Othman BELHAJ. On Mon, 4 Sep 2017 at 16:06, Karl Wright wrote: > Hi Othman, > > I won't be able to look at this today; it is

Re: Question about ManifoldCF 2.8

2017-09-04 Thread Karl Wright
Hi Othman, I won't be able to look at this today; it is a holiday here. But, the "socket write" error is coming from ElasticSearch. If ES is configured to not accept documents greater than a certain size, that might explain it. Maybe the ES logs would help? I'm afraid you're going to need to

Re: Question about ManifoldCF 2.8

2017-09-01 Thread Karl Wright
(1) I would create a ticket for the "*word*" exclusion. It would be helpful to include a screen shot of the view page of your job as well. (2) I will be uploading a new ManifoldCF 2.8.1 RC shortly. Karl On Fri, Sep 1, 2017 at 12:05 PM, Beelz Ryuzaki wrote: > Hi Karl, >

Re: Question about ManifoldCF 2.8

2017-09-01 Thread Karl Wright
Hi Othman, I will respin a new 2.8.1 (RC1) to address the zookeeper issue. The failure you are seeing is "NoSuchMethodError". Therefore, the class is being found, but it is the *wrong* class. When you deployed the new release, did you deploy it in a new directory, or did you overwrite the

Re: Question about ManifoldCF 2.8

2017-09-01 Thread Karl Wright
Hi Othman, You do not need a new database instance. You can download MCF 2.8.1 RC0 from here: https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1 Karl On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki wrote: > Hi Karl, > > Thank you very much for your

Re: Question about ManifoldCF 2.8

2017-09-01 Thread Beelz Ryuzaki
Hi Karl, Thank you very much for your help, I'm going to try out the zookeeper example. Should I initialize a new database? And how can I run the zookeeper start-agent ? Othman. On Fri, 1 Sep 2017 at 11:37, Karl Wright wrote: > Hi Othman, > > These exceptions are now

Re: Question about ManifoldCF 2.8

2017-09-01 Thread Karl Wright
Hi Othman, These exceptions are now coming from file locking and are due to permissions problems. I suggest you go to Zookeeper for file locking. I am building a 2.8.1 release candidate. When it is available for download, I'll send you the URL. Thanks, Karl On Fri, Sep 1, 2017 at 5:27 AM,

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Beelz Ryuzaki
Hi Karl, By 'other place', do you mean the \lib directory? If so, then I have already tried it and it didn't work. Othman. On Thu, 31 Aug 2017 at 18:07, Karl Wright wrote: > Hi Othman, > > I used the java dependency inspector to see what the issue is and it turns >

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Beelz Ryuzaki
All the dependencies you mentioned have already been added in the options.env.win file in the multiprocess-file-example directory. On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki wrote: > Yes, I added it in the options.env.win file. Should it be the one in the >

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
These are the five jars that dependency analysis said should be needed: // both poi-ooxml and poi-ooxml-schemas Don't do any other jars than these, but DO make sure all four jars are moved. Thanks! Karl On Thu, Aug 31, 2017 at 11:30 AM,

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Beelz Ryuzaki
Could it be a problem with the Elasticsearch version? I'm actually using 2.1.0, which is pretty old for this new version of ManifoldCF. Othman. On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki wrote: > I moved back both the jars you mentioned and a different error is showing. You > will

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
I've looked at the dependencies; you should not have moved poi-3.15.jar. Please move that back, and commons-collections4-4.1.jar too. You *will* need to move curvesapi-1.04.jar though. Thanks, Karl On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright wrote: > If you include

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
If you include poi.jar, then all dependencies of poi.jar must also be included. This would mean that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be included. Karl On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki wrote: > Hi Karl, > > I added the two

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Beelz Ryuzaki
And concerning the path tabs, I will use the Unix/Windows wildcards. I think it will be enough. Othman. On Thu, 31 Aug 2017 at 16:23, Beelz Ryuzaki wrote: > Hi Karl, > > I added the two jars that you have mentioned and another one : > poi-3.15.jar . Unfortunately, there is

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
Hi Othman, Yes, this shows that the jar we moved calls back into another jar, which will also need to be moved. *That* jar has yet another dependency too. The list of jars is thus extended to include: poi-ooxml-3.15.jar dom4j-1.6.1.jar Karl On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
Once again, I need a stack trace to diagnose what the problem is. Thanks, Karl On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki wrote: > Oh, actually it didn't solve the problem. I looked into the log file and > saw the following error: > > Error tossed :

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Beelz Ryuzaki
Oh, actually it didn't solve the problem. I looked into the log file and saw the following error: Error tossed : org/apache/poi/POIXMLTypeLoader java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader. Maybe another jar is missing ? Othman. On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Beelz Ryuzaki
Ok, I will try it right away and let you know if it works. Othman. On Thu, 31 Aug 2017 at 14:15, Karl Wright wrote: > Oh, and you also may need to edit your options.env files to include them > in the classpath for startup. > > Karl > > > On Thu, Aug 31, 2017 at 7:53 AM,

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
Oh, and you also may need to edit your options.env files to include them in the classpath for startup. Karl On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright wrote: > If you are amenable, there is another workaround you could try. > Specifically: > > (1) Shut down all MCF

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
If you are amenable, there is another workaround you could try. Specifically: (1) Shut down all MCF processes. (2) Move the following two files from connector-common-lib to lib: xmlbeans-2.6.0.jar poi-ooxml-schemas-3.15.jar (3) Restart everything and see if your crawl resumes. Please let me
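Step (2) of the workaround above could be scripted roughly as follows. This is only a sketch: the two jar names and the `connector-common-lib` and `lib` directory names come from the message, while the function name `move_jars`, the `mcf_root` parameter, and the assumption that both directories sit directly under the ManifoldCF install root are illustrative. It should only be run after all MCF processes have been shut down, as step (1) requires.

```python
import shutil
from pathlib import Path

# The two jars named in the message above.
JARS_TO_MOVE = ["xmlbeans-2.6.0.jar", "poi-ooxml-schemas-3.15.jar"]


def move_jars(mcf_root, jar_names=JARS_TO_MOVE):
    """Move each named jar from connector-common-lib to lib under mcf_root.

    mcf_root is a hypothetical parameter: the directory assumed to contain
    both connector-common-lib and lib. Jars that are not present in
    connector-common-lib are skipped. Returns the names actually moved.
    """
    src_dir = Path(mcf_root) / "connector-common-lib"
    dst_dir = Path(mcf_root) / "lib"
    moved = []
    for name in jar_names:
        src = src_dir / name
        if src.is_file():
            shutil.move(str(src), str(dst_dir / name))
            moved.append(name)
    return moved
```

Calling `move_jars(r"D:\apache_manifoldcf-2.8")` would be one way to use it with the install path that appears elsewhere in this thread; whether the options.env files also need editing is a separate question.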

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
I created a ticket for this: CONNECTORS-1450. One simple workaround is to use the external Tika server transformer rather than the embedded Tika Extractor. I'm still looking into why the jar is not being found. Karl On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki wrote:

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
Hi Othman, The way you restrict documents with the windows share connector is by specifying information on the "Paths" tab in jobs that crawl windows shares. There is end-user documentation both online and distributed with all binary distributions that describe how to do this. Have you found

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Karl Wright
I need the complete stack trace please. Are you building ManifoldCF yourself, or are you using the distributed binary? Karl On Thu, Aug 31, 2017 at 5:48 AM, Beelz Ryuzaki wrote: > I have also encountered the following problem while indexing documents in > the windows

Re: Question about ManifoldCF 2.8

2017-08-31 Thread Beelz Ryuzaki
Hello Karl, Thank you for your response, I will start using zookeeper and I will let you know if it works. I have another question to ask. Actually, I need to make some filters while crawling. I don't want to crawl some files and some folders. Could you give me an example of how to use the regex.

Re: Question about ManifoldCF 2.8

2017-08-30 Thread Steph van Schalkwyk
Thanks Karl.

Re: Question about ManifoldCF 2.8

2017-08-30 Thread Furkan KAMACI
Hi Steph, Zookeeper is a coordination service for distributed systems. Having a quorum means that more than half of the nodes are up and running. This protects against the split-brain problem. Zookeeper is a distributed system and any node may be down at any time. Split brain can be
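The "more than half" rule described above can be written out as a small calculation. This is an illustrative sketch only, not ZooKeeper code; the function names `has_quorum` and `tolerated_failures` are made up for the example.

```python
def has_quorum(ensemble_size, nodes_up):
    """Return True when strictly more than half of the ensemble is up."""
    if ensemble_size <= 0 or not 0 <= nodes_up <= ensemble_size:
        raise ValueError("invalid ensemble_size / nodes_up combination")
    return 2 * nodes_up > ensemble_size


def tolerated_failures(ensemble_size):
    """Largest number of nodes that can be down while a quorum remains."""
    if ensemble_size <= 0:
        raise ValueError("ensemble_size must be positive")
    return (ensemble_size - 1) // 2
```

Under this rule a 3-node ensemble survives one failure, and a 4-node ensemble still survives only one, which is why odd sizes of 3 or more are the usual choice.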

Re: Question about ManifoldCF 2.8

2017-08-30 Thread Karl Wright
Hi Steph, You can configure your zookeeper however you like; there is a sample configuration file included with MCF that works out of the box. But yes, we do recommend a quorum count of 3 or more. Karl On Wed, Aug 30, 2017 at 2:19 PM, Steph van Schalkwyk wrote: > Karl, > Is

Re: Question about ManifoldCF 2.8

2017-08-30 Thread Steph van Schalkwyk
Karl, Is there a requirement for the number of ZK for MCF? I've used ZK with SOLR, and the minimum quorum count is 3. Thanks Steph

Re: Question about ManifoldCF 2.8

2017-08-30 Thread Beelz Ryuzaki
I'm actually not using zookeeper. I want to know how zookeeper is different from file-based sync. I also need guidance on how to manage my PC's memory. How many GB should I allocate for the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files? Othman. On Wed, 30 Aug 2017 at

Re: Question about ManifoldCF 2.8

2017-08-30 Thread Karl Wright
Your disk is not writable for some reason, and that's interfering with ManifoldCF 2.8 locking. I would suggest two things: (1) Use Zookeeper for sync instead of file-based sync. (2) Have a look if you still get failures after that. Thanks, Karl On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki

Re: Question about ManifoldCF 2.8

2017-08-30 Thread Beelz Ryuzaki
Hi Mr Karl, Thank you Mr Karl for your quick response. I have looked into the ManifoldCF log file and extracted the following warnings : - Attempt to set file lock 'D:\\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)

Re: Question about ManifoldCF 2.8

2017-08-30 Thread Karl Wright
Hi Othman, ManifoldCF aborts a job if there's an error that looks like it might go away on retry, but does not. It can be either on the repository side or on the output side. If you look at the Simple History in the UI, or at the manifoldcf.log file, you should be able to get a better sense of

Question about ManifoldCF 2.8

2017-08-30 Thread Beelz Ryuzaki
Hello, I'm Othman Belhaj, a software engineer from Société Générale in France. I'm currently using your recent version of ManifoldCF, 2.8. I'm working on an internal search engine. For this reason, I'm using ManifoldCF to index documents on Windows shares. I encountered a serious problem