Unfortunately, I'll be offline all day today, starting in about 20 minutes,
so I don't have enough time to clarify the manual enough to help you out
here. The UI, though, is pretty self-explanatory I would think.
Karl
On Mon, Sep 18, 2017 at 6:22 AM, Beelz Ryuzaki wrote:
Hi Karl,
Thank you for this brief explanation. I will eventually use the adjuster to
tag the documents. Can you give an example of how to use the adjuster, say to
tag documents that contain the word 'delivery'? I didn't fully understand
the example that was included in the documentation.
Many
I saw the documentation about "Metadata Adjuster" and I believe it doesn't
add a new field "Tag". I actually have some documents that have the word
"delivery" in their title, subtitle, or content, and I want to tag
them as "delivery" in Elasticsearch. Is it possible to do it with
Hi Othman,
You can add an attribute to all documents from any specific MCF job or jobs
by including a "Metadata Adjuster" in the pipeline for the job. Hope that
answers your question?
Karl
On Mon, Sep 18, 2017 at 5:28 AM, Beelz Ryuzaki wrote:
Hello Karl,
I'm interested in knowing if there is a way to tag the indexed documents
with ManifoldCF?
Many thanks,
Othman BELHAJ
On Fri, 8 Sep 2017 at 21:43, Karl Wright wrote:
Hi Othman,
There are two properties files for zookeeper: the global properties, and
the local (zookeeper managed) properties. The database configuration is in
the zookeeper managed properties.
Please examine the following page for setting up Postgresql properties:
Sorry to bother you again, but what is the difference between indexable
files and files in the Paths tab of a job?
Thanks,
Othman BELHAJ
On Fri, 8 Sep 2017 at 19:27, Beelz Ryuzaki wrote:
> Hi Karl,
>
> My zookeeper is still pointing to the HSQL database. What should I do
Thank you, Karl. I will try to combine Postgresql with zookeeper and let
you know.
Othman.
On Wed, 6 Sep 2017 at 13:18, Karl Wright wrote:
No, you can use whatever supported database you like.
Karl
On Wed, Sep 6, 2017 at 6:58 AM, Beelz Ryuzaki wrote:
As far as I know, when you use Zookeeper, you are required to use
HSQLDB with it, right?
Thanks,
Othman
On Wed, 6 Sep 2017 at 12:56, Karl Wright wrote:
> Hi Othman,
>
> HSQLDB stores all tables in memory so you need to size it accordingly.
> That is one reason we
Hi Karl,
I resolved the Elasticsearch problem; however, the application doesn't seem
to work after I ran a job to crawl over 500k documents. I get a "GC
overhead limit exceeded" error in the HSQL database. How much memory should
I allocate for it?
Best regards,
Othman
On Tue, 5 Sep 2017 at 12:43, Karl
Hi Othman,
Thanks for doing the evaluation of the problem.
Generally, the ManifoldCF project does not have the expertise to diagnose
problems with external systems like Solr or Elasticsearch. So going to
another newsgroup for those kinds of issues would be a good idea.
Thanks!
Karl
On Tue,
Hi Karl,
I'm sorry to bother you on your holiday. I will try to analyze it today and
let you know what I have found. Enjoy your day!
Best regards,
Othman BELHAJ.
On Mon, 4 Sep 2017 at 16:06, Karl Wright wrote:
Hi Othman,
I won't be able to look at this today; it is a holiday here. But, the
"socket write" error is coming from ElasticSearch. If ES is configured to
not accept documents greater than a certain size, that might explain it.
Maybe the ES logs would help?
I'm afraid you're going to need to
(1) I would create a ticket for the "*word*" exclusion. It would be
helpful to include a screen shot of the view page of your job as well.
(2) I will be uploading a new ManifoldCF 2.8.1 RC shortly.
Karl
On Fri, Sep 1, 2017 at 12:05 PM, Beelz Ryuzaki wrote:
> Hi Karl,
>
Hi Othman,
I will respin a new 2.8.1 (RC1) to address the zookeeper issue.
The failure you are seeing is "NoSuchMethodError". Therefore, the class is
being found, but it is the *wrong* class. When you deployed the new
release, did you deploy it in a new directory, or did you overwrite the
Hi Othman,
You do not need a new database instance.
You can download MCF 2.8.1 RC0 from here:
https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
Karl
On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki wrote:
Hi Karl,
Thank you very much for your help, I'm going to try out the zookeeper
example. Should I initialize a new database? And how can I run the
zookeeper start-agent ?
Othman.
On Fri, 1 Sep 2017 at 11:37, Karl Wright wrote:
Hi Othman,
These exceptions are now coming from file locking and are due to
permissions problems. I suggest you go to Zookeeper for file locking.
I am building a 2.8.1 release candidate. When it is available for download,
I'll send you the URL.
Thanks,
Karl
On Fri, Sep 1, 2017 at 5:27 AM,
Hi Karl,
By 'other place', do you mean the \lib directory? If so, I have
already tried it and it didn't work.
Othman.
On Thu, 31 Aug 2017 at 18:07, Karl Wright wrote:
> Hi Othman,
>
> I used the java dependency inspector to see what the issue is and it turns
>
All the dependencies you mentioned have already been added in the
options.env.win file in the multiprocess-file-example directory.
On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki wrote:
> Yes, I added it in the options.env.win file. Should it be the one in the
>
These are the five jars that dependency analysis said should be needed:
// both poi-ooxml and
poi-ooxml-schemas
Don't do any other jars than these, but DO make sure all four jars are
moved.
Thanks!
Karl
On Thu, Aug 31, 2017 at 11:30 AM,
Could it be a problem with the Elasticsearch version? I'm actually using
2.1.0, which is pretty old for this new version of ManifoldCF.
Othman.
On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki wrote:
> I moved back both the jars you mentioned and a different error is showing. You
> will
I've looked at the dependencies; you should not have moved poi-3.15.jar.
Please move that back, and commons-collections4-4.1.jar too.
You *will* need to move curvesapi-1.04.jar though.
Thanks,
Karl
On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright wrote:
If you include poi.jar, then all dependencies of poi.jar must also be
included. This would mean that curvesapi-1.04.jar and
commons-collections4-4.1.jar should also be included.
Karl
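The point about including every dependency of a jar can be sketched as a reachability walk over a dependency map. This is only an illustration: the map below is hypothetical apart from the poi entry, which mirrors the two jars named in the message above, and `jars_to_include` is a made-up helper name, not part of ManifoldCF.

```python
from collections import deque

# Hypothetical dependency map for illustration. Only the poi entry reflects
# what the message above states; real jars may have further dependencies.
DEPENDENCIES = {
    "poi-3.15.jar": ["curvesapi-1.04.jar", "commons-collections4-4.1.jar"],
    "curvesapi-1.04.jar": [],
    "commons-collections4-4.1.jar": [],
}


def jars_to_include(root, dependencies):
    """Return the root jar plus every jar reachable from it, breadth-first."""
    seen = [root]
    queue = deque([root])
    while queue:
        jar = queue.popleft()
        for dep in dependencies.get(jar, []):
            if dep not in seen:  # guards against cycles and repeats
                seen.append(dep)
                queue.append(dep)
    return seen


print(jars_to_include("poi-3.15.jar", DEPENDENCIES))
```

If a jar is placed in a different classloader location, every jar in the returned list would have to be placed there as well.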
On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki wrote:
> Hi Karl,
>
> I added the two
And concerning the path tabs, I will use the Unix/Windows wildcards. I
think it will be enough.
Othman.
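As a rough illustration of wildcard-based exclusion, here is a minimal sketch using Python's standard fnmatch module. The paths and patterns are hypothetical, and the matching rules are Python's; the Paths tab of the windows share connector may interpret wildcards differently, so treat this only as a way to reason about which files a pattern would catch.

```python
from fnmatch import fnmatchcase


def keep_path(path, exclude_patterns):
    """Return True when the path matches none of the exclusion wildcards."""
    return not any(fnmatchcase(path, pattern) for pattern in exclude_patterns)


# Hypothetical paths and exclusion patterns, for illustration only.
paths = ["reports/delivery.docx", "reports/~draft.tmp", "archive/old.pdf"]
excludes = ["*.tmp", "archive/*"]

kept = [p for p in paths if keep_path(p, excludes)]
print(kept)
```

In Python's fnmatch, `*` also matches path separators, so `*.tmp` excludes temporary files in any folder.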
On Thu, 31 Aug 2017 at 16:23, Beelz Ryuzaki wrote:
> Hi Karl,
>
> I added the two jars that you have mentioned and another one :
> poi-3.15.jar . Unfortunately, there is
Hi Othman,
Yes, this shows that the jar we moved calls back into another jar, which
will also need to be moved. *That* jar has yet another dependency too.
The list of jars is thus extended to include:
poi-ooxml-3.15.jar
dom4j-1.6.1.jar
Karl
On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki
Once again, I need a stack trace to diagnose what the problem is.
Thanks,
Karl
On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki wrote:
Oh, actually it didn't solve the problem. I looked into the log file and
saw the following error:
Error tossed : org/apache/poi/POIXMLTypeLoader
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
Maybe another jar is missing ?
Othman.
On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki
Ok, I will try it right away and let you know if it works.
Othman.
On Thu, 31 Aug 2017 at 14:15, Karl Wright wrote:
Oh, and you also may need to edit your options.env files to include them in
the classpath for startup.
Karl
On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright wrote:
If you are amenable, there is another workaround you could try.
Specifically:
(1) Shut down all MCF processes.
(2) Move the following two files from connector-common-lib to lib:
xmlbeans-2.6.0.jar
poi-ooxml-schemas-3.15.jar
(3) Restart everything and see if your crawl resumes.
Please let me
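Step (2) of the workaround above can be sketched as a small script, to be run only after all MCF processes are shut down. It assumes that connector-common-lib and lib both sit directly under the installation root; `mcf_home` and `move_jars` are hypothetical names introduced for this sketch.

```python
import shutil
from pathlib import Path

# The two jar names come from the message above.
JARS_TO_MOVE = ["xmlbeans-2.6.0.jar", "poi-ooxml-schemas-3.15.jar"]


def move_jars(mcf_home, jar_names=JARS_TO_MOVE):
    """Move each named jar from connector-common-lib to lib.

    Assumes both directories exist directly under mcf_home.
    Returns the names of the jars that were actually moved.
    """
    source_dir = Path(mcf_home) / "connector-common-lib"
    target_dir = Path(mcf_home) / "lib"
    moved = []
    for name in jar_names:
        source = source_dir / name
        if source.is_file():  # skip jars that are absent or already moved
            shutil.move(str(source), str(target_dir / name))
            moved.append(name)
    return moved
```

Calling `move_jars` with the installation root returns the list of jars it moved; an empty list means nothing was found in connector-common-lib.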
I created a ticket for this: CONNECTORS-1450.
One simple workaround is to use the external Tika server transformer rather
than the embedded Tika Extractor. I'm still looking into why the jar is
not being found.
Karl
On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki wrote:
Hi Othman,
The way you restrict documents with the windows share connector is by
specifying information on the "Paths" tab in jobs that crawl windows
shares. There is end-user documentation both online and distributed with
all binary distributions that describe how to do this. Have you found
I need the complete stack trace please.
Are you building ManifoldCF yourself, or are you using the distributed
binary?
Karl
On Thu, Aug 31, 2017 at 5:48 AM, Beelz Ryuzaki wrote:
> I have also encountered the following problem while indexing documents in
> the windows
Hello Karl,
Thank you for your response. I will start using Zookeeper and I will let
you know if it works. I have another question to ask. I need to
apply some filters while crawling: I don't want to crawl some files and some
folders. Could you give me an example of how to use the regex?
Thanks Karl.
Hi Steph,
Zookeeper is a coordination service for distributed systems. Having a
quorum means that more than half of the nodes are up and running.
This protects against the split-brain (brain splitting) problem. Zookeeper
is itself a distributed system, and a node may go down at any time.
Brain splitting can be
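The quorum rule described above ("more than half of the nodes are up") can be written out as a short sketch. The function names are hypothetical and this is plain majority arithmetic, not Zookeeper code.

```python
def has_quorum(total_nodes, live_nodes):
    """True when strictly more than half of the ensemble is up."""
    return 2 * live_nodes > total_nodes


def tolerated_failures(total_nodes):
    """Largest number of nodes that can be down while a majority remains."""
    return (total_nodes - 1) // 2


for total in (1, 2, 3, 5):
    print(total, "nodes tolerate", tolerated_failures(total), "failure(s)")
```

Under this rule a 2-node ensemble tolerates no failures at all, while 3 nodes tolerate one, which is consistent with the recommendation of 3 or more elsewhere in this thread.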
Hi Steph,
You can configure your zookeeper however you like; there is a sample
configuration file included with MCF that works out of the box. But yes,
we do recommend a quorum count of 3 or more.
Karl
On Wed, Aug 30, 2017 at 2:19 PM, Steph van Schalkwyk
wrote:
Karl,
Is there a requirement for the number of ZK nodes for MCF? I've used ZK with
Solr, and the minimum quorum count is 3.
Thanks
Steph
I'm actually not using Zookeeper. I want to know how Zookeeper differs
from file-based sync. I also need guidance on how to manage my PC's
memory. How many GB should I allocate for the start-agent of ManifoldCF? Is
4 GB enough to crawl 35K files?
Othman.
On Wed, 30 Aug 2017 at
Your disk is not writable for some reason, and that's interfering with
ManifoldCF 2.8 locking.
I would suggest two things:
(1) Use Zookeeper for sync instead of file-based sync.
(2) Have a look if you still get failures after that.
Thanks,
Karl
On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki
Hi Mr Karl,
Thank you Mr Karl for your quick response. I have looked into the
ManifoldCF log file and extracted the following warnings :
- Attempt to set file lock
'D:\\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
Hi Othman,
ManifoldCF aborts a job if there's an error that looks like it might go
away on retry, but does not. It can be either on the repository side or on
the output side. If you look at the Simple History in the UI, or at the
manifoldcf.log file, you should be able to get a better sense of
Hello,
I'm Othman Belhaj, a software engineer at Société Générale in France. I'm
using the recent 2.8 release of ManifoldCF. I'm working on an
internal search engine, and I'm using ManifoldCF to
index documents on Windows shares. I encountered a serious problem