Participate in the ASF 25th Anniversary Campaign

2024-04-03 Thread Brian Proffitt
Hi everyone, As part of The ASF’s 25th anniversary campaign[1], we will be celebrating projects and communities in multiple ways. We invite all projects and contributors to participate in the following ways: * Individuals - submit your first contribution:

Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers! [ You are receiving this email as a subscriber to one or more ASF project dev or user mailing lists; it is not being sent to you directly. It is important that we reach all of our users and contributors/committers so that they may get a chance

Jetty Config changes

2024-03-11 Thread ritika jain
Hi All, When ManifoldCF starts with start.jar, it creates an entry in the system's tmp folder, but the entry is not automatically cleaned up when the server/ManifoldCF stops. On my live server I am using a dockerised environment, where we have a mechanism that restarts ManifoldCF whenever required, every time

Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers! The Travel Assistance Committee (TAC) are pleased to announce that travel assistance applications for Community over Code Asia 2024 are now open! We will be supporting Community over Code Asia, Hangzhou, China July 26th - 28th, 2024. TAC exists

Community over Code EU 2024 Travel Assistance Applications now open!

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers! The Travel Assistance Committee (TAC) are pleased to announce that travel assistance applications for Community over Code EU 2024 are now open! We will be supporting Community over Code EU, Bratislava, Slovakia, June 3rd - 5th, 2024. TAC exists

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
No, only the seed URLs get updated with that option. On Tue, Sep 26, 2023 at 10:09 AM Marisol Redondo < marisol.redondo.gar...@gmail.com> wrote: > Thanks a lot for the explanation, Karl, really useful. > > I will wait for your reply at the end of the week, but I thought that the > main reason

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
Thanks a lot for the explanation, Karl, really useful. I will wait for your reply at the end of the week, but I thought that the main reason for the option "Reset seeding" was exactly that: re-evaluating all pages, as in a fresh execution. On Tue, 26 Sept 2023 at 13:30, Karl Wright wrote: >

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
Okay, that is good to know. The hopcount assessment occurs when documents are added to the queue. Hopcounts are stored for each document in the hopcount table. So if you change a hopcount limit, it is quite possible that nothing will change unless documents that are at the previous hopcount limit
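
The point that hopcounts are assessed when documents are queued, and then stored per document, can be illustrated with a toy model. This is only a sketch under stated assumptions, not ManifoldCF's implementation: the `crawl` function, the `links` map, and the in-memory `hopcounts` dictionary (standing in for the hopcount table) are all hypothetical, and the model assumes that a stored document is not looked at again unless it is explicitly put back on the queue.

```python
from collections import deque

def crawl(links, start_docs, max_hops, hopcounts):
    """Toy breadth-first crawl that applies the hop limit at enqueue time.

    links      -- hypothetical map: document -> documents it links to
    start_docs -- documents to put on the queue for this run
    hopcounts  -- stands in for a stored hopcount table; updated in place
    """
    queue = deque()
    for doc in start_docs:
        hopcounts.setdefault(doc, 0)  # unknown start documents count as seeds
        queue.append(doc)
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, []):
            hops = hopcounts[doc] + 1
            # The limit is only checked here, when a document would be queued.
            if hops > max_hops or target in hopcounts:
                continue
            hopcounts[target] = hops
            queue.append(target)
    return hopcounts

links = {"a": ["b"], "b": ["c"], "c": ["d"]}
table = {}
crawl(links, ["a"], 1, table)   # limit 1: only "a" and "b" are recorded
crawl(links, ["a"], 3, table)   # raising the limit and re-running the seed changes nothing
crawl(links, ["b"], 3, table)   # re-processing the document at the old limit reaches "c" and "d"
```

In this model, raising the limit alone has no effect because the only document that could discover new links ("b", sitting at the old limit) is never revisited.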

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
No, I haven't used this option. I have it configured as "Keep unreachable documents, for now", but is it also ignoring them because they were already kept? With this option, when are the documents that are unreachable "for now" converted to "forever"? The only solution I can think of is creating a new job

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
If you ever set "Ignore unreachable documents forever" for the job, you can't go back and stop ignoring them. The data that the job would need to have recorded for this is gone. The only way to get it back is if you can convince ManifoldCF to recrawl all documents in the job. On Tue, Sep

Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
Hi, I had a problem with documents out of scope. I changed the Maximum hop count for type "redirect" in one of my jobs to 5, and saw that the job was not processing some pages because of that, so I removed the value to get them injected into the output connector (Solr connector). After that, the same

Re: web crawler https

2023-09-26 Thread Bisonti Mario
Thanks a lot Karl! I uploaded the SSL certificate and ticked “always trust”, and it works. Mario From: Karl Wright Sent: Monday, 25 September 2023 20:41 To: user@manifoldcf.apache.org Subject: Re: web crawler https See this article:

Re: web crawler https

2023-09-25 Thread Karl Wright
See this article: https://stackoverflow.com/questions/6784463/error-trustanchors-parameter-must-be-non-empty ManifoldCF web crawler configuration allows you to drop certs into a local trust store for the connection. You need to either do that (adding whatever certificate authority cert you

web crawler https

2023-09-25 Thread Bisonti Mario
Hi, I would like to try indexing a Wordpress internal site. I tried to configure Repository Web, Job with seeds but I always obtain: WARN 2023-09-25T16:31:50,905 (Worker thread '4') - Service interruption reported for job 1695649924581 connection 'Wp': IO exception

Documentation issue?

2023-09-14 Thread Bisonti Mario
Hi, I would like to report that at the url: https://manifoldcf.apache.org/release/release-2.25/en_US/index.html I obtain: Not Found The requested URL was not found on this server. Thank you Mario

Registration open for Community Over Code North America

2023-08-28 Thread Rich Bowen
Hello! Registration is still open for the upcoming Community Over Code NA event in Halifax, NS! We invite you to register for the event https://communityovercode.org/registration/ Apache Committers, note that you have a special discounted rate for the conference at US$250. To take advantage of

Re: Duplicate key value violates unique constraint "repohistory_pkey"

2023-06-16 Thread Marisol Redondo
Hi, Did you find any solution for that, or do you still have the history disabled? I'm having the same problem, and we are using PostgreSQL as the db. Regards On Sun, 29 Jan 2023 at 05:48, Artem Abeleshev wrote: > Hi everyone! > > We are using ManifoldCF 2.22.1 with multiple nodes in our

TAC Applications for Community Over Code North America and Asia now open

2023-06-16 Thread Gavin McDonald
Hi All, (This email goes out to all our user and dev project mailing lists, so you may receive this email more than once.) The Travel Assistance Committee has opened up applications to help get people to the following events: *Community Over Code Asia 2023 - * *August 18th to August 20th in

Re: Solr connector authentication issue

2023-06-07 Thread Karl Wright
But if those are set, and the connection health check passes, then I can't tell you why Solr is unhappy with your connection. It's clearly working sometimes. I'd look on the Solr end to figure out whether its rejection is coming from just one of your instances. On Wed, Jun 7, 2023 at 7:49 AM

Re: Solr connector authentication issue

2023-06-07 Thread Karl Wright
The Solr output connection configuration contains all credentials that are sent to Solr. If those aren't set Solr won't get them. Karl On Wed, Jun 7, 2023 at 7:23 AM Marisol Redondo < marisol.redondo.gar...@gmail.com> wrote: > Hi, > > We are using Solr 8 with basic authentication, and when

Solr connector authentication issue

2023-06-07 Thread Marisol Redondo
Hi, We are using Solr 8 with basic authentication, and when checking the output connection I'm getting an Exception "Solr authorization failure, code 401: aborting job" The Solr type is SolrCloud, as we have 3 servers (installed in AWS Kubernetes containers). I have set the user ID and password

Re: Long Job on Windows Share

2023-06-07 Thread Bisonti Mario
In the manifoldcf.log I see many: WARN 2023-06-05T21:36:51,630 (Worker thread '31') - JCIFS: Possibly transient exception detected on attempt 2 while getting share security: All pipe instances are busy. jcifs.smb.SmbException: All pipe instances are busy. at

Re: Long Job on Windows Share

2023-05-26 Thread Bisonti Mario
Thanks a lot Karl In the “Simple History” in ManifoldCF I see, for every document, even if it’s not been modified every day: 26/05/23, 08:47:47 document ingest (SolrShare) file:/...Avanzato%202014.pptx 26/05/23, 08:47:46 extract [TikaTrasform]

Re: Long Job on Windows Share

2023-05-25 Thread Karl Wright
The jcifs connector does not include a lot of information in the version string for a file - basically, the length, and the modified date. So I would not expect there to be lot of actual work involved if there are no changes to a document. The activity "access" does imply that the system
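
The idea of a version string built from a file's length and modified date can be sketched as follows. This is an illustrative model only, not the connector's code: `version_string`, `changed_documents`, the `length:modified` layout, and the in-memory `stored_versions` dictionary are assumptions made for the example.

```python
def version_string(length, modified):
    """Build a version string from the two properties named above."""
    return f"{length}:{modified}"

def changed_documents(listing, stored_versions):
    """Return the paths whose version string differs from the stored one.

    listing         -- hypothetical map: path -> (length, modified timestamp)
    stored_versions -- versions remembered from the previous crawl; updated in place
    """
    changed = []
    for path, (length, modified) in listing.items():
        current = version_string(length, modified)
        if stored_versions.get(path) != current:
            stored_versions[path] = current
            changed.append(path)
    return changed

stored = {}
first = changed_documents({"report.pptx": (10, 100)}, stored)    # new file: needs work
second = changed_documents({"report.pptx": (10, 100)}, stored)   # unchanged: nothing to do
third = changed_documents({"report.pptx": (12, 160)}, stored)    # modified: needs work again
```

Under this model, a recrawl of an unchanged share still has to list every file to compute its version string, but only files whose string changed would be handed on for further processing.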

Long Job on Windows Share

2023-05-25 Thread Bisonti Mario
Hi, I would like to understand how recrawl works. My job scan, using "Connection Type" "Windows shares", runs for nearly 18 hours. My documents number a little over 1 million. If I check the document scan in ManifoldCF I see, for example: [image] It seems that re

Re: Apache Manifold Documentum connector

2023-03-17 Thread Rasťa Šíša
Thanks a lot for your kind and elaborate response! I will do some further investigation on my own towards the documentum. Best regards, Rasta On Fri, 17 Mar 2023 at 12:08, Karl Wright wrote: > It was open-sourced back in 2012 at the same time ManifoldCF was > open-sourced. It was written

Re: Apache Manifold Documentum connector

2023-03-17 Thread Karl Wright
It was open-sourced back in 2012 at the same time ManifoldCF was open-sourced. It was written by a contractor paid by MetaCarta, who also paid for the development of ManifoldCF itself (I developed that). It was spun off as open source when MetaCarta was bought by Nokia who had no interest in the

Re: Apache Manifold Documentum connector

2023-03-17 Thread Rasťa Šíša
Hi Karl, thanks for your answer! Would you be able to point me towards the author/git branch of the documentum connector? Best regards, Rasta On Thu, 16 Mar 2023 at 20:58, Karl Wright wrote: > Hi, > > I didn't write the documentum connector initially, so I trust that the > engineer who did

Re: Apache Manifold Documentum connector

2023-03-16 Thread Karl Wright
Hi, I didn't write the documentum connector initially, so I trust that the engineer who did knew how to construct the proper DQL. I've not seen any bugs related to it so it does seem to work. Karl On Thu, Mar 16, 2023 at 8:23 AM Rasťa Šíša wrote: > Hello, > i would like to ask how does

Apache Manifold Documentum connector

2023-03-16 Thread Rasťa Šíša
Hello, I would like to ask how the Documentum Manifold connector selects the latest version from the Documentum system. The first query that gets composed collects a list of i_chronicle_id in DCTM.java. I would like to know, though, how Manifold recognizes the latest version of the document (e.g.

Fwd: Defining an attribute that is an array in documentum connector

2023-03-14 Thread Rasťa Šíša
Hello, I tried to define a condition in the ManifoldCF UI where the Documentum attribute is an array on the other side: applicable_sites = 'desired_value' But it keeps telling me Documentum error: [DM_QUERY_E_REPEATING_USED]error: "You have specified a repeating attribute (applicable_sites) where it

Database configuration

2023-02-17 Thread Macek, Radek via user
Hello, please advise how to set up manifoldcf to use a database that runs on another machine. Thx Radek

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-09 Thread Bisonti Mario
Hi, could you give me any suggestions to solve my issue? I note that indexing 1 million documents (Office, PDF, etc.), for which I use Tika, finishes after nearly 18 hours. My host is an Ubuntu server with 8 CPUs and 68GB RAM Thanks a lot Mario From: Bisonti Mario Sent: Wednesday, 1 February 2023

Performance problems

2023-02-05 Thread Artem Abeleshev
Hi everyone! I've been struggling with a performance problem for a couple of weeks already. We have two environments: - `dev` with 2 nodes of ManifoldCF agent + 1 node of Zookeeper - `prod` with 4 nodes of ManifoldCF agent + 3 nodes of Zookeeper ManifoldCF agent settings are identical at the

Re: Job stucked with cleaning up status

2023-02-03 Thread Karl Wright
The shutdown procedure for ManifoldCF involves sending interruptions (or socket interruptions) to all worker threads. These then put the threads in the "terminated" state, one by one. So you should only get this if you shut down the agents process, or try to. The handling for this is correct,

Re: Job stucked with cleaning up status

2023-02-02 Thread Artem Abeleshev
Karl, good day! Thank you for the hint! It was very useful! Actually, you were right and the actual problem was about the connection. But I didn't expect it would be so dramatic. Here is what I found using some debugging: First I found the actual code that was responsible for the deletion

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Bisonti Mario
I don't understand. Could you explain to me what "running with a profiler" means, please? I start the agent by running a start-agents.sh script, and zookeeper too. /opt/manifoldcf/multiprocess-zk-example-proprietary/runzookeeper.sh

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Karl Wright
It looks like you are running with a profiler? That uses a lot of memory. Karl On Wed, Feb 1, 2023 at 8:06 AM Bisonti Mario wrote: > This is my hs_err_pid_.log > > > > Command Line: -Xms32768m -Xmx32768m > -Dorg.apache.manifoldcf.configfile=./properties.xml >

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Bisonti Mario
This is my hs_err_pid_.log Command Line: -Xms32768m -Xmx32768m -Dorg.apache.manifoldcf.configfile=./properties.xml -Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A org.apache.manifoldcf.agents.AgentRun . . . CodeHeap 'non-profiled nmethods': size=120032Kb

Re: Job stucked with cleaning up status

2023-01-29 Thread Karl Wright
Hi, 2.22 makes no changes to the way document deletions are processed over probably 10 previous versions of ManifoldCF. What likely is the case is that the connection to the output for the job you are cleaning up is down. When that happens, the documents are queued but the delete worker threads

Job stucked with cleaning up status

2023-01-29 Thread Artem Abeleshev
Hi, everyone! Here is another problem that I get sometimes. We are using ManifoldCF 2.22.1 with multiple nodes in our production. The creation of the MCF job pipeline is handled via the API calls from our service. We create jobs, repositories and output repositories. The crawler extracts documents and

Duplicate key value violates unique constraint "repohistory_pkey"

2023-01-28 Thread Artem Abeleshev
Hi everyone! We are using ManifoldCF 2.22.1 with multiple nodes in our production. I am investigating a problem we've had recently (it has happened at least 5-6 times already). A couple of our jobs end up with the following error: ``` Error: ERROR: duplicate key value violates unique

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-20 Thread Bisonti Mario
I see that the agent crashed: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470 # fatal error: Overflow during reference processing, can not continue. Please increase MarkStackSizeMax (current value:

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Karl Wright
When you get a hang like this, getting a thread dump of the agents process is essential to figure out what the issue is. You can't assume that a transient error would block anything because that's not how ManifoldCF works, at all. Errors push the document in question back onto the queue with a

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Bisonti Mario
Hi Karl. But I noticed that the job was hanging: the number of documents processed was stuck at the same value, with no further document processing from 6 a.m. until I restarted the agent. From: Karl Wright Sent: Wednesday, 18 January 2023 12:10 To: user@manifoldcf.apache.org Subject: Re: JCIFS: Possibly

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Karl Wright
Hi, "Possibly transient issue" means that the error will be retried anyway, according to a schedule. There should be no need to shut down the agents process and restart. Karl On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario wrote: > Hi. > > Often, I obtain the error: > > WARN
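
The statement that a transient error is retried on a schedule, rather than blocking the crawl, can be modelled with a small retry queue. This is a sketch under stated assumptions and not ManifoldCF's scheduler: the function names, the fixed `retry_delay`, the `max_attempts` cut-off, and the simulated clock are all hypothetical.

```python
import heapq

def process_with_retries(docs, fetch, retry_delay=5, max_attempts=3):
    """Process documents on a simulated clock, re-queueing transient failures.

    Each queue entry is (due_time, tie_breaker, document, attempt). A failed
    document goes back on the queue with a later due time, so other documents
    continue to be processed in the meantime.
    """
    queue = [(0, i, doc, 1) for i, doc in enumerate(docs)]
    heapq.heapify(queue)
    counter = len(docs)
    done, failed = [], []
    while queue:
        now, _, doc, attempt = heapq.heappop(queue)
        try:
            fetch(doc, now)
            done.append(doc)
        except IOError:
            if attempt >= max_attempts:
                failed.append(doc)  # give up after the assumed attempt limit
            else:
                heapq.heappush(queue, (now + retry_delay, counter, doc, attempt + 1))
                counter += 1
    return done, failed

def flaky_fetch(doc, now):
    """Hypothetical fetch: the document named "busy" fails until simulated time 10."""
    if doc == "busy" and now < 10:
        raise IOError("All pipe instances are busy.")

result = process_with_retries(["busy", "ok"], flaky_fetch)
```

Here `result` holds both documents as done, with "ok" finishing first: the failing document waits for its next scheduled attempt while the rest of the queue moves on.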

JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Bisonti Mario
Hi. Often, I obtain the error: WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy. jcifs.smb.SmbException: All pipe instances are busy. at

Re: Help for subscribing the user mailing list of MCF

2023-01-10 Thread Koji Sekiguchi
Hi Karl, I agree. BTW, Artem, my colleague, finally succeeded in subscribing. He tried to subscribe a few more times before opening a JIRA ticket in INFRA, and he finally got some responses from the ML system. Maybe they restarted the system or did something else. Thanks! Koji Tue, 10 Jan 2023 20:17

Re: Help for subscribing the user mailing list of MCF

2023-01-10 Thread Karl Wright
Hmm - I haven't heard of difficulties like this before. The mail manager is used apache-wide; if it doesn't work the best thing to do would be to create an infra ticket in JIRA. Karl On Tue, Jan 10, 2023 at 3:50 AM Koji Sekiguchi wrote: > Hi Karl, everyone! > > I'm writing to the moderator

Help for subscribing the user mailing list of MCF

2023-01-10 Thread Koji Sekiguchi
Hi Karl, everyone! I'm writing to the moderator of the MCF mailing list. I'd like you to help my colleague to subscribe to MCF user mailing list. He's tried to subscribe several times by sending the request to user-subscr...@manifoldcf.apache.org but he said that it seemed that they were just

Re: Is Manifold capable of handling these kind of files

2022-12-23 Thread Karl Wright
The internals of ManifoldCF will handle this fine if you are sure to set the encoding of your database to be UTF-8. However, I don't know about the JCIFS library, and whether there might be a restriction on characters in that code base. I think you'd have to just try it and see, frankly. Karl

Is Manifold capable of handling these kind of files

2022-12-23 Thread Priya Arora
Hi, Is Manifold capable of handling (ingesting) files like this in the Windows shares connector, whose paths contain special characters such as these: demo/11208500/11208550/I. Proposal/PHASE II/220808 Input/__MACOSX/虎尾/._62A33A6377CF08B472CC2AB562BD8B5D.JPG Any reply would be appreciated

Re: Manifoldcf -XML parsing error: Character reference "" is an invalid XML character.

2022-12-22 Thread ritika jain
Can anybody provide any clue on this? It would be of great help. On Thu, Dec 22, 2022 at 5:33 PM ritika jain wrote: > Hi all, > > I am using Manifoldcf 2.21 version with Windows shares connector and > Output as Elastic. > I am facing this error while clicking "List all jobs", Manifoldcf, jobs >

Manifoldcf -XML parsing error: Character reference "" is an invalid XML character.

2022-12-22 Thread ritika jain
Hi all, I am using ManifoldCF version 2.21 with the Windows shares connector and Elastic as output. I am facing this error when clicking "List all jobs". Jobs are run/created in such a way that our API creates a manifold job object and thus creates/starts a job in manifold

CVE-2022-45910: Apache ManifoldCF: LDAP Injection Vulnerability - ActiveDirectory Authorities

2022-12-06 Thread Markus Schuch
Description: Improper neutralization of special elements used in an LDAP query ('LDAP Injection') vulnerability in ActiveDirectory and Sharepoint ActiveDirectory authority connectors of Apache ManifoldCF allows an attacker to manipulate the LDAP search queries (DoS, additional queries, filter

Re: Unscribe

2022-10-22 Thread Muhammed Olgun
Hi Ronny, Unsubscribing is self-service. Please follow the instructions here: https://manifoldcf.apache.org/en_US/mail.html On 22 Oct 2022 Sat at 08:55 Ronny Heylen wrote: > Hi, > Please unsubscribe me from these emails, I don't work anymore. > > Regards, > > Ronny >

Unscribe

2022-10-21 Thread Ronny Heylen
Hi, Please unsubscribe me from these emails, I don't work anymore. Regards, Ronny

Re: Frequent error while window shares job

2022-08-22 Thread Karl Wright
You will need to contact the current maintainers of the Jcifs library to get answers to these questions. Karl On Mon, Aug 22, 2022 at 3:27 AM ritika jain wrote: > Hi All, > > I have a Windows shared job to crawl files from samba server, it's a huge > job to crawl documents in millions(about

Frequent error while window shares job

2022-08-22 Thread ritika jain
Hi All, I have a Windows shares job to crawl files from a Samba server; it's a huge job, crawling millions of documents (about 10). While running the job, we encounter two types of errors very frequently. 1) WARN 2022-08-19T17:17:05,175 (Worker thread '7') - JCIFS: Possibly transient exception

[FINAL CALL] - Travel Assistance to ApacheCon New Orleans 2022

2022-06-27 Thread Gavin McDonald
To all committers and non-committers. This is a final call to apply for travel/hotel assistance to get to and stay in New Orleans for ApacheCon 2022. Applications have been extended by one week and so the application deadline is now the 8th July 2022. The rest of this email is a copy of what

Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-14 Thread Karl Wright
Remember, there is already a "forget" button on the output connection, which will remove everything associated with the connection. It's meant to be used when the output index has been reset and is empty. I'm not sure what you'd do different functionally. Karl On Tue, Jun 14, 2022 at 2:04 AM

Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-14 Thread Koji Sekiguchi
+1. I respect the design concept of ManifoldCF, but I think force delete options would make MCF more useful for those who use MCF as a crawler. Adding force delete options doesn't change default behaviors and it doesn't break back-compatibility. Koji On 2022/06/14 14:46, Ricardo Ruiz wrote: Hi

Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-13 Thread Ricardo Ruiz
Hi Karl We are using ManifoldCF as a crawler more than a synchronizer. We are thinking of contributing to ManifoldCF by including a force job delete and force output connector delete, considering of course the things that need to be deleted with them (DB, etc.). Do you think this is possible? We

Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-13 Thread Karl Wright
Because ManifoldCF is not just a crawler, but a synchronizer, a job represents and includes a list of documents that have been indexed. Deleting the job requires deleting the documents that have been indexed also. It's part of the basic model. So if you tear down your target output instance and

Can't delete a job when solr output connection can't connect to the instance.

2022-06-12 Thread Ricardo Ruiz
Hi all My team uses mcf to crawl documents and index into solr instances, but for reasons beyond our control, sometimes the instances or collections are deleted. When we try to delete a job and the solr instance or collection doesn't exist anymore, the job reaches the "End notification" status and

Final reminder: ApacheCon North America call for presentations closing soon

2022-05-19 Thread Rich Bowen
[Note: You're receiving this because you are subscribed to one or more Apache Software Foundation project mailing lists.] This is your final reminder that the Call for Presentations for ApacheCon North America 2022 will close at 00:01 GMT on Monday, May 23rd, 2022. Please don't wait! Get your talk

REMINDER - Travel Assistance available for ApacheCon NA New Orleans 2022

2022-05-03 Thread Gavin McDonald
Hi All Contributors and Committers, This is a first reminder email that travel assistance applications for ApacheCon NA 2022 are now open! We will be supporting ApacheCon North America in New Orleans, Louisiana, on October 3rd through 6th, 2022. TAC exists to help those that would like to

Re: Job Service Interruption- and stops

2022-04-29 Thread Karl Wright
"Repeated service interruption" means that it happens again and again. For this particular document, the problem is that the error we are seeing is: "The process cannot access the file because it is being used by another process." ManifoldCF assumes that if it retries enough it should be able

Job Service Interruption- and stops

2022-04-29 Thread ritika jain
Hi All, With the Windows shares connector, on the server I am getting this exception, and due to repeated service interruptions the *job stops.* Error: Repeated service interruptions - failure processing document: The process cannot access the file because it is being used by another process. How we

Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Rich Bowen
[You are receiving this because you are subscribed to one or more user or dev mailing list of an Apache Software Foundation project.] ApacheCon draws participants at all levels to explore “Tomorrow’s Technology Today” across 300+ Apache projects and their diverse communities. ApacheCon showcases

Re: Log4j Update Doubt

2022-03-15 Thread Karl Wright
We cannot do back patches of older versions of ManifoldCF. There is a new release which shipped in January that addresses log4j issues. I suggest updating to that. Karl On Tue, Mar 15, 2022 at 8:59 AM ritika jain wrote: > Hi, > > How manifoldcf uses log4j files in bin

Log4j Update Doubt

2022-03-15 Thread ritika jain
Hi, How does ManifoldCF use the log4j files in the bin directory/distribution? Is the location "D:\\Manifoldcf\apache-manifoldcf-2.14\lib", i.e. the lib folder, the only one (for physical file presence)? Also, if the log4j dependency issue has been resolved and the version is updated to 2.15 or higher, then

Re: Manifoldcf freezes and sit idle

2022-01-31 Thread Karl Wright
As I've mentioned before, the best way to diagnose problems like this is to get a thread dump of the agents process. There are many potential reasons it could occur, ranging from stuck locks to resource starvation. What locking model are you using? Karl On Mon, Jan 31, 2022 at 6:02 AM ritika

Manifoldcf freezes and sit idle

2022-01-31 Thread ritika jain
Hi, I am using ManifoldCF 2.14, the web connector and Elastic as output. I have observed that after a certain period of continuous running the job freezes and does not process anything. Simple history shows nothing after a certain point, and it's not just one job; it has been observed for 3 different jobs

Re: Log4j dependency

2021-12-14 Thread Karl Wright
ManifoldCF framework and connectors use log4j 2.x to dump information to the ManifoldCF log file. Please read the following page: https://logging.apache.org/log4j/2.x/security.html Specifically, this part: 'Description: Apache Log4j2 <=2.14.1 JNDI features used in configuration, log messages,

Re: Log4j dependency

2021-12-14 Thread Furkan KAMACI
Hi Ritika, For maven check here: https://github.com/apache/manifoldcf/blob/trunk/pom.xml#L80 For Ant check here: https://github.com/apache/manifoldcf/blob/trunk/build.xml#L87 Kind Regards, Furkan KAMACI On Tue, Dec 14, 2021 at 12:41 PM ritika jain wrote: > .Hi All, > > How does manifold.cf

Log4j dependency

2021-12-14 Thread ritika jain
Hi All, How does ManifoldCF use log4j? When I checked the pom.xml of the ES connector, it is shown as an *exclusion* of a Maven dependency. [image: image.png] But when I checked the project's downloaded dependencies, it shows it being used and downloaded. [image: image.png] How does manifold use log 4j

Two profiles of manifoldcf

2021-12-03 Thread ritika jain
Hi All, Can we create two different username/password pairs for the crawler UI of ManifoldCF? I tried configuring two user profiles in properties.xml, but it's not working. Is there a way to do that? Thanks Ritika

Re: Manifoldcf background process

2021-11-18 Thread Karl Wright
The degree of parallelism can be controlled in two ways. The first way is to set the number of worker threads to something reasonable. Usually, this is no more than about 2x the number of processors you have. The second way is to control the number of connections in your jcifs connector to keep
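
The rule of thumb for worker threads (no more than about 2x the number of processors) is easy to turn into a small calculation. This is a minimal sketch of the arithmetic only: the function name and the `factor` parameter are hypothetical, and it says nothing about how the value is actually applied in a ManifoldCF configuration.

```python
import os

def max_worker_threads(processors=None, factor=2):
    """Upper bound for worker threads following the 2x-processors rule of thumb.

    processors -- processor count; detected with os.cpu_count() when omitted
    factor     -- multiplier from the rule of thumb (2, per the advice above)
    """
    if processors is None:
        processors = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, processors * factor)

limit_for_8_cpus = max_worker_threads(8)   # 16
limit_for_this_host = max_worker_threads()
```

For an 8-processor host, for example, the rule of thumb gives a ceiling of 16 worker threads.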

Manifoldcf background process

2021-11-17 Thread ritika jain
Hi All, I would like to understand the background process of ManifoldCF Windows shares jobs, and how it processes the paths mentioned in the job's configuration. I am creating a dynamic job via the API using PHP, which will pick up approx. 70k documents, and a dynamic job with 70k different paths

Re: Manifold Job process isssue

2021-11-15 Thread Karl Wright
SMB exceptions with jcifs in the trace tell us that JCIFS couldn't talk to your windows share server. That's all we can tell though. Karl On Mon, Nov 15, 2021 at 7:24 AM ritika jain wrote: > Hi, > > Raising the concern above again, to process only 60k of document (when > clock issue is fixed

Re: Manifold Job process isssue

2021-11-15 Thread ritika jain
Hi, Raising the concern above again: even for only 60k documents (and with the clock issue fixed), the job is not progressing; it stays stuck for days. So we had to restart the docker container every time for it to process. This time we are getting this: Timeout Exception. What

Re: Manifold Job process isssue

2021-11-09 Thread Karl Wright
One hour is quite a lot and will wreak havoc on the document queue. Karl On Tue, Nov 9, 2021 at 7:08 AM ritika jain wrote: > I have checked, there is only one hour time difference between docker > container and docker host > > On Tue, Nov 9, 2021 at 4:41 PM Karl Wright wrote: > >> If your

Re: Manifold Job process isssue

2021-11-09 Thread ritika jain
I have checked, there is only one hour time difference between docker container and docker host On Tue, Nov 9, 2021 at 4:41 PM Karl Wright wrote: > If your docker image's clock is out of sync badly with the real world, > then System.currentTimeMillis() may give bogus values, and ManifoldCF uses

Re: Manifold Job process isssue

2021-11-09 Thread Karl Wright
If your docker image's clock is out of sync badly with the real world, then System.currentTimeMillis() may give bogus values, and ManifoldCF uses that to manage throttling etc. I don't know if that is the correct explanation but it's the only thing I can think of. Karl On Tue, Nov 9, 2021 at
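
A quick way to reason about the clock concern is to compare a timestamp taken inside the container with one taken on the host at the same moment. The sketch below only shows that comparison; the function names and the 5-second `tolerance_seconds` are assumptions chosen for illustration, not thresholds used by ManifoldCF.

```python
def clock_skew_seconds(container_epoch, host_epoch):
    """Signed difference between two epoch timestamps taken at the same moment."""
    return container_epoch - host_epoch

def is_skew_acceptable(container_epoch, host_epoch, tolerance_seconds=5.0):
    """True when the two clocks agree to within the assumed tolerance."""
    return abs(clock_skew_seconds(container_epoch, host_epoch)) <= tolerance_seconds

# A container running one hour ahead of its host fails the check.
one_hour_ahead = is_skew_acceptable(1_700_003_600.0, 1_700_000_000.0)   # False
nearly_in_sync = is_skew_acceptable(1_700_000_001.5, 1_700_000_000.0)   # True
```

In practice the two timestamps would have to be collected separately (for instance from `time.time()` in each environment) and compared afterwards.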

Manifold Job process isssue

2021-11-09 Thread ritika jain
Hi All, I am using the Windows shares connector, ManifoldCF 2.14 and ES as output. I have configured a job to process 60k documents. These documents are new and do not have corresponding values in the DB or the ES index, so ideally the job should process/index the documents as soon as it starts.

Re: Duplicate key error

2021-10-27 Thread Karl Wright
We see errors like this only because MCF is a highly multithreaded application, and two threads sometimes are able to collide in what they are doing even though they are transactionally separated. That is because of bugs in the database software. So if you restart the job it should not encounter

Re: Duplicate key error

2021-10-27 Thread Karl Wright
Is it repeatable? My guess is it is not repeatable. Karl On Wed, Oct 27, 2021 at 4:43 AM ritika jain wrote: > So , it can be left as it is.. ? because it is preventing job to complete > and its stopping. > > On Tue, Oct 26, 2021 at 8:40 PM Karl Wright wrote: > >> That's a database bug. All

Re: Duplicate key error

2021-10-27 Thread ritika jain
So, can it be left as it is? It is preventing the job from completing, and the job is stopping. On Tue, Oct 26, 2021 at 8:40 PM Karl Wright wrote: > That's a database bug. All of our underlying databases have some bugs of > this kind. > > Karl > > > On Tue, Oct 26, 2021 at 9:17 AM ritika jain >

Re:

2021-10-26 Thread Karl Wright
That's a database bug. All of our underlying databases have some bugs of this kind. Karl On Tue, Oct 26, 2021 at 9:17 AM ritika jain wrote: > Hi All, > > While using Manifoldcf 2.14 with Web connector and ES connector. After a > certain time of continuing the job (jobs ingest some documents

[no subject]

2021-10-26 Thread ritika jain
Hi All, We are using ManifoldCF 2.14 with the Web connector and the ES connector. After the job had been running for some time (jobs ingest lakhs of documents), we got this error on PROD. Can anybody suggest what the problem could be? PRODUCTION MANIFOLD ERROR: Error: ERROR: duplicate key value

Re: Windows Shares job-Limit on defining no of paths

2021-10-25 Thread Karl Wright
The only limit is that the more you add, the slower it gets. Karl On Mon, Oct 25, 2021 at 6:06 AM ritika jain wrote: > Hi , > Is there any limit on the number of paths we can define in job using > Repository as Window Shares and ES as Output > > Thanks >

Re: Null Pointer Exception

2021-10-25 Thread Karl Wright
The API should really catch this situation. Basically, you are calling a function that requires an input but you are not providing one. In that case the API sets the input to "null", and the detailed operation is called. The detailed operation is not expecting a null input. This is API piece
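The missing check described above can be illustrated with a small guard that rejects an absent input before the detailed operation runs. This is a sketch under stated assumptions, not ManifoldCF's actual API code: the class NullInputGuard and the method createJob are hypothetical names used only for illustration.

```java
// Hypothetical illustration, not ManifoldCF code: validate the request body
// up front so a missing input produces a clear error instead of a
// NullPointerException deeper in the call chain.
public class NullInputGuard {

    /** Stand-in for the "detailed operation", which assumes a non-null input. */
    public static String createJob(String requestBody) {
        if (requestBody == null || requestBody.trim().isEmpty()) {
            throw new IllegalArgumentException("A request body is required to create a job");
        }
        return "accepted " + requestBody.length() + " characters";
    }

    public static void main(String[] args) {
        System.out.println(createJob("{\"job\":{}}"));
        try {
            createJob(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

On the client side, the equivalent fix is to make sure the request that creates the job actually carries the input the function requires.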

Windows Shares job-Limit on defining no of paths

2021-10-25 Thread ritika jain
Hi, Is there any limit on the number of paths we can define in a job using Windows Shares as the repository and ES as the output? Thanks

Null Pointer Exception

2021-10-25 Thread ritika jain
Hi, I am getting null pointer exceptions while creating a job programmatically via PHP. Can anybody suggest the reason for this? Error 500 Server Error HTTP ERROR 500 Problem accessing /mcf-api-service/json/jobs. Reason: Server Error Caused by: java.lang.NullPointerException at

Re: Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-30 Thread Karl Wright
Hi, You say this is a "Tika error". Is this Tika as a stand-alone service? I do not recognize any ManifoldCF classes whatsoever in this thread dump. If this is Tika, I suggest contacting the Tika team. Karl On Thu, Sep 30, 2021 at 3:02 AM Bisonti Mario wrote: > Additional info. > > > > I

Re: Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-30 Thread Bisonti Mario
Additional info. I am using the 2.17-dev version. From: Bisonti Mario Sent: Tuesday, 28 September 2021 17:01 To: user@manifoldcf.apache.org Subject: Error: Repeated service interruptions - failure processing document: Read timed out Hello I have error on a Job that parses a network folder. This

Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-28 Thread Bisonti Mario
Hello, I have an error on a job that parses a network folder. This is the Tika error: 2021-09-28 16:14:50 INFO Server:415 - Started @1367ms 2021-09-28 16:14:50 WARN ContextHandler:1671 - Empty contextPath 2021-09-28 16:14:50 INFO ContextHandler:916 - Started

ApacheCon starts tomorrow!

2021-09-20 Thread Rich Bowen
ApacheCon @Home starts tomorrow! Details at https://www.apachecon.com/acah2021/index.html (Note: You're receiving this because you are subscribed to one or more user lists for Apache Software Foundation projects.) We've got three days of great content lined up for you, spanning 14 project
