Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
e > main reason for the option "Reset seeding" was for that, for reevaluating > all pages, as a new fresh execution. > > > On Tue, 26 Sept 2023 at 13:30, Karl Wright wrote: > >> Okay, that is good to know. >> The hopcount assessment occurs when documents a

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
can think on is creating a new job with the exact same > characteristics and run it. > > Regards and thanks >Marisol > > > > On Tue, 26 Sept 2023 at 12:35, Karl Wright wrote: > >> If you ever set "Ignore unreachable documents forever" for the job, you >>

Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
If you ever set "Ignore unreachable documents forever" for the job, you can't go back and stop ignoring them. The data that the job would need to have recorded for this is gone. The only way to get it back is if you can convince ManifoldCF to recrawl all documents in the job. On Tue, Sep

Re: web crawler https

2023-09-25 Thread Karl Wright
See this article: https://stackoverflow.com/questions/6784463/error-trustanchors-parameter-must-be-non-empty ManifoldCF web crawler configuration allows you to drop certs into a local trust store for the connection. You need to either do that (adding whatever certificate authority cert you

Re: Solr connector authentication issue

2023-06-07 Thread Karl Wright
Karl Wright wrote: > The Solr output connection configuration contains all credentials that are > sent to Solr. If those aren't set Solr won't get them. > > Karl > > > On Wed, Jun 7, 2023 at 7:23 AM Marisol Redondo < > marisol.redondo.gar...@gmail.com> wrote: >

Re: Solr connector authentication issue

2023-06-07 Thread Karl Wright
The Solr output connection configuration contains all credentials that are sent to Solr. If those aren't set Solr won't get them. Karl On Wed, Jun 7, 2023 at 7:23 AM Marisol Redondo < marisol.redondo.gar...@gmail.com> wrote: > Hi, > > We are using Solr 8 with basic authentication, and when

Re: Long Job on Windows Share

2023-05-25 Thread Karl Wright
The jcifs connector does not include a lot of information in the version string for a file - basically, the length, and the modified date. So I would not expect there to be a lot of actual work involved if there are no changes to a document. The activity "access" does imply that the system

Re: Apache Manifold Documentum connector

2023-03-17 Thread Karl Wright
at 5:41 AM Rasťa Šíša wrote: > Hi Karl, thanks for your answer! Would you be able to point me towards the > author/git branch of the documentum connector? > Best regards, Rasta > > On Thu, 16 Mar 2023 at 20:58, Karl Wright wrote: > >> Hi, >> >> I d

Re: Apache Manifold Documentum connector

2023-03-16 Thread Karl Wright
Hi, I didn't write the documentum connector initially, so I trust that the engineer who did knew how to construct the proper DQL. I've not seen any bugs related to it so it does seem to work. Karl On Thu, Mar 16, 2023 at 8:23 AM Rasťa Šíša wrote: > Hello, > i would like to ask how does

Re: Job stucked with cleaning up status

2023-02-03 Thread Karl Wright
break; > ... > } > ... > } > } > catch (Throwable e) > { > ... > } > } > ``` > > It just breaks the loop, making the thread terminate normally! In quite a > short time I always end up with no `Do

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Karl Wright
> 0x7f051c50a000,0x7f051c60a000] [id=2537470] > > > > Stack: [0x7f051c50a000,0x00007f051c60a000], sp=0x7f051c608080, > free space=1016k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > >

Re: Job stucked with cleaning up status

2023-01-29 Thread Karl Wright
Hi, 2.22 makes no changes to the way document deletions are processed over probably 10 previous versions of ManifoldCF. What likely is the case is that the connection to the output for the job you are cleaning up is down. When that happens, the documents are queued but the delete worker threads

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Karl Wright
with a retry time. Karl On Wed, Jan 18, 2023 at 6:15 AM Bisonti Mario wrote: > Hi Karl. > > But I noticed that the job was hanging: the number of documents processed was stuck > on the same value, with no further document processing from 6 a.m. until I > restarted the Agent > > > >

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Karl Wright
Hi, "Possibly transient issue" means that the error will be retried anyway, according to a schedule. There should be no need to shut down the agents process and restart. Karl On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario wrote: > Hi. > > Often, I obtain the error: > > WARN

Re: Help for subscribing the user mailing list of MCF

2023-01-10 Thread Karl Wright
Hmm - I haven't heard of difficulties like this before. The mail manager is used apache-wide; if it doesn't work the best thing to do would be to create an infra ticket in JIRA. Karl On Tue, Jan 10, 2023 at 3:50 AM Koji Sekiguchi wrote: > Hi Karl, everyone! > > I'm writing to the moderator

Re: Is Manifold capable of handling these kind of files

2022-12-23 Thread Karl Wright
The internals of ManifoldCF will handle this fine if you are sure to set the encoding of your database to be UTF-8. However, I don't know about the JCIFS library, and whether there might be a restriction on characters in that code base. I think you'd have to just try it and see, frankly. Karl

Re: Frequent error while window shares job

2022-08-22 Thread Karl Wright
You will need to contact the current maintainers of the Jcifs library to get answers to these questions. Karl On Mon, Aug 22, 2022 at 3:27 AM ritika jain wrote: > Hi All, > > I have a Windows shared job to crawl files from samba server, it's a huge > job to crawl documents in millions(about

Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-14 Thread Karl Wright
elete, considering of course > > the things that need to be deleted with them (BD, etc). Do you think > this is possible? > > We think that not only us but the community would be benefited from this > kind of functionality. > > > > Ricardo. > > > > On Mon

Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-13 Thread Karl Wright
Because ManifoldCF is not just a crawler, but a synchronizer, a job represents and includes a list of documents that have been indexed. Deleting the job requires deleting the documents that have been indexed also. It's part of the basic model. So if you tear down your target output instance and

Re: Job Service Interruption- and stops

2022-04-29 Thread Karl Wright
" repeated service interruption" means that it happens again and again. For this particular document, the problem is that the error we are seeing is: "The process cannot access the file because it is being used by another process." ManifoldCF assumes that if it retries enough it should be able

Re: Log4j Update Doubt

2022-03-15 Thread Karl Wright
We cannot do back patches of older versions of ManifoldCF. There is a new release which shipped in January that addresses log4j issues. I suggest updating to that. Karl On Tue, Mar 15, 2022 at 8:59 AM ritika jain wrote: > Hi, > > How manifoldcf uses log4j files in bin

Re: Manifoldcf freezes and sit idle

2022-01-31 Thread Karl Wright
As I've mentioned before, the best way to diagnose problems like this is to get a thread dump of the agents process. There are many potential reasons it could occur, ranging from stuck locks to resource starvation. What locking model are you using? Karl On Mon, Jan 31, 2022 at 6:02 AM ritika

Re: Log4j dependency

2021-12-14 Thread Karl Wright
ManifoldCF framework and connectors use log4j 2.x to dump information to the ManifoldCF log file. Please read the following page: https://logging.apache.org/log4j/2.x/security.html Specifically, this part: 'Descripton: Apache Log4j2 <=2.14.1 JNDI features used in configuration, log messages,

Re: Manifoldcf background process

2021-11-18 Thread Karl Wright
The degree of parallelism can be controlled in two ways. The first way is to set the number of worker threads to something reasonable. Usually, this is no more than about 2x the number of processors you have. The second way is to control the number of connections in your jcifs connector to keep

Re: Manifold Job process isssue

2021-11-15 Thread Karl Wright
T,status=0,flags=0x0000,mid=4,wordCount=0,byteCount=72 > at > jcifs.util.transport.Transport.waitForResponses(Transport.java:365) > at jcifs.util.transport.Transport.sendrecv(Transport.java:232) > at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl

Re: Manifold Job process isssue

2021-11-09 Thread Karl Wright
One hour is quite a lot and will wreak havoc on the document queue. Karl On Tue, Nov 9, 2021 at 7:08 AM ritika jain wrote: > I have checked, there is only one hour time difference between docker > container and docker host > > On Tue, Nov 9, 2021 at 4:41 PM Karl Wright wrote:

Re: Manifold Job process isssue

2021-11-09 Thread Karl Wright
If your docker image's clock is badly out of sync with the real world, then System.currentTimeMillis() may give bogus values, and ManifoldCF uses that to manage throttling etc. I don't know if that is the correct explanation but it's the only thing I can think of. Karl On Tue, Nov 9, 2021 at

Re: Duplicate key error

2021-10-27 Thread Karl Wright
the same problem. If the problem IS repeatable, we will of course look deeper into what is going on. Karl On Wed, Oct 27, 2021 at 9:52 AM Karl Wright wrote: > Is it repeatable? My guess is it is not repeatable. > Karl > > On Wed, Oct 27, 2021 at 4:43 AM ritika jain > wrote:

Re: Duplicate key error

2021-10-27 Thread Karl Wright
Is it repeatable? My guess is it is not repeatable. Karl On Wed, Oct 27, 2021 at 4:43 AM ritika jain wrote: > So , it can be left as it is.. ? because it is preventing job to complete > and its stopping. > > On Tue, Oct 26, 2021 at 8:40 PM Karl Wright wrote: > >> That's

Re:

2021-10-26 Thread Karl Wright
That's a database bug. All of our underlying databases have some bugs of this kind. Karl On Tue, Oct 26, 2021 at 9:17 AM ritika jain wrote: > Hi All, > > While using Manifoldcf 2.14 with Web connector and ES connector. After a > certain time of continuing the job (jobs ingest some documents

Re: Windows Shares job-Limit on defining no of paths

2021-10-25 Thread Karl Wright
The only limit is that the more you add, the slower it gets. Karl On Mon, Oct 25, 2021 at 6:06 AM ritika jain wrote: > Hi , > Is there any limit on the number of paths we can define in job using > Repository as Window Shares and ES as Output > > Thanks >

Re: Null Pointer Exception

2021-10-25 Thread Karl Wright
The API should really catch this situation. Basically, you are calling a function that requires an input but you are not providing one. In that case the API sets the input to "null", and the detailed operation is called. The detailed operation is not expecting a null input. This is API piece

Re: Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-30 Thread Karl Wright
Hi, You say this is a "Tika error". Is this Tika as a stand-alone service? I do not recognize any ManifoldCF classes whatsoever in this thread dump. If this is Tika, I suggest contacting the Tika team. Karl On Thu, Sep 30, 2021 at 3:02 AM Bisonti Mario wrote: > Additional info. > > > > I

Re: Tika Parser Issue

2021-09-07 Thread Karl Wright
This is something you should contact the Tika project about. Karl On Tue, Sep 7, 2021 at 8:46 AM ritika jain wrote: > Hi All, > > I am using tika-core 1.21 and tika-parsers 1.21 jar files as tika > dependencies in Manifoldcf 2.14 version. > Getting some issues while parsing *PDF *files. Some

Re: Query:JCIFS connector

2021-08-23 Thread Karl Wright
I have a work day today, with limited time. The UI is what it is; it does not have capabilities beyond what is stated in the UI and in the manual. It's meant to allow construction of paths piece by piece, not by full subdirectory at a time. You can obviously use the API if you want to construct

Re: Job Deletion query

2021-08-12 Thread Karl Wright
Yes, when you delete a job, the indexed documents associated with that job are removed from the index. ManifoldCF is a synchronizer, not a crawler, so when you remove the synchronization job then if it didn't delete the indexed documents they would be left dangling. Karl On Thu, Aug 12, 2021

Re: Window shares dynamic Job issue

2021-08-11 Thread Karl Wright
"_attribute_indexable":"yes","_attribute_filespec":"\/*.docb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dot","_value_"

Re: Window shares dynamic Job issue

2021-08-10 Thread Karl Wright
I am sorry, but I'm having trouble understanding how exactly you are configuring the JCIFS connector in these two cases. Can you view the job in each case and provide cut-and-paste of the view? Karl On Tue, Aug 10, 2021 at 9:09 AM ritika jain wrote: > Hi All, > > I am using Window shares

Re: Is the Web connector supporting zipped sitemap.xml.gz referenced by robots.txt?

2021-07-07 Thread Karl Wright
+; > return new > > UrlsetContextClass(theStream,namespace,localName,qName,atts,documentURI,handler); >} > > So, my question is: is there another way to handle sitemaps inside the > Web Crawler? > > Cheers Sebastian > > > > > > Am 07

Re: Is the Web connector supporting zipped sitemap.xml.gz referenced by robots.txt?

2021-07-07 Thread Karl Wright
The robots parsing does not recognize the "sitemaps" line, which was likely not in the spec for robots when this connector was written. Karl On Wed, Jul 7, 2021 at 3:31 AM h0444xk8 wrote: > Hi, > > I have a general question. Is the Web connector supporting sitemap files > referenced by the

Re: Manifoldcf Redirection process

2021-05-28 Thread Karl Wright
302 does get recognized as a redirection, yes. On Fri, May 28, 2021 at 5:07 AM ritika jain wrote: > Is the process the same when fetch/process status code returned is 302 ? When running a job with web crawler and ES output connector >>> > can anybody have a clue about this >

Re: Manifoldcf Redirection process

2021-05-19 Thread Karl Wright
ManifoldCF reads all the URLs on its queue. If it's a 301, it detects this and pushes the new URL onto the document queue. When it gets to the new URL, it processes it like any other. Karl On Wed, May 19, 2021 at 8:32 AM ritika jain wrote: > Hi > > I want to understand the process of "How
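The redirect handling described in this reply can be pictured with a small queue simulation. This is only a minimal sketch under stated assumptions, not ManifoldCF's actual code: the class `RedirectQueueSketch`, the `Response` type, and the in-memory `site` map are all hypothetical stand-ins for real fetches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from ManifoldCF.
public class RedirectQueueSketch {
    /** Hypothetical fetch result: HTTP status plus the Location target for redirects. */
    static final class Response {
        final int status;
        final String location;

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Processes every queued URL; a 301/302 pushes its target onto the same queue. */
    static List<String> crawl(String seed, Map<String, Response> site) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> indexed = new ArrayList<>();
        queue.add(seed);
        seen.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            Response response = site.get(url);
            if (response == null) {
                continue; // nothing known about this URL in the simulated site
            }
            boolean redirect = response.status == 301 || response.status == 302;
            if (redirect && response.location != null) {
                // The new URL is queued and later handled like any other document.
                if (seen.add(response.location)) {
                    queue.add(response.location);
                }
            } else if (response.status == 200) {
                indexed.add(url);
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, Response> site = new HashMap<>();
        site.put("http://example.com/old", new Response(301, "http://example.com/new"));
        site.put("http://example.com/new", new Response(200, null));
        System.out.println(crawl("http://example.com/old", site));
    }
}
```

The `seen` set is an addition for the sketch so that a redirect loop cannot keep the queue busy forever; how the real document queue deduplicates entries is not stated in this thread.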

Re: Interrupted while acquiring credits

2021-05-14 Thread Karl Wright
e job or crashing manifold > > On Fri, May 14, 2021 at 1:34 PM Karl Wright wrote: > >> ' >> >> *JCIFS: Possibly transient exception detected on attempt 1 while getting >> share security'Yes, it is going to retry.* >> >> *Karl* >> >>

Re: Interrupted while acquiring credits

2021-05-14 Thread Karl Wright
'JCIFS: Possibly transient exception detected on attempt 1 while getting share security' Yes, it is going to retry. Karl On Fri, May 14, 2021 at 1:45 AM ritika jain wrote: > Hi, > I am using Windows shares connector in manifoldcf 2.14 and ElasticSearch > connector as Output connector and

Re: Notification connector error

2021-05-11 Thread Karl Wright
This used to work fine, but I suspect that when SSL was declared unsafe, it was disabled, and now only TLS will work. Karl On Tue, May 11, 2021 at 12:13 PM wrote: > Hello, > > > > I am trying to use an email notification connector but without success. > When the connector tries to send an

Re: General questions

2021-04-12 Thread Karl Wright
Hi, There was a book written but never published on ManifoldCF and how to write connectors. It's meant to be extended in that way. The PDFs for the book are available for free online, and they are linked through the manifoldcf web site. Karl On Mon, Apr 12, 2021 at 8:49 AM koch wrote: > Hi

Re: Manifoldcf Deletion Process

2021-03-30 Thread Karl Wright
Hi Ritika, There is no deletion process. Deletion takes place when a job is run in a mode where deletion is possible (there are some where it is not). The way it takes place depends on the kind of repository connector (what model it declares itself to use). For the most common kinds of

Re: Another Elasticsearch patch to allow the long URI

2021-03-20 Thread Karl Wright
to be sure that the connector works with most versions of ElasticSearch? Please help clarify so that I can finish this off. The changes are committed to trunk; I would be very appreciative if Shirai Takashi/ 白井隆 reviewed them there. Thanks! Karl On Sat, Mar 20, 2021 at 4:32 AM Karl Wright wrot

Re: Another Elasticsearch patch to allow the long URI

2021-03-20 Thread Karl Wright
. There are more changes in these patches than just the ID length issue. I am working to add this functionality as well but without anything I would consider to be unneeded. Karl On Fri, Mar 19, 2021 at 3:48 AM Karl Wright wrote: > Thanks for the information. I'll see what I can do. >

Re: Another Elasticsearch patch to allow the long URI

2021-03-19 Thread Karl Wright
Thanks for the information. I'll see what I can do. Karl On Thu, Mar 18, 2021 at 7:23 PM Shirai Takashi/ 白井隆 wrote: > Hi, Karl. > > Karl Wright wrote: > >Hi - I'm still waiting for this patch to be attached to a ticket. That is > >the only way I believe we're allowed

Re: Another Elasticsearch patch to allow the long URI

2021-03-18 Thread Karl Wright
Hi - I'm still waiting for this patch to be attached to a ticket. That is the only way I believe we're allowed to accept it legally. Karl On Thu, Mar 4, 2021 at 7:16 PM Shirai Takashi/ 白井隆 wrote: > Hi, Karl. > > Karl Wright wrote: > >I agree it is unlikely that the JDK wi

Re: Another Elasticsearch patch to allow the long URI

2021-03-04 Thread Karl Wright
I agree it is unlikely that the JDK will lose support for SHA-1 because it is used commonly, as is MD5. So please feel free to use it. Karl On Wed, Mar 3, 2021 at 7:54 PM Shirai Takashi/ 白井隆 wrote: > Hi, Horn. > > Jörn Franke wrote: > >Makes sense > > I don't think that it's easy. > > > >>>

Re: Another Elasticsearch patch to allow the long URI

2021-03-02 Thread Karl Wright
Hi - this is very helpful. I would like you to officially create a ticket in Jira: https://issues.apache.org/jira , project "CONNECTORS", and attach these patches. Backwards compatibility means that we very likely have to use the hash approach, and not use the decoding approach. Thanks, Karl
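The "hash approach" mentioned here can be sketched roughly as follows. This is a hedged illustration, not the patch from the ticket: the class `LongIdHasher`, its methods, and the decision to leave short IDs untouched are assumptions made for the example, and the 512-byte constant reflects Elasticsearch's documented `_id` length limit rather than anything stated in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not the code attached to the CONNECTORS ticket.
public class LongIdHasher {
    // Assumed threshold for the sketch (Elasticsearch limits _id to 512 bytes).
    static final int MAX_ID_BYTES = 512;

    /** Returns the lowercase hex SHA-1 digest of the UTF-8 bytes of the input. */
    static String sha1Hex(String input) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every standard Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Keeps short URIs as-is (backwards compatible) and hashes only over-long ones. */
    static String toDocumentId(String uri) {
        int length = uri.getBytes(StandardCharsets.UTF_8).length;
        return length <= MAX_ID_BYTES ? uri : sha1Hex(uri);
    }

    public static void main(String[] args) {
        StringBuilder longUri = new StringBuilder("http://example.com/");
        for (int i = 0; i < 600; i++) {
            longUri.append('a');
        }
        System.out.println(toDocumentId("http://example.com/short"));
        System.out.println(toDocumentId(longUri.toString()));
    }
}
```

Leaving IDs under the limit unchanged is one way to keep already-indexed documents addressable, which is the backwards-compatibility concern raised in the reply; a hash cannot be decoded back to the URI, so the original URI would have to be stored in a separate field if it is needed.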

Re: Multiprocess file installation of manifold

2021-02-17 Thread Karl Wright
File synchronization is still supported but is deprecated. We recommend zookeeper synchronization unless you have a very good reason not to. Karl On Wed, Feb 17, 2021 at 12:26 PM Ananth Peddinti wrote: > Hello Team , > > > I would like to know if someone has already done multi-process model

Re: Job Content Length issue

2021-02-17 Thread Karl Wright
ue, Feb 16, 2021 at 7:29 PM Karl Wright wrote: > >> Hi, do you mean content limiter length of 100? >> >> I assume you are using the internal Tika transformer? Are you combining >> this with a Solr output connection that is not using the extract handler? >> >

Re: Job Content Length issue

2021-02-16 Thread Karl Wright
Hi, do you mean content limiter length of 100? I assume you are using the internal Tika transformer? Are you combining this with a Solr output connection that is not using the extract handler? By "manifold crashes" I assume you actually mean it runs out of memory. The "long running query"

Re: content length tab

2021-02-15 Thread Karl Wright
This parameter is in bytes. Karl On Mon, Feb 15, 2021 at 9:03 AM ritika jain wrote: > Hi Users, > > Can anybody tell me if this can be filled as bytes or kilobytes here. > > The "Content Length tab looks like this: > > > [image: Windows Share Job, Content Length tab] > > Values are to be

Re: Job status stuck in terminating

2021-01-07 Thread Karl Wright
Hi, Usually the reason a job doesn't complete is because a document is retrying indefinitely. You can see what's going on by looking at the Simple History job report, or, if you prefer, tailing the manifoldcf log. Other times a job won't complete because somebody shut down the agents process.

Re: Indexation Not OK

2021-01-01 Thread Karl Wright
> > > > > -- > > Michael Cizmar > > > > *From:* ritika jain > *Sent:* Thursday, December 31, 2020 7:33 AM > *To:* user@manifoldcf.apache.org > *Subject:* Re: Indexation Not OK > > > > Elastic search output connector with some custom changes for s

Re: Indexation Not OK

2020-12-31 Thread Karl Wright
in both the logs and in the simple history. Can you provide any error messages from the log that seem to be coming from the output connection? Thanks, Karl On Thu, Dec 31, 2020 at 8:30 AM Karl Wright wrote: > Hi, > Can you let us know what you are using for the output connector?

Re: Indexation Not OK

2020-12-31 Thread Karl Wright
Hi, Can you let us know what you are using for the output connector? Thanks, Karl On Thu, Dec 31, 2020 at 8:24 AM ritika jain wrote: > Hi, > > I am using Manifoldcf 2.14 and JCIFS connector, to ingest some billions of > records into elastic search > I am facing an issue in which when Job is

Re: Password admin UI

2020-12-17 Thread Karl Wright
that I used UTF-8 encoding to provide the > password to the obfuscate method and that testing the deobfuscate method > provides the right password with UTF-8 chars > > > > Julien > > > > *From:* Karl Wright > *Sent:* Wednesday, 16 December 2020 19:40 > *To

Re: Password admin UI

2020-12-16 Thread Karl Wright
Hi Julien, The properties file is read as utf-8, so as long as you make sure that the encoding in your editor is utf-8, it should work. Many editors default to the Microsoft code page so use something like scite or emacs. Karl On Wed, Dec 16, 2020 at 12:31 PM wrote: > Hi, > > > > I tried

Re: Memory problem on Agent ?

2020-10-02 Thread Karl Wright
. A rule of thumb is to leave 10 GB free for system usage and divide the remainder among your Java processes. Thanks, Karl On Fri, Oct 2, 2020 at 11:21 AM Bisonti Mario wrote: > Yes, but it seems that, when the indexing finished, the memory is not > released > > > > >

Re: Memory problem on Agent ?

2020-10-02 Thread Karl Wright
Hi Mario, Java processes only use the memory you hand them. It looks like you are handing Java more memory than your machine has. This will not work. Karl On Fri, Oct 2, 2020 at 10:45 AM Bisonti Mario wrote: > > > Hallo. > > > > When I scan the content of Repository , I note that memory

I updated the site with the new release yesterday, but hasn't gone live

2020-09-18 Thread Karl Wright
Svnpubsub seems to be broken. I sent email to infrastruct...@apache.org but apparently nobody reads that anymore. Stay tuned. Karl

Re: Job interrupted

2020-08-24 Thread Karl Wright
rentTime + 30L, -currentTime + 3 * 60 * 6L,-1,true); +currentTime + 3 * 60 * 6L,-1,false); } else if (se.getMessage().indexOf("cannot find") != -1 || se.getMessage().indexOf("cannot be found") != -1) { I'll commit to trunk as well. Kar

Re: Job interrupted

2020-08-24 Thread Karl Wright
lure processing document: The > process cannot access the file because it is being used by another process. > > > > > > *From:* Karl Wright > *Sent:* Monday, 24 August 2020 12:27 > *To:* user@manifoldcf.apache.org > *Subject:* Re: Job interrupted > > > > Well, we

Re: Job interrupted

2020-08-24 Thread Karl Wright
:55 AM Bisonti Mario wrote: > Yes, but after I obtain: > > > > Error: Repeated service interruptions - failure processing document: The > process cannot access the file because it is being used by another process. > > > > And the job stops > > > > >

Re: Job interrupted

2020-08-24 Thread Karl Wright
Hi, That's a warning. The job will keep running and the document will be retried later. Karl On Mon, Aug 24, 2020 at 5:24 AM Bisonti Mario wrote: > Hallo. > > I have some problems about job interrupted. > > The job execute a windows share scan > > > > After many errors, sometimes it stops >

Re: How to reset job status

2020-08-19 Thread Karl Wright
it USING THE SCRIPTS PROVIDED. Your problems should resolve. If not, you should have logging in manifoldcf.log telling you what is going wrong. Karl On Wed, Aug 19, 2020 at 6:40 AM Karl Wright wrote: > You do not see log output. Therefore I need to ask you some questions. >

Re: How to reset job status

2020-08-19 Thread Karl Wright
You do not see log output. Therefore I need to ask you some questions. What deployment model are you using? single process or multi-process? what is the synchronization method? On Wed, Aug 19, 2020 at 6:38 AM Karl Wright wrote: > Usually when you shut down the agents process (or the wh

Re: How to reset job status

2020-08-19 Thread Karl Wright
Mario wrote: > No, I don’t have a notification connector, but it isn’t the problem. > > Manifoldcf.log is empty > > > > The problem is that the job is in a hanging state and I would like to reset its > state > > > > > > > > *From:* Karl Wright > *I

Re: How to reset job status

2020-08-19 Thread Karl Wright
There should be output in your manifoldcf.log file, no? This may be the result of you not having a notification connector's code actually registered so you get no class found errors. The only solution is to put the missing jar in place and restart your agents process. Have a look at the log to

Re: WebCrawler Connector code

2020-07-07 Thread Karl Wright
defined in its interface class IJobManager, but not defined in its > implementation class JobManager, so it is always returning an empty value. > > Thanks > > > On Mon, Jul 6, 2020 at 6:44 PM Karl Wright wrote: > >> Hi Ritika, >> >> ' My requirement is to abort a job w

Re: WebCrawler Connector code

2020-07-06 Thread Karl Wright
Hi Ritika, ' My requirement is to abort a job whenever a seed-corresponding site is down or returning some 5xx response codes. ' (1) Connector methods, like addSeedDocuments(), are called by the framework. You do not call them yourself when you write a connector. So you are looking in the

Re: Sharepoint 2019

2020-06-10 Thread Karl Wright
> > If it would help, I could share the Sharepoint.dll from 2019 sharepoint. > > Thanks! > Shelly > > > > On 2020/06/10 05:48:01, Karl Wright wrote: > > Hi, > > One is not available yet. In order to build one I need a copy of the > > Share

Re: Sharepoint 2019

2020-06-10 Thread Karl Wright
Forgot the svn path: https://svn.apache.org/repos/asf/manifoldcf/integration/sharepoint-2019/trunk Karl On Wed, Jun 10, 2020 at 2:07 AM Karl Wright wrote: > I've set up an svn path for this plugin. If you can "svn co" this path it > should give you the plugin source (no dif

Re: Sharepoint 2019

2020-06-10 Thread Karl Wright
d, chances are good it will just work. If you would like me to build a distribution release, I will need the DLL in order to be able to do that. Karl On Wed, Jun 10, 2020 at 1:48 AM Karl Wright wrote: > Hi, > One is not available yet. In order to build one I need a copy of the >

Re: Sharepoint 2019

2020-06-09 Thread Karl Wright
Hi, One is not available yet. In order to build one I need a copy of the Sharepoint.dll from a Sharepoint 2019 instance and some time. Karl On Wed, Jun 10, 2020 at 1:30 AM Shelly Singh wrote: > I am looking for Sharepoint 2019 plugin. Is one available? >

Re: Window shares job-Error ERROR: invalid byte sequence for encoding "UTF8": 0x00

2020-06-03 Thread Karl Wright
This is a Postgresql problem of some kind. It could be the network connection between your ManifoldCF process(es) and the Postgresql server. If it's repeating I'd worry about it, otherwise it will recover. Karl On Wed, Jun 3, 2020 at 3:58 AM ritika jain wrote: > Hi All, > > I am using

Re: Web connector login sequence

2020-06-02 Thread Karl Wright
iate the regex > rules by removing a letter in one of them: > “other-site\/cas\/logi = redirect” vs “other-site\/cas\/login = form”. But > this does not feel like a “clean” solution > > > > Regards, > Julien > > > > > > > > *From:* Karl Wright >

Re: Crawling / Indexation Query

2020-05-30 Thread Karl Wright
k > > On Thu, May 7, 2020 at 4:11 PM Karl Wright wrote: > >> Hi, >> >> ManifoldCF is not a crawler, it's a synchronizer. If robots says not to >> crawl something, then it will not be indexed. If robots is changed to >> prohibit crawling of certain documents

Re: Web connector login sequence

2020-05-29 Thread Karl Wright
Hi Julien, The login sequence must include all parts of the login sequence, from initiation (the first 302 that you get when you load /site) all the way through to the last action that sets the cookie. After the login sequence is completed, the /site URL will be fetched again. If you need more

Re: URL Mapping

2020-05-28 Thread Karl Wright
or the person or entity to > whom it is directed. Unauthorized use, disclosure, distribution or copying > is strictly prohibited and may be unlawful. If you are not the intended > recipient, please notify us immediately and permanently delete this e-mail > and any attachments. >

Re: Error: Repeated service interruptions - failure processing document: Failed to acquire credits in time

2020-05-21 Thread Karl Wright
nnector faces some > problem while connecting? (a network issue) > > Thanks > Ritika > > On Tue, May 19, 2020 at 2:39 PM Karl Wright wrote: > >> I commented in the ticket you created. >> Thanks, >> Karl >> >> On Tue, May 19, 2020 at 3:07 AM ritik

Re: Error: Repeated service interruptions - failure processing document: Failed to acquire credits in time

2020-05-19 Thread Karl Wright
I commented in the ticket you created. Thanks, Karl On Tue, May 19, 2020 at 3:07 AM ritika jain wrote: > Hi All, > > I am configured Units job (Manifoldcf 2.14 and ES 7.6.2 and postgres > 9.6.10) on server to access files from samba SMBv3 server and used > jcifs-ng-2.1.2.jar to be loaded in lib

Re: Crawling / Indexation Query

2020-05-07 Thread Karl Wright
Hi, ManifoldCF is not a crawler, it's a synchronizer. If robots says not to crawl something, then it will not be indexed. If robots is changed to prohibit crawling of certain documents, then yes, those documents will be removed from the index. But you can override the robots behavior in the

Re: ES 7.6.2

2020-05-07 Thread Karl Wright
Hi Ritika, ManifoldCF's ElasticSearch connector does not include any code that requires Java 11, so you are all set. Because JDK 11 removes many packages, however, you should expect to run ManifoldCF 2.14 with Java 8. ManifoldCF 2.16, just released, supports Java 11. Karl On Thu, May 7, 2020

Re: Extraction of related links

2020-02-12 Thread Karl Wright
This is not functionality that ManifoldCF supports out of the box. The extracted links are used for crawling, not as metadata. I don't see a general use-case for this either, so I think you're on your own modifying the web connector code to do what you want. The RepositoryDocument structure has

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-30 Thread Karl Wright
here is some patch to the code that did not make it > into the git. > I can tell that I can test with Content Server 10 as well as 16. > SOAP UI has no problem and in the end it does exactly the same (starting > from the https WSDL etc.) > > On 22.01.2020 at 13:17, Karl Wright wrote

Re: sharepoint crawler documents limit

2020-01-27 Thread Karl Wright
I believe we had increased the parameter indicated >> in the link below >> >> >> https://weblogs.asp.net/jeffwids/how-to-increase-the-timeout-for-a-sharepoint-2010-website >> >> >> >> On Fri, Dec 20, 2019 at 6:27 PM Karl Wright wrote: >> >>

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-22 Thread Karl Wright
y configuration that tries to force file based access > of those? In the code I did not find anything suspicious. > > On 22.01.2020 at 10:28, Karl Wright wrote: > > > Has there been any news? > I'd love to get this tied up so that you're able to proceed. > Karl > >

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-22 Thread Karl Wright
now > is that the xsd to which the WSDL is referring are not fetched. The bizarre > thing is that the https url that it mentions for the xsd is absolutely > correct. So I assume it does not understand an http url, maybe that is > related to configuration. > > On 16.01.2020 at 14:53

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-16 Thread Karl Wright
age it tries to access the > xsd through a https url (which is perfectly accessible for the server). > Could it be that the connector restricts itself somehow to local file > system only or similar? > Have you faced this issue before? > > > > On 16.01.2020 at 12:56, Karl

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-16 Thread Karl Wright
I should say that we have (AFAICT) at least two independent installations of the csws connector working in the field, at least one of them using secure connections. Karl On Thu, Jan 16, 2020 at 6:54 AM Karl Wright wrote: > We solved the WSDL fetching through HTTPS, or thought we

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-16 Thread Karl Wright
responding issue. Apparently, the fetching of > the WSDL itself through https was not possible. Do you still remember some > insights beyond what is written in the issue ? > > On 16.01.2020 at 00:37, Karl Wright wrote: > > > Let me think about that option. > > Karl >

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-15 Thread Karl Wright
ols for security reasons. Of course, the choice is then > with the people using the software. > Could that be something sensible from your point of view? > > On Wed, Jan 15, 2020 at 11:14 PM Karl Wright wrote: > >> It's rather immaterial what browsers do here. What's important

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-15 Thread Karl Wright
if you have an idea how this should be made configurable then I > can look into this. > > Best regards > > On 15.01.2020 at 10:52, Karl Wright wrote: > > > Hi, > > Mcf currently requires jdk8. Jdk11 is non trivial to support because of > the removal of many jdk

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-15 Thread Karl Wright
vestigate what that implies. > Does ManifoldCF support JDK11? > > On 15.01.2020 at 00:08, Karl Wright wrote: > > > I think you can just change the code to read as follows when it creates > the SSLContext: > > SSLContext ctx = SSLContext.getInstance("TLSv1"); >
