Re: Scheduling Problem and the IBM Domino Connector

2018-07-30 Thread Karl Wright
I am not aware of any existing Domino connector. Karl On Mon, Jul 30, 2018 at 12:19 PM Cheng Zeng wrote: > Thank you very much for your reply. Your advice is very helpful. > > I am wondering if the MCF supports IBM Domino? > > Does anyone know if there are available libraries or API resource

Re: Scheduling Problem and the IBM Domino Connector

2018-07-30 Thread Cheng Zeng
Thank you very much for your reply. Your advice is very helpful. I am wondering if MCF supports IBM Domino? Does anyone know if there are available libraries or API resources to extract documents from a Domino server? Best wishes, Cheng On 30 Jul 2018, at 17:48, Karl Wright

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
Well, I have absolutely no idea what is wrong and I've never seen anything like that before. But postgres is complaining because the communication with the JDBC client is being interrupted by something. Karl On Mon, Jul 30, 2018 at 10:39 AM Mike Hugo wrote: > No, and manifold and postgres

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Mike Hugo
No, and manifold and postgres run on the same host. On Mon, Jul 30, 2018 at 9:35 AM, Karl Wright wrote: > ' LOG: incomplete message from client' > > This shows a network issue. Did your network configuration change > recently? > > Karl > > > On Mon, Jul 30, 2018 at 9:59 AM Mike Hugo wrote: >

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
' LOG: incomplete message from client' This shows a network issue. Did your network configuration change recently? Karl On Mon, Jul 30, 2018 at 9:59 AM Mike Hugo wrote: > Tried a postgres vacuum and also a restart, but the problem persists. > Here's the log again with some additional

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Mike Hugo
Tried a postgres vacuum and also a restart, but the problem persists. Here's the log again with some additional logging details added (below) I tried running the last query from the logs against the database and it works fine - I modified it to return a count and that also works. SELECT count(*)

Re: Scheduling Problem

2018-07-30 Thread Karl Wright
Hi Cheng, Dynamic recrawl revisits documents based on the frequency that they changed in the past. It is therefore hard to make any prediction about whether a document will be recrawled in a given time interval. You need recrawls of existing directories in order to discover new documents in

Scheduling Problem

2018-07-30 Thread Cheng Zeng
Hi Karl, I have a question about the schedule-related configuration in the job. I have a continuously running job which crawls the documents in Sharepoint 2013, and the job is supposed to re-crawl about 26,000 docs every 24 hours as configured; however, it seems that there is something wrong

Re: PSQLException: This connection has been closed.

2018-07-29 Thread Karl Wright
It looks to me like your database server is not happy. Maybe it's out of resources? Not sure but a restart may be in order. Karl On Sun, Jul 29, 2018 at 9:06 AM Mike Hugo wrote: > Recently we started seeing this error when Manifold CF starts up. We had > been running Manifold CF with many

PSQLException: This connection has been closed.

2018-07-29 Thread Mike Hugo
Recently we started seeing this error when Manifold CF starts up. We had been running Manifold CF with many web connectors and a few RSS feeds for a while and it had been working fine. The server got rebooted and since then we started seeing this error. I'm not sure exactly what changed. Any

Re: Exclude files ~$*

2018-07-27 Thread Karl Wright
Can you view the job and include a screen shot of where this is displayed? Thanks. The exclusions are not regexps -- they are file specs. The file specs have special meanings for "*" (matches everything) and "?" (matches one character). You do not need to URL encode them. If you enable
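For illustration, file specs of the kind described above look like the following (these examples are ours, not from the thread; "*" matches any run of characters, "?" matches exactly one, and nothing is URL-encoded):

    ~$*          excludes files whose names start with "~$" (e.g. Office owner/lock files)
    *.tmp        excludes any file ending in ".tmp"
    report?.doc  matches report1.doc but not report12.doc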

Re: Tika/POI bugs

2018-07-27 Thread Karl Wright
To solve your production problem I highly recommend limiting the size of the docs fed to Tika, for a start. But that is no guarantee, I understand. Out of memory problems are very hard to get good forensics for because they cause major disruptions to the running server. You could turn on a

Exclude files ~$*

2018-07-27 Thread msaunier
Hi Karl, In my JCIFS connector, I want to configure an exclude condition for files whose names start with ~$*. I have added the condition, but it is not working. Do I need to add %7E%24* or a regex? Thanks, Maxence,

RE: Tika/POI bugs

2018-07-27 Thread msaunier
Hi Karl, Okay. For the Out of Memory: this is the last day that I can spend finding out where the error comes from. After that, I must go into production to meet my deadlines. I hope to find time in the future to be able to fix this problem on this server, otherwise I could not index

Tika/POI bugs

2018-07-27 Thread Karl Wright
Hi all, I've easily spent 40 hours over the last two weeks chasing down bugs in Apache Tika and POI. The two kinds I see are "ClassNotFound" (due to usage of the wrong ClassLoader), and "OutOfMemoryError" (not clear what it is due to yet). I don't have enough time to create tickets directly in

Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
I am afraid you will need to open a Tika ticket, and be prepared to attach your file to it. Thanks, Karl On Fri, Jul 27, 2018 at 6:04 AM Bisonti Mario wrote: > It isn’t a memory problem because xls file bigger (30MB) have been > processed. > > > > This file xlsm with many colors etc hang > >

R: Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
It isn’t a memory problem, because bigger xls files (30MB) have been processed. This xlsm file with many colors etc. hangs. I suppose it is a Tika/Solr error, but I don’t know how to solve it ☹ Subject: R: Job stuck internal http error 500 Yes, I am using:

R: Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
Yes, I am using: /opt/manifoldcf/multiprocess-file-example-proprietary I set: sudo nano options.env.unix -Xms2048m -Xmx2048m But I obtain the same error. My doubt is that it could be a Solr/Tika problem. What could I do? I restricted the scan to a single file and I obtain the same error. From: Karl

Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
Although it is not clear what process you are talking about. If it's Solr, ask them. Karl On Fri, Jul 27, 2018, 5:36 AM Karl Wright wrote: > I am presuming you are using the examples. If so, edit the options file > to grant more memory to you agents process by increasing the Xmx value. > > Karl >

Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
I am presuming you are using the examples. If so, edit the options file to grant more memory to your agents process by increasing the Xmx value. Karl On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario wrote: > Hallo. > > My job is stuck indexing an xlsx file of 38MB > > > > What could I do to
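As a sketch of that edit, assuming the multiprocess file example layout mentioned elsewhere in this thread, the relevant lines live in options.env.unix; the 4096m figure below is only an illustrative value:

    # options.env.unix -- JVM options applied to the example processes
    -Xms4096m
    -Xmx4096m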

Re: Solr connection, max connections and CPU

2018-07-27 Thread Bisonti Mario
Thanks a lot Karl!!! On 2018/07/26 13:28:47, Karl Wright wrote: > Hi Mario,> > > There is no connection between the number of CPUs and the number output> > connections. You pick the maximum number of output connections based on> > the number of listening threads that you can use at the same

Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
Hallo. My job is stuck indexing an xlsx file of 38MB What could I do to solve my problem? In the following there is the error: 2018-07-27 08:55:15.562 WARN (qtp1521083627-52) [ x:core_share] o.e.j.s.HttpChannel /solr/core_share/update/extract java.lang.OutOfMemoryError at

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
The way it works in the JCIFS connector is that files that aren't within the specification are removed from the list of files being processed. If a file is already being processed, however, it is just retried. So changing this property to make an out-of-memory condition go away is not going to

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
How are you limiting content size? Is this in the repository connection, or in an Allowed Documents transformation connection? Karl On Thu, Jul 26, 2018 at 10:58 AM msaunier wrote: > I have limit to 20Mb / document and I have again an out of memory java. > > > > > > > > *De :* Karl Wright

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
I believe there's also a content length tab in the Windows Share connector, if you're using that. Karl On Thu, Jul 26, 2018 at 10:19 AM Karl Wright wrote: > The ContentLimiter truncates documents. That's not what you want. > > Use the Allowed Documents transformer. > > Karl > > > On Thu, Jul

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
The ContentLimiter truncates documents. That's not what you want. Use the Allowed Documents transformer. Karl On Thu, Jul 26, 2018 at 10:06 AM msaunier wrote: > I have added a Content Limiter transformation before the Tika extractor. It’s > very, very slow now. Is that normal? > > > > Maxence, > > >

Re: Solr connection, max connections and CPU

2018-07-26 Thread Karl Wright
Hi Mario, There is no connection between the number of CPUs and the number of output connections. You pick the maximum number of output connections based on the number of listening threads that you can use at the same time in Solr. Karl On Thu, Jul 26, 2018 at 9:22 AM Bisonti Mario wrote: >

Solr connection, max connections and CPU

2018-07-26 Thread Bisonti Mario
Hallo, I set up the Solr connection in the "Output connections" of ManifoldCF. I don't understand if there is a relation between "Max Connections" and the number of CPUs on the host. Could you help me to understand it? Thanks a lot Mario

Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
Hi Maxence, I am wondering whether you moved any jars from dist/connector-common-lib to dist/lib? If you did this, you will mess up the ability of any of the Tika jars to find their dependencies. This also explains why commons-compress cannot be found; it's in connector-common-lib. It sounds

Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
Hi Maxence, The following error: >> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: org/apache/poi/POIXMLTextExtractor java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor at

Re: web crawler not sharing cookies

2018-07-26 Thread Karl Wright
Here's the documentation from HttpClient on the various cookie policies. You're probably going to need to read some of the RFCs to see which policy you want. I will wait for you to get back to me with a recommendation before taking any action in the MCF codebase. Thanks!
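For reference, selecting a cookie policy in Apache HttpClient 4.x looks roughly like the sketch below; which policy the web connector ought to use is exactly the open question here, so CookieSpecs.STANDARD (the RFC 6265 policy) appears only as an example:

    import org.apache.http.client.config.CookieSpecs;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class CookiePolicyExample {
        public static void main(String[] args) throws Exception {
            // Request config that selects the RFC 6265 ("standard") cookie policy.
            RequestConfig config = RequestConfig.custom()
                    .setCookieSpec(CookieSpecs.STANDARD)
                    .build();
            try (CloseableHttpClient client = HttpClients.custom()
                    .setDefaultRequestConfig(config)
                    .build()) {
                // Requests executed with this client evaluate cookies under that policy.
            }
        }
    }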

Re: web crawler not sharing cookies

2018-07-26 Thread Karl Wright
Ok, so the database for your site crawl contains both z.com and x.y.z.com cookies? And your site pages from domain a.y.z.com receive no cookies at all when fetched? Is that a correct description of the situation? Please verify that the a.y.z.com pages are part of the protected part of your

Re: web crawler not sharing cookies

2018-07-26 Thread Gustavo Beneitez
Hi, the database may contain Z.com and X.Y.Z.com if created automatically through a JSP, but not the intermediate one, Y.Z.com. If the crawler decides to go to A.Y.Z.com and, looking at the database, Z.com is present, it still doesn't work (it should, since A.Y.Z is a sub-domain of Z). Only doing that

Re: web crawler not sharing cookies

2018-07-25 Thread Karl Wright
The web connector, though, does not filter any cookies. It takes them all -- whatever cookies HttpClient is storing at that point. So you should see all the cookies in the database table, regardless of their site affinity, unless HttpClient is refusing to accept a cookie for security reasons.

Re: web crawler not sharing cookies

2018-07-25 Thread Gustavo Beneitez
I agree, but the fact is that if my "login sequence" defines a login credential for domain "Z.com" and the crawler reaches "Y.Z.com" or "X.Y.Z.com", none of the sub-sites receives that cookie. I need to write the same cookie for every sub-domain; that solves the situation (and thankfully is a
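A small sketch of the domain-match rule under discussion, along the lines of RFC 6265 section 5.1.3 (our illustration, not ManifoldCF code): a cookie whose Domain attribute is z.com should also be sent to y.z.com and a.y.z.com, because matching is suffix-based on dot boundaries.

    // True when 'host' domain-matches 'cookieDomain' (ignoring the IP-address special case).
    static boolean domainMatches(String host, String cookieDomain) {
        final String h = host.toLowerCase();
        final String d = cookieDomain.toLowerCase();
        return h.equals(d) || h.endsWith("." + d);
    }

    // domainMatches("a.y.z.com", "z.com")     -> true
    // domainMatches("a.y.z.com", "x.y.z.com") -> false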

Re: Speed up cleaning up job

2018-07-25 Thread Karl Wright
The "cleaning up" phase deletes the documents in the target index (where your outputconnectors point). That takes more time. Karl On Wed, Jul 25, 2018 at 1:43 PM msaunier wrote: > If I delete a job on ManifoldCF, jobs pass in « Cleaning Up » status. > > > > « Processed » document are delete

RE: Speed up cleaning up job

2018-07-25 Thread msaunier
If I delete a job in ManifoldCF, the job passes into « Cleaning Up » status. « Processed » documents are deleted very fast, « Active » documents too. But the « Documents » on the interface are very slow to delete, line by line. ManifoldCF deletes Documents 100 by 100. Maxence, From: Karl

Re: Speed up cleaning up job

2018-07-25 Thread Karl Wright
I'm sorry, I don't understand your question? Karl On Wed, Jul 25, 2018 at 12:53 PM msaunier wrote: > Hi Karl, > > > > Can I configure ManifoldCF to cleaning up faster ? I think, ManifoldCF > Clean 100 by 100 by default. > > > > Maxence, > > >

Re: web crawler not sharing cookies

2018-07-25 Thread Karl Wright
You should not need to fill the database by hand. Your login sequence should include whatever redirection etc is used to set the cookies though. Karl On Wed, Jul 25, 2018 at 1:06 PM Gustavo Beneitez wrote: > Hi again, > > Thanks Karl, I was able of doing that after defining some "login >

***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-25 Thread Karl Wright
It looks like you are still running out of memory. I would love to know what document it was that was doing that. I suspect it is very large already, and for some reason it cannot be streamed. Karl On Wed, Jul 25, 2018 at 1:13 PM Karl Wright wrote: > Hi Maxence, > > The second exception is

Re: Out of memory, one file bug i think

2018-07-25 Thread Karl Wright
Hi Maxence, The second exception is occurring because processing is still occurring while the JVM is shutting down; it can be ignored. Karl On Wed, Jul 25, 2018 at 1:01 PM msaunier wrote: > Hi Karl, > > > > I have added the snapshot and I’m being spammed with this error: > > > > FATAL

Re: Out of memory, one file bug i think

2018-07-25 Thread Karl Wright
That's what I was afraid of. The new poi jars have dependencies we haven't accounted for yet. Can you download apache-commons-compress jar (latest version should be OK) and also put that in connector-common-lib? Thanks!! Karl On Wed, Jul 25, 2018 at 1:01 PM msaunier wrote: > Hi Karl, > > >

Re: web crawler not sharing cookies

2018-07-25 Thread Gustavo Beneitez
Hi again, Thanks Karl, I was able to do that after defining some "login sequence", but also after filling the database (cookiedata table) with certain values due to "domain constrictions". Before every web call, I suspect Manifold only takes cookies from the URL's exact subdomain (i.e. x.y.z.com), so if

Speed up cleaning up job

2018-07-25 Thread msaunier
Hi Karl, Can I configure ManifoldCF to clean up faster? I think ManifoldCF cleans 100 by 100 by default. Maxence,

Re: Out of memory, one file bug i think

2018-07-25 Thread Karl Wright
Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time. So those cannot be ignored. Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them. Karl

Re: Out of memory, one file bug i think

2018-07-25 Thread Karl Wright
Hi Maxence, Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions. You will need to download it here: https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/ ... and replace all poi jars
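A rough sketch of the replacement step, assuming a binary dist layout where the Tika/POI jars live under dist/connector-common-lib (the directory named elsewhere in this thread); the exact jar names in the nightly build will differ:

    cd dist/connector-common-lib
    # Move the POI jars shipped with MCF out of the way first
    mkdir -p /tmp/poi-backup && mv poi-*.jar /tmp/poi-backup/
    # Drop in the jars downloaded from the POI nightly build linked above
    cp /path/to/downloaded/poi-*.jar .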

Re: Out of memory, one file bug i think

2018-07-24 Thread Karl Wright
The problem isn't with images in general; it's with certain kinds of images. There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems. I don't know which kinds these are but apparently you are trying to index

Re: Out of memory, one file bug i think

2018-07-24 Thread Karl Wright
" java.lang.NoSuchMethodException: org.openxmlformats.schemas. wordprocessingml.x2006.main.impl.CTPictureBaseImpl.( org.apache.xmlbeans.SchemaType, boolean)" This exception is occurring because you are trying to extract content from an image. In order for this to work you need a jar that isn't

Re: Out of memory, one file bug i think

2018-07-24 Thread Karl Wright
Hi Maxence, You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful. In the global properties, set org.apache.manifoldcf.connectors to the value DEBUG. Karl On Tue, Jul 24, 2018 at 9:12 AM msaunier wrote: > With debug: > > > >
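In properties.xml form, that setting would presumably look like this (the property name and value are the ones given above; the XML wrapper is the standard global-properties syntax):

    <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>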

RE: Out of memory, one file bug i think

2018-07-24 Thread msaunier
With debug: [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x10050ae0049 [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO

Re: Out of memory, one file bug i think

2018-07-24 Thread Karl Wright
I've opened CONNECTORS-1516 to track the Class Not Found issue, and also created an Apache POI bugzilla ticket, which is referenced. Karl On Tue, Jul 24, 2018 at 6:15 AM Karl Wright wrote: > The "class not found" error looks probably like a classloader issue with > Tika -- the class is

Re: Optimized memory used

2018-07-24 Thread Karl Wright
ManifoldCF's usage of memory is bounded per thread, but obviously scales with the number of worker threads you have. If you are using Tika, the amount of memory that may be used varies a lot, however, because Tika's streaming document memory behavior is quite variable, depending on the kind of
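The knob that governs this scaling is the worker thread count in the global properties. As a hedged example (the property name org.apache.manifoldcf.crawler.threads is our assumption from the standard configuration, and 30 is only an illustrative value), lowering it reduces how many documents are held in memory at once:

    <!-- fewer worker threads means fewer documents processed, and buffered, concurrently -->
    <property name="org.apache.manifoldcf.crawler.threads" value="30"/>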

Re: Out of memory, one file bug i think

2018-07-24 Thread Karl Wright
The "class not found" error looks probably like a classloader issue with Tika -- the class is present in poi-ooxml-3.17.jar, although to be fair it might possibly be caused by an out-of-memory condition. You should be able to find the exception in the Simple History and figure out what document

Optimized memory used

2018-07-24 Thread msaunier
Hello Karl, My ManifoldCF sometimes uses 12 GB. I would like to know if certain actions make it possible to optimize this memory use. More frequent commits, for example? Thanks, Maxence,

Re: web crawler not sharing cookies

2018-07-20 Thread Gustavo Beneitez
Hi, thanks a lot, please let me check the documentation for an example of that. Regards! On Thu., 19 Jul. 2018 at 21:54, Karl Wright () wrote: > You are correct that cookies are not shared among threads. That is by > design. > > The only way to set cookies for the WebConnector is

Re: web crawler not sharing cookies

2018-07-19 Thread Karl Wright
You are correct that cookies are not shared among threads. That is by design. The only way to set cookies for the WebConnector is to have there be a "login sequence". The login sequence sets cookies that are then used by all subsequent fetches. Thanks, Karl On Thu, Jul 19, 2018 at 3:38 PM

web crawler not sharing cookies

2018-07-19 Thread Gustavo Beneitez
Hi everyone, I have tried to look for an answer before writing this email, with no luck. Sorry for the inconvenience if it has already been answered. I need to set a cookie at the beginning of the web crawl. The cookie determines the language in which you get the content, and while there are several choices, if no

Solr basic authentication

2018-07-16 Thread Shashank Raj
Hi Karl, I have been trying to access a Solr Cloud server (with 3 ZK nodes) with basic authentication enabled. I added the username and password in the server tab of the Solr output connection, but when starting any job I get an error saying "required authentication". Although, I can connect to this Solr

Register now for ApacheCon and save $250

2018-07-09 Thread Rich Bowen
Greetings, Apache software enthusiasts! (You’re getting this because you’re on one or more dev@ or users@ lists for some Apache Software Foundation project.) ApacheCon North America, in Montreal, is now just 80 days away, and early bird prices end in just two weeks - on July 21. Prices will

Re: Error while crawling Infopath Forms in Sharepoint 2013

2018-07-06 Thread Karl Wright
Hi Nikita, There are no "plugins" available for the SharePoint connector. It only crawls libraries and attachments. In theory more supported types can be added but only if the (deprecated) SharePoint aspx services allow access to them. Karl On Fri, Jul 6, 2018 at 10:06 AM Nikita Ahuja

Re: Error while crawling Infopath Forms in Sharepoint 2013

2018-07-06 Thread Nikita Ahuja
Hi Karl, Thanks for your response. The InfoPath forms store the data and required information, and they show up as XML files. Can it work by using any plugin? On Fri 6 Jul, 2018, 6:38 PM Karl Wright, wrote: > Sharepoint has a number of data types that ManifoldCF doesn't know how to > crawl.

Re: Error while crawling Infopath Forms in Sharepoint 2013

2018-07-06 Thread Karl Wright
Sharepoint has a number of data types that ManifoldCF doesn't know how to crawl. Sounds like infopath forms are one such data type. It's not clear that crawling a form is a good idea in any case. What content do you expect this to yield? Karl On Fri, Jul 6, 2018 at 7:59 AM Nikita Ahuja

Error while crawling Infopath Forms in Sharepoint 2013

2018-07-06 Thread Nikita Ahuja
Hello, I am executing a job to crawl Sharepoint 2013 data using ManifoldCF. I am able to crawl the data from a library and get it ingested into an Elastic Search index. But when it comes to an InfoPath form stored in a Sharepoint 2013 library, it generates the following error: *Manifoldcf Error:

Re: ManifoldCF 2.10 & Sharepoint 2013 - Configuration assistance

2018-06-27 Thread Karl Wright
Hi Arjan, The ManifoldCF Sharepoint 2013 connector expects to be given either the root of the whole SharePoint site, or the root of a virtual site. The error message displayed shows an authorization error accessing not the root but rather http://gocnavigator.com/projects/5277. If this is a virtual

Webdav Repository

2018-06-21 Thread Bisonti Mario
Hallo. Is it possible to scan a remote webdav repository? I can't find any info about it. Thanks a lot Mario

Re: List all jobs page not working

2018-06-21 Thread Karl Wright
Works fine here. Karl On Thu, Jun 21, 2018 at 10:25 AM VINAY Bengaluru wrote: > Hi Karl, > The /json/jobs API request is not returning any results. > Also the list all jobs page isn't displaying in the front end. All other > pages work fine. We don't see any errors in the logs

RE: Documents blocked sometimes without errors

2018-06-21 Thread msaunier
Hello Karl, OK, I will build and test this version. Thanks Maxence, From: Karl Wright [mailto:daddy...@gmail.com] Sent: Thursday, June 21, 2018 02:43 To: user@manifoldcf.apache.org Subject: Re: Documents blocked sometimes without errors Patch attached, and fix committed to trunk. Karl

Re: Documents blocked sometimes without errors

2018-06-20 Thread Karl Wright
Patch attached, and fix committed to trunk. Karl On Wed, Jun 20, 2018 at 8:32 PM Karl Wright wrote: > I've had time to look at this further. I believe that under some > conditions, when errors occur during processing a document, it might be > possible to wind up in this state. I'm in the

Re: Documents blocked sometimes without errors

2018-06-20 Thread Karl Wright
I've had time to look at this further. I believe that under some conditions, when errors occur during processing a document, it might be possible to wind up in this state. I'm in the process of working out a solution now. Karl On Mon, Jun 18, 2018 at 8:44 AM msaunier wrote: > Okay. I test

Re: script to schedule MCF Jobs by crontab login unauthorized

2018-06-19 Thread Karl Wright
There have been no security changes for many releases. Karl On Tue, Jun 19, 2018 at 9:25 AM Bisonti Mario wrote: > Hallo, I used a script to start remotely a job from crontab on MCF 2.9.1 > and it worked > > The same script, now, in MCF 2.10 does not work. > > > > Now, I tried this command: > > > >

script to schedule MCF Jobs by crontab login unauthorized

2018-06-19 Thread Bisonti Mario
Hallo, I used a script to start a job remotely from crontab on MCF 2.9.1 and it worked. The same script now, in MCF 2.10, does not work. Now, I tried this command: curl -c "cookie" -XPOST 'http://localhost:8080/mcf-api-service/json/LOGIN' -d @/SCRIPTS/user.json where user.json: { "user":"admin",
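For context, the script pattern here is a two-step curl call: authenticate against the API service and store the session cookie, then reuse that cookie on later calls. A sketch; the login line mirrors the command above, while the second URL is only a hypothetical example of an authenticated request:

    # Step 1: log in and save the session cookie
    curl -c cookie -XPOST 'http://localhost:8080/mcf-api-service/json/LOGIN' -d @/SCRIPTS/user.json

    # Step 2: reuse the saved cookie on subsequent API calls (illustrative path)
    curl -b cookie 'http://localhost:8080/mcf-api-service/json/jobstatuses'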

R: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-19 Thread Bisonti Mario
Hallo Karl! Now I found out how to build (my first build …  ) and I am now using the /multiprocess-file-example-proprietary folder, which I deployed into Tomcat. I recreated the configuration that I used on the binary version and I created the same job. It works !!! I see in manifoldcf.log the error:

Re: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-19 Thread Karl Wright
Hi Mario, You cannot patch the binary. You must build from source to apply the patch. The easiest way forward is to check out trunk directly (with svn) and build it. The trunk svn URL is https://svn.apache.org/repos/asf/manifoldcf/trunk . Karl On Tue, Jun 19, 2018 at 3:35 AM Bisonti Mario

R: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-19 Thread Bisonti Mario
Hallo. Note that I specified the mime types on my Solr output connection. Furthermore, I used the binary distribution; how could I patch it with your fix? I see in my job that it is stuck on 3 docs with: WARN 2018-06-19T09:29:21,366 (Worker thread '14') - JCIFS: Possibly transient exception

Re: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-18 Thread Karl Wright
Created CONNECTORS-1510 and committed a fix. Karl On Mon, Jun 18, 2018 at 2:33 PM Karl Wright wrote: > It certainly is a particular file -- the mime type is null, and that's > causing this line to blow up: > > final String lowerMimeType = mimeType.toLowerCase(Locale.ROOT); > > > That code

Re: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-18 Thread Karl Wright
It certainly is a particular file -- the mime type is null, and that's causing this line to blow up: final String lowerMimeType = mimeType.toLowerCase(Locale.ROOT); That code was added a couple of revs back to address a different problem; it's a trivial fix: final String lowerMimeType
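The quoted fix is cut off above; a minimal sketch of what such a null guard presumably looks like (our reconstruction, not the patch actually committed under CONNECTORS-1510):

    // Guard against repository connectors that supply no mime type at all.
    final String lowerMimeType = (mimeType == null) ? null : mimeType.toLowerCase(java.util.Locale.ROOT);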

Re: FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-18 Thread Steph van Schalkwyk
Looks like a particular file may be causing this. Try to find the filename it crashes on and copy that to a small crawl directory. Repeat the crawl. On Mon, Jun 18, 2018 at 11:34 AM, Bisonti Mario wrote: > Hallo > > > > I configured ManifoldCF 2.10 with Tomcat 9.0.8 and Postgres 9.3 > > > > I

FATAL 2018-06-18T18:29:23,676 (Worker thread '36') - Error tossed: null

2018-06-18 Thread Bisonti Mario
Hallo I configured ManifoldCF 2.10 with Tomcat 9.0.8 and Postgres 9.3 I configured multiprocess-file-example When I create a Job to scan a big Windows share (22000 docs: Word, PDF, etc.), manifoldcf crashes with the message: at

RE: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
Okay. I will test to reproduce the problem again and see whether they are the same documents or whether I have a pattern or other similarities. Maxence, From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, June 18, 2018 14:42 To: user@manifoldcf.apache.org Subject: Re: Documents

Re: Documents blocked sometimes without errors

2018-06-18 Thread Karl Wright
If you are certain these are new documents, then there is no need to repeat yourself. But we do need to get some idea what action yields documents in this state. As I said before, it did not look possible to get there through any mechanism I can find. But I won't be able to look in full depth

RE: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
I changed about ten days ago and the jobs were running correctly. I have done 2 passes without problems since the introduction of the trunk version. I am not sure whether they are old documents. I will restart indexing and if it happens again, I'll tell you. Maxence, From: Karl Wright

Re: Documents blocked sometimes without errors

2018-06-18 Thread Karl Wright
My concern is that you upgraded the code but DID NOT do the pause/resume after you did that. If that was the sequence, you were left with old, un-updated records. On Mon, Jun 18, 2018 at 8:18 AM msaunier wrote: > Yes my solution is paused the job and resume it. > > > > With the trunk

RE: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
Yes, my solution is to pause the job and resume it. With the trunk version, I feel it's less common, but the problem is still here. From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, June 18, 2018 12:14 To: user@manifoldcf.apache.org Subject: Re: Documents blocked sometimes without

Re: Documents blocked sometimes without errors

2018-06-18 Thread Karl Wright
Just so it is clear, the fix only will address documents that are in the "ACTIVE" state. Documents that are already blocked will not be fixed. The way you fix the blocked documents is by pausing and resuming the job that the documents are part of -- and then, if you are running the patched

TR: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
Forget that. My ln -s is good on this server; I confused the servers. So I have a similar problem with trunk. I am continuing the tests. From: msaunier [mailto:msaun...@citya.com] Sent: Monday, June 18, 2018 11:13 To: 'user@manifoldcf.apache.org' Subject: RE: Documents blocked sometimes

RE: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
Ok, I had missed my ln -s, so my link went to 2.9.1. Sorry for this error. Your corrections are okay. From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, June 18, 2018 10:43 To: user@manifoldcf.apache.org Subject: Re: Documents blocked sometimes without errors If there's any chance

RE: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
Ok. I have paused and restarted. I have brought the agent down and restarted it. I am continuing the tests. I have many millions of documents, so it will take time. Maxence, From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, June 18, 2018 10:43 To: user@manifoldcf.apache.org Subject: Re:

Re: Documents blocked sometimes without errors

2018-06-18 Thread Karl Wright
If there's any chance these were leftover from before the patch was applied, we should try to eliminate that. To do that: - pause the job - restart the job Then, either wait for the script-based agents process shutdown, or shut down the agents process manually and restart. Do this a number of

RE: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
Okay, if you need details I am available. Thanks, Maxence, From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, June 18, 2018 10:35 To: user@manifoldcf.apache.org Subject: Re: Documents blocked sometimes without errors These are still indeed blocked. Unfortunately I don't

Re: Documents blocked sometimes without errors

2018-06-18 Thread Karl Wright
These are still indeed blocked. Unfortunately I don't see any pathway for documents to wind up in such a state. I'll have to look in more depth and get back to you later. Karl On Mon, Jun 18, 2018 at 4:07 AM msaunier wrote: > CSV joined. > > > > Thanks, > > Maxence, > > > > > > > > *De :*

RE: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
CSV attached. Thanks, Maxence, From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, June 18, 2018 10:02 To: user@manifoldcf.apache.org Subject: Re: Documents blocked sometimes without errors The only way to know if these are truly blocked is to find the document records in

Re: Documents blocked sometimes without errors

2018-06-18 Thread Karl Wright
The only way to know if these are truly blocked is to find the document records in the database and include them here. Thanks, Karl On Mon, Jun 18, 2018 at 3:55 AM msaunier wrote: > Hello Karl, > > > > Today, I have 2 documents blocked on the new trunk version (I think). Can > I verify my

TR: Documents blocked sometimes without errors

2018-06-18 Thread msaunier
Hello Karl, Today, I have 2 documents blocked on the new trunk version (I think). Can I verify my trunk version after the build? Thanks, Maxence, From: msaunier [mailto:msaun...@citya.com] Sent: Tuesday, June 5, 2018 14:54 To: 'user@manifoldcf.apache.org' Subject: RE: Documents

R: connectors.xml modified: new repository not in the list

2018-06-15 Thread Bisonti Mario
I solved it! I executed: sudo ./initialize.sh And the connectors have been refreshed! Thanks! From: Bisonti Mario Sent: Friday, June 15, 2018 11:46 To: user@manifoldcf.apache.org Subject: R: connectors.xml modified: new repository not in the list I leave the name jcifs-1.3.19.jar without renaming it

R: connectors.xml modified: new repository not in the list

2018-06-15 Thread Bisonti Mario
I left the name jcifs-1.3.19.jar without renaming it because when I used it with Jetty on HSQLDB it worked. Now I am using MCF 2.10. Thanks From: msaunier Sent: Friday, June 15, 2018 11:42 To: user@manifoldcf.apache.org Subject: RE: connectors.xml modified: new repository not in the list Hello

RE: connectors.xml modified: new repository not in the list

2018-06-15 Thread msaunier
Hello Mario, Is your jcifs named jcifs.jar or jcifs-1.3.19.jar? What is your ManifoldCF version? Maxence, From: Bisonti Mario [mailto:mario.biso...@vimar.com] Sent: Friday, June 15, 2018 11:39 To: user@manifoldcf.apache.org Subject: connectors.xml modified: new repository

connectors.xml modified: new repository not in the list

2018-06-15 Thread Bisonti Mario
Hallo. I installed ManifoldCF on Tomcat with Postgres, and I am configuring it for use with the folder /manifoldcf/multiprocess-file-example. Now I would like to add the repository "Windows Shares", so I uncommented it in connectors.xml: And I added jcifs-1.3.19.jar to connector-lib-proprietary. I
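The element that was uncommented did not survive in the archived message; as a hedged reconstruction, the Windows Shares (JCIFS) registration in connectors.xml normally looks something like this, with the class name assumed from the standard MCF distribution:

    <repositoryconnector name="Windows shares" class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/>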

Sharepoint 2013 indexation time performance

2018-06-13 Thread Olivier Tavard
Hi, I have a question regarding the performance of the Sharepoint repository connector. Recently we did some tests using MCF 2.8.1 to crawl some documents on a Sharepoint 2013 server. There were few documents: only 700, all located in the same document list. For full indexation, the indexation

Re: Job in aborting status

2018-06-13 Thread Karl Wright
I'm not in a position to teach you how to use the Java tools, but: (1) You want to use the JDK, and (2) The utility you want to run to get a thread dump is jstack (distributed with the JDK). If you can't attach to the process, there's a switch to force attachment: -F Karl On Wed, Jun 13, 2018
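A sketch of the commands involved, assuming a JDK on the path (the output file name is illustrative):

    # Find the pid of the MCF process (e.g. the Jetty start.jar or the agents process)
    jps -l

    # Take a thread dump; add -F to force attachment if the normal attach fails
    jstack -F <pid> > /tmp/mcf-thread-dump.txt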

R: Job in aborting status

2018-06-13 Thread Bisonti Mario
Ciao Karl. I am not able to thread dump the start.jar process because I obtain: Error attaching to core file: cannot open binary file sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file . . Furthermore, I set in logging.xml: %5p %d{ISO8601} (%t) -
