[jira] [Updated] (CONNECTORS-1522) Add SSL trust certificates list to ElasticSearch output connector

2018-08-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1522: Fix Version/s: ManifoldCF 2.12 > Add SSL trust certificates list to ElasticSea

[jira] [Assigned] (CONNECTORS-1522) Add SSL trust certificates list to ElasticSearch output connector

2018-08-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1522: --- Assignee: Karl Wright > Add SSL trust certificates list to ElasticSea

[jira] [Commented] (TIKA-2693) Tika 1.17 uses the wrong classloader for reflection

2018-08-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574706#comment-16574706 ] Karl Wright commented on TIKA-2693: --- Re: testing: I don't have a test setup here, and the user

[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector

2018-08-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574676#comment-16574676 ] Karl Wright commented on CONNECTORS-1490: - [~piergiorgioluc...@gmail.com], it ran correctly

Re: crawl interrupted

2018-08-09 Thread Karl Wright
There is no autovacuum for MySQL. MySQL apparently does dead tuple cleanup as it goes. Karl On Thu, Aug 9, 2018 at 6:13 AM Gustavo Beneitez wrote: > Hi, > > looking at the manifoldCF pom I can see > > 1.0.4-SNAPSHOT > > I'm not aware of any change in database, in fact ours is MySQL, I don't >

[jira] [Assigned] (LUCENE-8451) GeoPolygon test failure

2018-08-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned LUCENE-8451: --- Assignee: Karl Wright > GeoPolygon test fail

[jira] [Commented] (LUCENE-8451) GeoPolygon test failure

2018-08-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574582#comment-16574582 ] Karl Wright commented on LUCENE-8451: - [~ivera], I won't have any possibility of looking

[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector

2018-08-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574570#comment-16574570 ] Karl Wright commented on CONNECTORS-1490: - Hi [~piergiorgioluc...@gmail.com], we have

[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector

2018-08-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574552#comment-16574552 ] Karl Wright commented on CONNECTORS-1490: - Ok, thanks. I'm going to try running the IT from

[jira] [Commented] (TIKA-2693) Tika 1.17 uses the wrong classloader for reflection

2018-08-08 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573935#comment-16573935 ] Karl Wright commented on TIKA-2693: --- I am currently with my wife in the emergency room, so trying things

Re: It's release time again

2018-08-08 Thread Karl Wright
h during this week but I don't know if we have time to directly bring > it in this release. > > In September I hope to bring the new website and Alfresco BFSI and then the > Azure Storage connectors. > > Cheers, > PJ > > On Tue, Aug 7, 2018 at 14:05, Karl Wright

[jira] [Commented] (TIKA-2693) Tika 1.17 uses the wrong classloader for reflection

2018-08-08 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573198#comment-16573198 ] Karl Wright commented on TIKA-2693: --- I am being clobbered with Tika/POI issues at the moment so I'm

Re: Job stuck internal http error 500

2018-08-08 Thread Karl Wright
allation and the problem was solved! > > > > Now I solved using the tika 1.19 versions nightly build. > > > > > > Thanks a lot. > > > > > > > > *From:* Karl Wright > *Sent:* Friday, 27 July 2018 12:39 > *To:* user@manifoldcf.apache.org > *Og

[jira] [Commented] (CONNECTORS-1521) Documentum Connector users ManifoldCF's local time in queries constraints against the Documentum server without reference to time zones

2018-08-08 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573002#comment-16573002 ] Karl Wright commented on CONNECTORS-1521: - There is one hacky approach that would certainly

[jira] [Commented] (CONNECTORS-1521) Documentum Connector users ManifoldCF's local time in queries constraints against the Documentum server without reference to time zones

2018-08-08 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572995#comment-16572995 ] Karl Wright commented on CONNECTORS-1521: - I'm afraid I don't have time to even contemplate

[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572690#comment-16572690 ] Karl Wright commented on CONNECTORS-1490: - The only remaining issue is how the tests are run

[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572688#comment-16572688 ] Karl Wright commented on CONNECTORS-1490: - ok, moved. > GSOC: MongoDB Output Connec

[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571798#comment-16571798 ] Karl Wright commented on CONNECTORS-1490: - Also, build.xml has the following: {code

[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571787#comment-16571787 ] Karl Wright commented on CONNECTORS-1490: - What was the final decision about what version

[jira] [Commented] (CONNECTORS-1521) Documentum Connector users ManifoldCF's local time in queries constraints against the Documentum server without reference to time zones

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571629#comment-16571629 ] Karl Wright commented on CONNECTORS-1521: - {quote} As far as I can see none of the patterns

[jira] [Commented] (CONNECTORS-1521) Documentum Connector users ManifoldCF's local time in queries constraints against the Documentum server without reference to time zones

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571624#comment-16571624 ] Karl Wright commented on CONNECTORS-1521: - [~jamesthomas], computing a date relative to "

Re: It's release time again

2018-08-07 Thread Karl Wright
When will it be ready for integration? Karl On Tue, Aug 7, 2018 at 7:10 AM Irindu Nugawela wrote: > Hi Karl, > > I am currently preparing the patch for mcf-mongodb-output-connector. I > would be glad if we can include it in the next release. > > On Mon, 6 Aug 2018 at 16:00,

[jira] [Comment Edited] (CONNECTORS-1521) Documentum Connector users ManifoldCF's local time in queries constraints against the Documentum server without reference to time zones

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571462#comment-16571462 ] Karl Wright edited comment on CONNECTORS-1521 at 8/7/18 11:05 AM

[jira] [Commented] (CONNECTORS-1521) Documentum Connector users ManifoldCF's local time in queries constraints against the Documentum server without reference to time zones

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571462#comment-16571462 ] Karl Wright commented on CONNECTORS-1521: - All I have access to indicates that IDfTime

[jira] [Commented] (CONNECTORS-1521) Documentum Connector users ManifoldCF's local time in queries constraints against the Documentum server without reference to time zones

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571444#comment-16571444 ] Karl Wright commented on CONNECTORS-1521: - The method that is used to build the date string

[jira] [Assigned] (CONNECTORS-1521) Documentum Connector users ManifoldCF's local time in queries constraints against the Documentum server without reference to time zones

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1521: --- Assignee: Karl Wright > Documentum Connector users ManifoldCF's local t

[jira] [Commented] (CONNECTORS-1492) GSOC: Add support for Docker

2018-08-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571312#comment-16571312 ] Karl Wright commented on CONNECTORS-1492: - [~piergiorgioluc...@gmail.com], I suspect that we

Re: PostgreSQL version to support MCF v2.10

2018-08-06 Thread Karl Wright
the first place, or > it is just to be expected with the nature of the multiple worker threads > and the query types issued by ManifoldCF? > > Best Regards, > > > > Guy > > > > *From:* Karl Wright [mailto:daddy...@gmail.com] > *Sent:* 06 August 2018 12:16 > *To:* user@

Re: PostgreSQL version to support MCF v2.10

2018-08-06 Thread Karl Wright
PDATE > > 2018-08-03 15:52:42.855 BST [5716] ERROR: could not serialize access due > to concurrent update > > “ > > > > These errors don’t suggest a retry may sort them out - is this an issue? > > > > Many Thanks, > > > > Guy > > > > *Fr

It's release time again

2018-08-06 Thread Karl Wright
I'm hoping to cut RC0 of 2.11 around August 15th. Any objection? Karl

[jira] [Assigned] (LUCENE-8444) Geo3D Test Failure: Test Point is Contained by shape but outside the XYZBounds

2018-08-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned LUCENE-8444: --- Assignee: Ignacio Vera (was: Karl Wright) > Geo3D Test Failure: Test Point is Contai

[jira] [Commented] (LUCENE-8444) Geo3D Test Failure: Test Point is Contained by shape but outside the XYZBounds

2018-08-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570011#comment-16570011 ] Karl Wright commented on LUCENE-8444: - [~ivera] That sounds like the proper fix then. It's exactly

[jira] [Commented] (LUCENE-8444) Geo3D Test Failure: Test Point is Contained by shape but outside the XYZBounds

2018-08-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569976#comment-16569976 ] Karl Wright commented on LUCENE-8444: - [~ivera], identical cutoff planes are bad news. If we detect

[jira] [Assigned] (LUCENE-8444) Geo3D Test Failure: Test Point is Contained by shape but outside the XYZBounds

2018-08-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned LUCENE-8444: --- Assignee: Karl Wright > Geo3D Test Failure: Test Point is Contained by shape but outs

Re: PostgreSQL version to support MCF v2.10

2018-08-06 Thread Karl Wright
2018-08-03 15:52:25.218 BST [4140] HINT: The transaction might succeed if retried. <<<<<< ... occur because of concurrent transactions. The transaction is indeed retried when this occurs, so unless your job aborts, you are fine. Karl On Mon, Aug 6, 2018 at 5:49 A
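
The retry behaviour described in this reply can be sketched as follows. This is an illustrative sketch only, not ManifoldCF's actual database layer: the class name SerializationRetry, the method runWithRetry, and the back-off delay are hypothetical. The one fact relied on is that PostgreSQL reports "could not serialize access due to concurrent update" with SQLSTATE 40001 (serialization_failure).

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of ManifoldCF: re-runs a transaction body
// when the database reports a serialization failure.
final class SerializationRetry {
    // PostgreSQL SQLSTATE for serialization_failure.
    static final String SERIALIZATION_FAILURE = "40001";

    private SerializationRetry() {}

    static <T> T runWithRetry(Callable<T> transaction, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return transaction.call();
            } catch (SQLException e) {
                boolean retryable = SERIALIZATION_FAILURE.equals(e.getSQLState());
                if (!retryable || attempt >= maxAttempts) {
                    throw e; // not a serialization failure, or out of attempts
                }
                // Assumed back-off: wait a little longer after each failed attempt.
                Thread.sleep(10L * attempt);
            }
        }
    }
}
```

With a loop of this shape, the ERROR and HINT lines still appear in the PostgreSQL log for every failed attempt, which is consistent with the advice above: they matter only if the job itself aborts.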

Re: PostgreSQL version to support MCF v2.10

2018-08-06 Thread Karl Wright
6 > and 10. > > Simple to resolve though. > > Steph > > > > > > > On Fri, Aug 3, 2018 at 1:29 PM, Karl Wright wrote: > > Hi Guy, > > > > I use Postgresql 9.6 myself and have found no issues with it. I don't > know about v 10 however. > >

[jira] [Commented] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-08-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569769#comment-16569769 ] Karl Wright commented on CONNECTORS-1517: - Attached a second patch, to be applied

[jira] [Updated] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-08-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1517: Attachment: CONNECTORS-1517-2.patch > Documentum Connector uses differ

[jira] [Resolved] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-08-05 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1517. - Resolution: Fixed tentative fix committed: r1837476 > Documentum Connector u

[jira] [Commented] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-08-05 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569756#comment-16569756 ] Karl Wright commented on CONNECTORS-1517: - [~jamesthomas], I've coded a tentative patch

[jira] [Updated] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-08-05 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1517: Attachment: CONNECTORS-1517.patch > Documentum Connector uses differ

[jira] [Commented] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-08-05 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569724#comment-16569724 ] Karl Wright commented on CONNECTORS-1517: - That's unfortunate, because I don't know DQL

[jira] [Commented] (LUCENE-8445) RandomGeoPolygonTest.testCompareBigPolygons() failure

2018-08-05 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569594#comment-16569594 ] Karl Wright commented on LUCENE-8445: - It worries me that the detection of identical planes needs

[jira] [Assigned] (LUCENE-8445) RandomGeoPolygonTest.testCompareBigPolygons() failure

2018-08-05 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned LUCENE-8445: --- Assignee: Ignacio Vera > RandomGeoPolygonTest.testCompareBigPolygons() fail

Re: Jetty crash

2018-07-31 Thread Karl Wright
> How can I debug this? Any idea? Jetty have a log file? > > > > Cordialement, > > > > [image: msaunier] > > > > > > > > *De :* Karl Wright [mailto:daddy...@gmail.com] > *Envoyé :* mardi 31 juillet 2018 15:32 > *À :* user@manifoldcf.

Re: Jetty crash

2018-07-31 Thread Karl Wright
There must be a reason. Karl On Tue, Jul 31, 2018 at 8:18 AM msaunier wrote: > Hello Karl, > > > > Today and yesterday, I have an error with Jetty. Jetty crash for no reason. > > > > Error: > > ./start.sh : ligne 41 : 562 Processus arrêté "$JAVA_HOME/bin/java" > $OPTIONS

Re: Scheduler not working as we expected

2018-07-31 Thread Karl Wright
Hi Vinay, Dynamic rescan is meant for web-crawling and revisits already crawled documents based on how often they have changed in the past. It is therefore wholly inappropriate for something like a file crawl, since directory contents (one of the kinds of documents there are in a file crawl)

[jira] [Commented] (TIKA-2693) Tika 1.17 uses the wrong classloader for reflection

2018-07-30 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562606#comment-16562606 ] Karl Wright commented on TIKA-2693: --- [~kiwiwings], when is Tika planning to go to POI 4.0.0? > T

[jira] [Updated] (CONNECTORS-1520) Connector registration/deregistration fails when more than a certain number of jobs

2018-07-30 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1520: Attachment: CONNECTORS-1520-2.patch > Connector registration/deregistration fa

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
Ok, attached a second fix. Karl On Mon, Jul 30, 2018 at 4:09 PM Karl Wright wrote: > Yes, of course. I overlooked that. Will fix. > > Karl > > > On Mon, Jul 30, 2018 at 3:54 PM Mike Hugo wrote: > >> That limit only applies to the list of transformations, not the

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
new MultiClause(jobs.idField,jobIDs)})) > .append(" FOR UPDATE"); > <<<<<< > > Which generates a query with a large OR clause > > > Mike > > On Mon, Jul 30, 2018 at 2:44 PM, Karl Wright wrote: > >> The limit is app

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
et set = > database.performQuery(query.toString(),newList,null,null); > int i = 0; > while (i < set.getRowCount()) > { > IResultRow row = set.getRow(i++); > Long jobID = (Long)row.getValue(jobs.idField); > int statusValue = > jobs.stringToS

[jira] [Resolved] (CONNECTORS-1520) Connector registration/deregistration fails when more than a certain number of jobs

2018-07-30 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1520. - Resolution: Fixed r1837084 > Connector registration/deregistration fails when m

[jira] [Updated] (CONNECTORS-1520) Connector registration/deregistration fails when more than a certain number of jobs

2018-07-30 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1520: Attachment: CONNECTORS-1520.patch > Connector registration/deregistration fails w

[jira] [Created] (CONNECTORS-1520) Connector registration/deregistration fails when more than a certain number of jobs

2018-07-30 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-1520: --- Summary: Connector registration/deregistration fails when more than a certain number of jobs Key: CONNECTORS-1520 URL: https://issues.apache.org/jira/browse/CONNECTORS-1520

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
es "OR" return getMaxOrClause(); } <<<<<< The problem is that there was a cut-and-paste error, with just transformation connections, that defeated the limit. I'll create a ticket and attach a patch. CONNECTORS-1520. Karl On Mon, Jul 30, 2018 at 2:29 PM Karl Wright
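
The idea behind the limit can be sketched as below: split the list of job IDs into batches no larger than the maximum number of OR clauses, and issue one query per batch. This is a sketch under stated assumptions, not the CONNECTORS-1520 patch: OrClauseBatcher and partition are hypothetical names, and the maxOrClauses argument stands in for whatever getMaxOrClause() returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not ManifoldCF code: bounds the size of each
// generated OR clause by splitting the ID list into batches.
final class OrClauseBatcher {
    private OrClauseBatcher() {}

    static <T> List<List<T>> partition(List<T> ids, int maxOrClauses) {
        if (maxOrClauses < 1) {
            throw new IllegalArgumentException("maxOrClauses must be at least 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxOrClauses) {
            int end = Math.min(start + maxOrClauses, ids.size());
            // Copy the sub-list so each batch is independent of the input list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch would then be turned into its own query, so that no single statement carries tens of thousands of OR terms, which is the situation reported in this thread.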

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
loses the > connection before returning with a response. > > As I mentioned this instance of manifold has nearly 40,000 web crawlers. > is that a high number for Manifold to handle? > > On Mon, Jul 30, 2018 at 10:58 AM, Karl Wright wrote: > >> Well, I have absolutely no i

Re: Scheduling Problem and the IBM Domino Connector

2018-07-30 Thread Karl Wright
or API resource to > extract documents from Domino server? > > Best wishes, > Cheng > > On 30 Jul 2018, at 17:48, Karl Wright wrote: > > Hi Cheng, > > Dynamic recrawl revisits documents based on the frequency that they > changed in the past. It is therefore hard t

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
res run on the same host. > > On Mon, Jul 30, 2018 at 9:35 AM, Karl Wright wrote: > >> ' LOG: incomplete message from client' >> >> This shows a network issue. Did your network configuration change >> recently? >> >> Karl >> >> >>

Re: PSQLException: This connection has been closed.

2018-07-30 Thread Karl Wright
.createStatement(PgConnection.java:1576) > at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367) > at org.apache.manifoldcf.core.database.Database.execute(Database.java:873) > at > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696) >

Re: Scheduling Problem

2018-07-30 Thread Karl Wright
Hi Cheng, Dynamic recrawl revisits documents based on the frequency that they changed in the past. It is therefore hard to make any prediction about whether a document will be recrawled in a given time interval. You need recrawls of existing directories in order to discover new documents in

Re: PSQLException: This connection has been closed.

2018-07-29 Thread Karl Wright
It looks to me like your database server is not happy. Maybe it's out of resources? Not sure but a restart may be in order. Karl On Sun, Jul 29, 2018 at 9:06 AM Mike Hugo wrote: > Recently we started seeing this error when Manifold CF starts up. We had > been running Manifold CF with many

[jira] [Commented] (CONNECTORS-1519) CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y

2018-07-27 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560364#comment-16560364 ] Karl Wright commented on CONNECTORS-1519: - Can you have a look at what has changed

Re: Exclude files ~$*

2018-07-27 Thread Karl Wright
Can you view the job and include a screen shot of where this is displayed? Thanks. The exclusions are not regexps -- they are file specs. The file specs have special meanings for "*" (matches everything) and "?" (matches one character). You do not need to URL encode them. If you enable
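
The difference between a file spec and a regexp can be illustrated with a small matcher. This is a sketch of the semantics stated in this reply ("*" matches any run of characters, "?" matches exactly one), not the connector's own implementation; the class FileSpecMatcher is hypothetical, and case-sensitive matching against the bare file name is an assumption.

```java
// Hypothetical matcher, not ManifoldCF code: '*' matches any run of
// characters (including none), '?' matches exactly one character, and
// every other character, such as '~', '$' or '.', matches itself.
final class FileSpecMatcher {
    private FileSpecMatcher() {}

    static boolean matches(String spec, String name) {
        int s = 0;          // position in spec
        int n = 0;          // position in name
        int starPos = -1;   // position of the most recent '*' in spec
        int resume = 0;     // name position to retry from after that '*'
        while (n < name.length()) {
            if (s < spec.length() && spec.charAt(s) == '*') {
                starPos = s++;
                resume = n;
            } else if (s < spec.length()
                    && (spec.charAt(s) == '?' || spec.charAt(s) == name.charAt(n))) {
                s++;
                n++;
            } else if (starPos >= 0) {
                // Backtrack: let the last '*' absorb one more character.
                s = starPos + 1;
                n = ++resume;
            } else {
                return false;
            }
        }
        // Only trailing '*' characters may remain unmatched in the spec.
        while (s < spec.length() && spec.charAt(s) == '*') {
            s++;
        }
        return s == spec.length();
    }
}
```

Under these assumptions the spec ~$* from the subject line matches a name such as ~$report.docx literally on its first two characters, with no escaping or URL encoding needed, whereas the same string read as a regexp would mean something quite different.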

Re: Tika/POI bugs

2018-07-27 Thread Karl Wright
ad I could find it more easily, but I'm > afraid the crawl is very long. > > Maybe you have an idea of the best method to adopt to find this / these > documents? > > > > Maxence > > > > *From:* Karl Wright [mailto:daddy...@gmail.com] > *Sent:* Friday 2

Tika/POI bugs

2018-07-27 Thread Karl Wright
Hi all, I've easily spent 40 hours over the last two weeks chasing down bugs in Apache Tika and POI. The two kinds I see are "ClassNotFound" (due to usage of the wrong ClassLoader), and "OutOfMemoryError" (not clear what it is due to yet). I don't have enough time to create tickets directly in

Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
set: > > sudo nano options.env.unix > > -Xms2048m > > -Xmx2048m > > > > But I obtain the same error. > > My doubt is that it could be a solr/tika problem. > > What could I do? > > I restrict the scan to a single file and I obtain the same error > > >

Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
Although it is not clear what process you are talking about. If Solr, ask them. Karl On Fri, Jul 27, 2018, 5:36 AM Karl Wright wrote: > I am presuming you are using the examples. If so, edit the options file > to grant more memory to your agents process by increasing the Xmx value. >

Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
I am presuming you are using the examples. If so, edit the options file to grant more memory to your agents process by increasing the Xmx value. Karl On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario wrote: > Hello. > > My job gets stuck indexing an xlsx file of 38MB > > > > What could I do to

[jira] [Commented] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559269#comment-16559269 ] Karl Wright commented on CONNECTORS-1518: - [~svanschalkwyk], we don't control how much

[jira] [Resolved] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1518. - Resolution: Fixed r1836769 > MCF shutting down when Tika is u

[jira] [Updated] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1518: Attachment: CONNECTORS-1518.patch > MCF shutting down when Tika is u

[jira] [Commented] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559082#comment-16559082 ] Karl Wright commented on CONNECTORS-1518: - Hi [~svanschalkwyk], the memory usage

[jira] [Updated] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1518: Fix Version/s: ManifoldCF 2.11 > MCF shutting down when Tika is u

[jira] [Assigned] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1518: --- Assignee: Karl Wright > MCF shutting down when Tika is u

[jira] [Commented] (CONNECTORS-1191) ManifoldCFException: Unexpected job status encountered

2018-07-26 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559073#comment-16559073 ] Karl Wright commented on CONNECTORS-1191: - Hi [~svanschalkwyk], is there any reason you

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
On Thu, Jul 26, 2018 at 11:09 AM msaunier wrote: > On repository connection. I have added « 20971520 » on the max document size. > > > > Maxence > > > > > > *From:* Karl Wright [mailto:daddy...@gmail.com] > *Sent:* Thursday, 26 July 2018 17:07 > *To:* us

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
How are you limiting content size? Is this in the repository connection, or in an Allowed Documents transformation connection? Karl On Thu, Jul 26, 2018 at 10:58 AM msaunier wrote: > I have limit to 20Mb / document and I have again an out of memory java. > > > > > > >

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
I believe there's also a content length tab in the Windows Share connector, if you're using that. Karl On Thu, Jul 26, 2018 at 10:19 AM Karl Wright wrote: > The ContentLimiter truncates documents. That's not what you want. > > Use the Allowed Documents transformer. > > Karl >

Re: ***UNCHECKED*** Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
> > Maxence, > > > > > > *De :* Karl Wright [mailto:daddy...@gmail.com] > *Envoyé :* mercredi 25 juillet 2018 19:15 > *À :* user@manifoldcf.apache.org > *Objet :* ***UNCHECKED*** Re: Out of memory, one file bug i think > > > > It looks like you are still run

Re: Solr connection, max connections and CPU

2018-07-26 Thread Karl Wright
Hi Mario, There is no connection between the number of CPUs and the number of output connections. You pick the maximum number of output connections based on the number of listening threads that you can use at the same time in Solr. Karl On Thu, Jul 26, 2018 at 9:22 AM Bisonti Mario wrote: >

[jira] [Commented] (CONNECTORS-1516) Class not found exception using Tika transformer

2018-07-26 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558285#comment-16558285 ] Karl Wright commented on CONNECTORS-1516: - Fix committed in Apache POI. But now we see

[jira] [Created] (TIKA-2693) Tika 1.17 uses the wrong classloader for reflection

2018-07-26 Thread Karl Wright (JIRA)
Karl Wright created TIKA-2693: - Summary: Tika 1.17 uses the wrong classloader for reflection Key: TIKA-2693 URL: https://issues.apache.org/jira/browse/TIKA-2693 Project: Tika Issue Type: Bug

Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
like you may have put the new poi jars in the wrong place? They should *all* be in connector-common-lib too. Karl On Thu, Jul 26, 2018 at 6:23 AM Karl Wright wrote: > Hi Maxence, > > The following error: > > >>>>>> > > FATAL 2018-07-26T11:30:32,220 (Wo

Re: Out of memory, one file bug i think

2018-07-26 Thread Karl Wright
onal dependencies. > > J2KImageReader not loaded. JPEG2000 files will not be processed. > > See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io > > for optional dependencies. > > > > juil. 26, 2018 11:29:01 AM > org.apache.tika.config.InitializableProblemHandler$3 >

Re: Create a new ACTIVITY_FETCH from a transformation

2018-07-26 Thread Karl Wright
ed and ingested like others do)?". > > Thanks. > > > > On Thu, 26 Jul 2018 at 0:35, Karl Wright () > wrote: > >> The crawled URL is transmitted as part of the RepositoryDocument object to >> the output connector. If this is going to Solr, it's used as

Re: web crawler not sharing cookies

2018-07-26 Thread Karl Wright
/httpcomponents-client-ga/tutorial/html/statemgmt.html Karl On Thu, Jul 26, 2018 at 3:19 AM Karl Wright wrote: > Ok, so the database for your site crawl contains both z.com and x.y.z.com > cookies? And your site pages from domain a.y.z.com receive no cookies at > all whe

Re: web crawler not sharing cookies

2018-07-26 Thread Karl Wright
it should since A.Y.Z is a sub-domain in > Z). > > Only after making those changes by hand (replacing domain with sub-domain in > the database) and restarting ManifoldCF does it begin to work. > > There might be security constraints somehow; I will consider further > analysis. > > Regards. > >

[jira] [Commented] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-07-25 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556937#comment-16556937 ] Karl Wright commented on CONNECTORS-1517: - [~jamesthomas], the connector was developed under

[jira] [Updated] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-07-25 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1517: Fix Version/s: ManifoldCF 2.11 > Documentum Connector uses different "uncon

[jira] [Assigned] (CONNECTORS-1517) Documentum Connector uses different "unconstrained" a_content_type filters depending on whether the Content Types tab has been edited

2018-07-25 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1517: --- Assignee: Karl Wright > Documentum Connector uses different "uncon

Re: Create a new ACTIVITY_FETCH from a transformation

2018-07-25 Thread Karl Wright
The crawled URL is transmitted as part of the RepositoryDocument object to the output connector. If this is going to Solr, it's used as the document's ID. You can therefore customize Solr (or ElasticSearch) to extract the data you need at the indexing end. If this doesn't make any sense to you,

Re: web crawler not sharing cookies

2018-07-25 Thread Karl Wright
; X.Y.Z.com", none of the sub-sites receives that cookie, I need to write > the same cookie for every sub-domain, which solves the situation (and > thankfully it is a language cookie and not a dynamic one). > > Regards. > > On Wed, Jul 25, 2018 at 19:17, Karl Wright () > wrote

Re: Speed up cleaning up job

2018-07-25 Thread Karl Wright
sed » documents are deleted very fast > > « Active » documents too. > > But for « Documents » in the interface, deleting every line is very slow. > ManifoldCF deletes documents 100 at a time. > > > > Maxence, > > > > > > > > *From:* Karl Wright [mailto:

Re: Speed up cleaning up job

2018-07-25 Thread Karl Wright
I'm sorry, I don't understand your question. Karl On Wed, Jul 25, 2018 at 12:53 PM msaunier wrote: > Hi Karl, > > > > Can I configure ManifoldCF to clean up faster? I think ManifoldCF > cleans 100 at a time by default. > > > > Maxence, > > >

Re: web crawler not sharing cookies

2018-07-25 Thread Karl Wright
en the documentation for an example >> of that. >> >> Regards! >> >> On Thu, Jul 19, 2018 at 21:54, Karl Wright () >> wrote: >> >>> You are correct that cookies are not shared among threads. That is by >>> design. >>>

Re: Out of memory, one file bug i think

2018-07-25 Thread Karl Wright
It looks like you are still running out of memory. I would love to know which document was doing that. I suspect it is very large already, and for some reason it cannot be streamed. Karl On Wed, Jul 25, 2018 at 1:13 PM Karl Wright wrote: > Hi Maxence, > > The second

Re: Out of memory, one file bug i think

2018-07-25 Thread Karl Wright
.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) > ~[mcf-pull-agent.jar:?] > > at > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) > ~[?:?] > > at > org.apache.manifoldcf.crawler.sys

Re: Out of memory, one file bug i think

2018-07-25 Thread Karl Wright
> at > org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) > ~[mcf-pull-agent.jar:?] > > at > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
