Re: Job stuck without message

2018-11-28 Thread Karl Wright
p/jstack_start_agent.log > > > > but I obtain: > > 1233: Unable to open socket file /proc/1233/cwd/.attach_pid1233: target > process 1233 doesn't respond within 10500ms or HotSpot VM not loaded > > > > Perhaps this isn’t the right way to obtain a thread dump? > >

Re: Job stuck without message

2018-11-28 Thread Karl Wright
Another thing you could do is get a thread dump of the agents process. Karl On Wed, Nov 28, 2018 at 10:35 AM Karl Wright wrote: > Can you look into the database jobqueue table and provide a row that > corresponds to one of these documents? > > Thanks, > Karl > > > On

[jira] [Resolved] (CONNECTORS-1559) Logging Is Not working as expected

2018-11-28 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1559. - Resolution: Not A Problem Assignee: Karl Wright Logging is described

[jira] [Commented] (CONNECTORS-1558) Action Button is Missing in Status Job

2018-11-27 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700885#comment-16700885 ] Karl Wright commented on CONNECTORS-1558: - I'm afraid this report is completely

[jira] [Resolved] (CONNECTORS-1558) Action Button is Missing in Status Job

2018-11-27 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1558. - Resolution: Incomplete > Action Button is Missing in Status

Re: Error Job stop after repeatidly interruption

2018-11-26 Thread Karl Wright
are': Tika down, retrying: > Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/ > 172.16.1.135] failed: Connection refused (Connection refused) > > WARN 2018-11-26T13:18:26,862 (Worker thread '12') - Service interruption > reported for job 1533797717712 connection '

Re: ManifoldCF Docker MySQL Connection Error

2018-11-24 Thread Karl Wright
e"/> >value="custom_hostname"/> > > So, I've added those properties to make it work. Shouldn't hostname, > dbsuperusername and dbsuperuserpassword be enough? > > Kind Regards, > Furkan KAMACI > > > On Sat, Nov 24, 2018 at 5:40 PM Karl Wr

Re: ManifoldCF Docker MySQL Connection Error

2018-11-24 Thread Karl Wright
-- > -- > Database: amarok > jdbcDriver: com.mysql.jdbc.Driver > jdbcUrl: > jdbc:mysql://localhost/amarok?useUnicode=true&characterEncoding=utf8 > userName: manifoldcf > password: local_pg_passwd > -- > > So, it doesn't try to connect to a host other than localhost without >

Re: Manifold fails with alfresco | pipeline exception

2018-11-24 Thread Karl Wright
Hi Nikhilesh, Where are you seeing these errors? They sound like ElasticSearch errors to me; it is complaining that a null or empty-string pipeline name is being specified somehow. Can you tell me what version of ElasticSearch you are using? We have outstanding tickets for updating the

[jira] [Commented] (CONNECTORS-1557) HTML Tag extractor

2018-11-21 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694406#comment-16694406 ] Karl Wright commented on CONNECTORS-1557: - The best way to deliver the code is as a patch

Re: Language Detection for the data

2018-11-21 Thread Karl Wright
Hi Nikita, Can you be more specific when you say "OpenNLP is not working"? All that this connector does is integrate OpenNLP as a ManifoldCF transformer. It uses a specific directory to deliver the models that OpenNLP uses to match and extract content from documents. Thus, you can provide any

[jira] [Assigned] (CONNECTORS-1557) HTML Tag extractor

2018-11-21 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1557: --- Assignee: Karl Wright > HTML Tag extrac

Re: web connector : links extraction issues

2018-11-15 Thread Karl Wright
i.e. the website itself needs to escape the special > characters, otherwise the extraction will not work in MCF, am I right? > > Best regards, > > Olivier > > > > On 15 Nov 2018 at 12:57, Karl Wright wrote: > > Hi Olivier, > > You can create a ticket but I don't h

Re: web connector : links extraction issues

2018-11-15 Thread Karl Wright
nk extraction starting > DEBUG 2018-10-30T11:48:13,553 (Worker thread '36') - WEB: no content > exclusion rule supplied... returning > DEBUG 2018-10-30T11:48:13,553 (Worker thread '36') - WEB: Decided to > ingest 'http://localhost:/testjs/test.html' > — > So special characters like th

Re: Error Job stop after repeatidly interruption

2018-11-15 Thread Karl Wright
apply the same concept (wait 10 sec and retry > three times) to the 503 error, too? > > > > So, I would like to check whether, with the modification, the job ends > correctly instead of failing. > > > > > > Thanks a lot > > Mario > > >

[jira] [Updated] (CONNECTORS-1556) Integrate changes in retry handling to address TIKA-2776

2018-11-15 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1556: Attachment: CONNECTORS-1556.patch > Integrate changes in retry handling to addr

[jira] [Resolved] (CONNECTORS-1556) Integrate changes in retry handling to address TIKA-2776

2018-11-15 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1556. - Resolution: Fixed r1846627 > Integrate changes in retry handling to address T

[jira] [Created] (CONNECTORS-1556) Integrate changes in retry handling to address TIKA-2776

2018-11-15 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-1556: --- Summary: Integrate changes in retry handling to address TIKA-2776 Key: CONNECTORS-1556 URL: https://issues.apache.org/jira/browse/CONNECTORS-1556 Project

Re: Error Job stop after repeatidly interruption

2018-11-15 Thread Karl Wright
t > (in my case manifoldcf) > > > https://issues.apache.org/jira/browse/TIKA-2776?focusedCommentId=16686620&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16686620 > > > > I am not able to do this… > > Is it possible to implement on the MCF so

Re: Job stuck - WorkerThread functions return null

2018-11-14 Thread Karl Wright
; QueuedDocument qd = previousDocuments.get(documentIdentifierHash); > // return null. The problem is here. > if (qd == null) > throw new IllegalArgumentException("Unrecognized document > identifier: '"+documentIdentifier+"'"); > r
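The failure mode in the quoted fragment can be reproduced in isolation. The following is only a minimal sketch, not ManifoldCF code: `DocumentLookup`, `register` and `lookup` are hypothetical names, and the map of strings stands in for the framework's `previousDocuments` map of `QueuedDocument` objects. It shows that a lookup by hash returns null, and the exception is thrown, whenever the identifier handed back was never queued under that hash.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the lookup in the quoted fragment; not ManifoldCF code. */
final class DocumentLookup {
    // Keyed by document identifier hash, like previousDocuments in the fragment.
    private final Map<String, String> previousDocuments = new HashMap<>();

    /** Records an identifier under its hash (assumed to happen when a document is queued). */
    void register(String documentIdentifierHash, String documentIdentifier) {
        previousDocuments.put(documentIdentifierHash, documentIdentifier);
    }

    /** Returns the queued identifier, or fails loudly when the hash was never queued. */
    String lookup(String documentIdentifierHash, String documentIdentifier) {
        String queued = previousDocuments.get(documentIdentifierHash);
        if (queued == null) {
            throw new IllegalArgumentException(
                "Unrecognized document identifier: '" + documentIdentifier + "'");
        }
        return queued;
    }

    public static void main(String[] args) {
        DocumentLookup lookup = new DocumentLookup();
        lookup.register("hash-1", "file:/share/a.txt");
        System.out.println(lookup.lookup("hash-1", "file:/share/a.txt"));
        try {
            // An identifier that was never queued triggers the same exception.
            lookup.lookup("hash-2", "file:/share/b.txt");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the sketch matches what happens in the real job, the thing to check is which component produces identifiers that were never handed to the framework in the first place.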

Re: Valid usecase of CredSSP auth scheme

2018-11-12 Thread Karl Wright
Hi Michael, I did not contribute this work; I merely obliquely helped integrate it. If I recall correctly, there was a reasonable case made for it, but I don't remember what it was. Karl On Mon, Nov 12, 2018 at 5:50 PM Michael Osipov wrote: > Guys, > > I just have discovered that CredSSP

Re: Job stuck - WorkerThread functions return null

2018-11-12 Thread Karl Wright
Hi, Have you been modifying the framework code? If so, I really cannot help you. If you haven't -- it looks like you've got code that is injecting document identifiers that are incorrect. But I will need to see a full stack trace to be sure of that. Thanks, Karl On Mon, Nov 12, 2018 at 4:06

Re: Error Job stop after repeatidly interruption

2018-11-08 Thread Karl Wright
Hi Mario, The Tika external connector retries for a while before it gives up and aborts the job. If you can get the Tika server back up within a reasonable period of time all should be well. But if one specific document *always* brings down the Tika server, it will be hard to recover from that.
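The retry-then-abort behaviour described here can be illustrated with a small sketch. This is not the Tika external connector's implementation: `BoundedRetry`, `call`, `maxAttempts` and `delayMillis` are hypothetical names, and the real connector's retry counts and intervals are not stated in this thread. The sketch only shows the general pattern of retrying a failing call a bounded number of times and rethrowing the last failure once the attempts run out.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry a flaky call a fixed number of times, then give up. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Runs the task up to maxAttempts times, sleeping delayMillis after each failure
     * except the last. Rethrows the last failure once the attempts are exhausted.
     */
    static <T> T call(Callable<T> task, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated server that refuses the first two connections.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Connection refused");
            }
            return "extracted text";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A bounded retry like this helps when the server comes back within the retry window. It does not help with a document that brings the server down every time, because each retry simply repeats the crash.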

[jira] [Resolved] (CONNECTORS-1554) Job stuck during crawl documents on folder

2018-11-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1554. - Resolution: Cannot Reproduce > Job stuck during crawl documents on fol

[jira] [Commented] (CONNECTORS-1554) Job stuck during crawl documents on folder

2018-11-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678101#comment-16678101 ] Karl Wright commented on CONNECTORS-1554: - [~bisontim], there are several approved models

[jira] [Commented] (CONNECTORS-1554) Job stuck during crawl documents on folder

2018-11-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677450#comment-16677450 ] Karl Wright commented on CONNECTORS-1554: - Note that if you perform the lock-clean procedure

[jira] [Commented] (CONNECTORS-1554) Job stuck during crawl documents on folder

2018-11-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676928#comment-16676928 ] Karl Wright commented on CONNECTORS-1554: - Hi [~bisontim], you are using file

Re: Job stuck without message

2018-11-06 Thread Karl Wright
> the simple history for one of these documents; I need to see what happened > to it last. > > > > Thanks, > > Karl > > > > > > On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario > wrote: > > My version is 2.11 > > > > > > > > &g

[jira] [Commented] (CONNECTORS-1554) Job stuck during crawl documents on folder

2018-11-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676820#comment-16676820 ] Karl Wright commented on CONNECTORS-1554: - Hi [~bisontim], I note the following in your log

[jira] [Commented] (CONNECTORS-1554) Job stuck during crawl documents on folder

2018-11-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676817#comment-16676817 ] Karl Wright commented on CONNECTORS-1554: - Hi [~bisontim], I need the Simple History of one

[jira] [Assigned] (CONNECTORS-1554) Job stuck during crawl documents on folder

2018-11-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1554: --- Assignee: Karl Wright > Job stuck during crawl documents on fol

Re: Job stuck without message

2018-11-06 Thread Karl Wright
ok, can you create a ticket? Also, I'd appreciate it if you can look at the simple history for one of these documents; I need to see what happened to it last. Thanks, Karl On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario wrote: > My version is 2.11 > > > > > > >

[jira] [Resolved] (CONNECTORS-1553) Upgrade to SolrJ 6.6.5

2018-11-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1553. - Resolution: Won't Fix > Upgrade to SolrJ 6.

[jira] [Commented] (CONNECTORS-1553) Upgrade to SolrJ 6.6.5

2018-11-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676329#comment-16676329 ] Karl Wright commented on CONNECTORS-1553: - [~kamaci], we updated to SolrJ 7.4.x for release

Re: Welcome Tim Allison as a Lucene/Solr committer

2018-11-05 Thread Karl Wright
Welcome! Karl On Mon, Nov 5, 2018 at 1:39 PM Christine Poerschke (BLOOMBERG/ LONDON) < cpoersc...@bloomberg.net> wrote: > Welcome Tim! > > From: dev@lucene.apache.org At: 11/02/18 16:20:52 > To: dev@lucene.apache.org > Subject: Welcome Tim Allison as a Lucene/Solr committer > > Hi all, > > >

Re: Welcome Gus Heck as Lucene/Solr committer

2018-11-02 Thread Karl Wright
Welcome!! Karl On Thu, Nov 1, 2018 at 9:53 PM Koji Sekiguchi wrote: > Welcome Gus! > > Koji > > On 2018/11/01 21:22, David Smiley wrote: > > Hi all, > > > > Please join me in welcoming Gus Heck as the latest Lucene/Solr committer! > > > > Congratulations and Welcome, Gus! > > > > Gus, it's

[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-11-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672605#comment-16672605 ] Karl Wright commented on CONNECTORS-1546: - I didn't see a commit go by. Were you able

[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672435#comment-16672435 ] Karl Wright commented on CONNECTORS-1552: - Looks good, but I'd suggest making sure the text

[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-11-01 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672425#comment-16672425 ] Karl Wright commented on CONNECTORS-1529: - As long as it's a new field, seems that backwards

[jira] [Commented] (LUCENE-8540) Geo3d quantization test failure for MAX/MIN encoding values

2018-10-31 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670641#comment-16670641 ] Karl Wright commented on LUCENE-8540: - [~ivera] Looks reasonable as far as I can tell. The question

Re: Job stuck without message

2018-10-30 Thread Karl Wright
; > > Solr server is ok > > Tika server is ok > > Agent is ok > > Tomcat with ManifoldCF is ok > > > > I could check whether I can put, for example, the Tika server > or Solr into info log mode. > > > > Thanks.. > > > > > > *From:* Ka

Re: Job stuck without message

2018-10-30 Thread Karl Wright
Yes, I see many docs in the docs queue but they are inactive. > > > > In fact, I see that no more docs are indexed in Solr and that the job > has the same number of docs Active (35012) > > > > > > > > > > *From:* Karl Wright > *Sent:* Tuesday, 30 Oct

Re: Job stuck without message

2018-10-30 Thread Karl Wright
The job appears "stuck" because of this: ' JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.' This means that ManifoldCF will retry this document for a while before it gives up on it. It appears to be stuck but it is not. You

[jira] [Assigned] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-10-29 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1552: --- Assignee: Steph van Schalkwyk (was: Karl Wright) > Apache ManifoldCF Elas

Re: [jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-10-29 Thread Karl Wright
Remcam Search Engines > +1.314.452. <+1+314+452+2896>2896st...@remcam.net http://remcam.net > <http://www.remcam.net/> Skype: svanschalkwyk > <https://mail.google.com/mail/u/0/#> > <http://linkedin.com/in/vanschalkwyk> > > > On Mon, Oct 29, 2018 a

[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-10-29 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667589#comment-16667589 ] Karl Wright commented on CONNECTORS-1552: - The ES connector does not currently support any

[jira] [Assigned] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-10-29 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1552: --- Assignee: Karl Wright Priority: Major (was: Blocker) Fix

Re: web connector : links extraction issues

2018-10-29 Thread Karl Wright
Hi Olivier, Javascript inclusion in the Web Connector is not evaluated. In fact, no Javascript is executed at all. Therefore it should not matter what is included via javascript. Thanks, Karl On Mon, Oct 29, 2018 at 1:39 PM Olivier Tavard < olivier.tav...@francelabs.com> wrote: > Hi, > >

Re: ManifoldCF database model

2018-10-29 Thread Karl Wright
on logs without luck. > > I think it could be 2) case, can I increase log detail for web repository? > This, and the Elastic, are both default connectors, no code changes here. > > Thanks. > > On Mon, 29 Oct 2018 at 16:12, Karl Wright () > wrote: > > > It

Re: ManifoldCF database model

2018-10-29 Thread Karl Wright
Maybe if they share repo? > > Thanks in advance! > > > On Wed, 17 Oct 2018 at 14:40, Gustavo Beneitez (< > gustavo.benei...@gmail.com>) wrote: > > > Ok thanks! > > > > On Wed, 17 Oct 2018 at 14:27, Karl Wright () > > wrote: > >

Re: Contribution help for the Confluence connector patch

2018-10-24 Thread Karl Wright
Never mind, I was able to get it fixed. Karl On Wed, Oct 24, 2018 at 10:19 AM Karl Wright wrote: > I've created CONNECTORS-1551, and attached the patch. > > Unfortunately there seem to be some encoding issues with > common_ja_JP.properties; can you send that one fi

[jira] [Resolved] (CONNECTORS-1551) Various confluence connector issues

2018-10-24 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1551. - Resolution: Fixed r1844778 > Various confluence connector iss

Re: Contribution help for the Confluence connector patch

2018-10-24 Thread Karl Wright
I've created CONNECTORS-1551, and attached the patch. Unfortunately there seem to be some encoding issues with common_ja_JP.properties; can you send that one file via email as an attachment? Thanks! Karl On Tue, Oct 23, 2018 at 8:54 PM 白井 隆/ Shirai Takashi wrote: > Hi, there. > > I've just

[jira] [Updated] (CONNECTORS-1551) Various confluence connector issues

2018-10-24 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1551: Attachment: CONNECTORS-1551.patch > Various confluence connector iss

[jira] [Created] (CONNECTORS-1551) Various confluence connector issues

2018-10-24 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-1551: --- Summary: Various confluence connector issues Key: CONNECTORS-1551 URL: https://issues.apache.org/jira/browse/CONNECTORS-1551 Project: ManifoldCF Issue

Re: How documents are deleted

2018-10-24 Thread Karl Wright
Hi Julien, This is a complex question and the framework behaves differently depending on the connector model. Please read: https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs Karl On Wed, Oct 24, 2018 at 5:26 AM Julien Massiera < julien.massi...@francelabs.com> wrote: > Hi Karl,

Re: error when running jobs

2018-10-24 Thread Karl Wright
need or maybe we also have to increase the general log level. > > Thanks in advance. > > > On Tue, 23 Oct 2018 at 14:28, Gustavo Beneitez (< > gustavo.benei...@gmail.com>) wrote: > >> Thanks Karl, we are going to make new crawls with that property enabled >>

[jira] [Commented] (LUCENE-8540) Geo3d quantization test failure for MAX/MIN encoding values

2018-10-23 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660515#comment-16660515 ] Karl Wright commented on LUCENE-8540: - Hi [~ivera], can you have a look at this? I'm quite busy

[jira] [Assigned] (LUCENE-8540) Geo3d quantization test failure for MAX/MIN encoding values

2018-10-23 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned LUCENE-8540: --- Assignee: Ignacio Vera > Geo3d quantization test failure for MAX/MIN encoding val

Re: error when running jobs

2018-10-23 Thread Karl Wright
On Tue, Oct 23, 2018 at 2:34 AM Gustavo Beneitez wrote: > Hi Karl, > > MySQL. As per config variables: > version 5.7.23-log > version comment MySQL Community Server (GPL) > > In which file should I enable logging/debugging? > > Thanks! > > On Mon, 22 Oct 2018 at

Re: error when running jobs

2018-10-22 Thread Karl Wright
Hi Gustavo, I have seen this error before; it is apparently due to the database failing to properly gate transactions and behave according to the concurrency model selected for a transaction. We have a debugging setting you can configure which logs the needed information so that forensics get

[jira] [Resolved] (CONNECTORS-1550) HTML Tag mapping

2018-10-19 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1550. - Resolution: Not A Problem Hi [~DonaldVdD], please post questions like

[jira] [Updated] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1549: Attachment: CONNECTORS-1549.patch > Include and exclude rules order l

[jira] [Resolved] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1549. - Resolution: Fixed r1844293 > Include and exclude rules order l

[jira] [Updated] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1549: Fix Version/s: ManifoldCF 2.12 > Include and exclude rules order l

[jira] [Commented] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656073#comment-16656073 ] Karl Wright commented on CONNECTORS-1549: - I found the issue and have attached a patch

[jira] [Commented] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655986#comment-16655986 ] Karl Wright commented on CONNECTORS-1549: - Hi [~julienFL] Sorry for the delay. First note

[jira] [Commented] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655223#comment-16655223 ] Karl Wright commented on CONNECTORS-1549: - Hi [~julienFL], there was a similar ticket

[jira] [Assigned] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1549: --- Assignee: Karl Wright > Include and exclude rules order l

[jira] [Updated] (CONNECTORS-1548) CMIS output connector test fails with versioning state error

2018-10-17 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1548: Description: While working on the upgrade to Tika 1.19.1, I ran into CMIS output

[jira] [Created] (CONNECTORS-1548) CMIS output connector test fails with versioning state error

2018-10-17 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-1548: --- Summary: CMIS output connector test fails with versioning state error Key: CONNECTORS-1548 URL: https://issues.apache.org/jira/browse/CONNECTORS-1548 Project

[jira] [Resolved] (CONNECTORS-1547) No activity record for for excluded documents in WebCrawlerConnector

2018-10-17 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1547. - Resolution: Fixed r1844120 > No activity record for for excluded docume

[jira] [Updated] (CONNECTORS-1547) No activity record for for excluded documents in WebCrawlerConnector

2018-10-17 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1547: Fix Version/s: ManifoldCF 2.12 > No activity record for for excluded docume

[jira] [Assigned] (CONNECTORS-1547) No activity record for for excluded documents in WebCrawlerConnector

2018-10-17 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1547: --- Assignee: Karl Wright > No activity record for for excluded docume

Re: ManifoldCF database model

2018-10-17 Thread Karl Wright
cies exist. > > Please allow us to check those documents and make new tests in order to see > what really happens; we don't modify any database record by hand. > > Thanks! > > > > > > > > On Tue, 16 Oct 2018 at 19:27, Karl Wright () >

Re: ManifoldCF database model

2018-10-16 Thread Karl Wright
Hi, you can look at ManifoldCF In Action. There's a link to it on the manifoldcf page. However, you should be aware that we consider it a severe bug if ManifoldCF doesn't clean up after itself. The only time that is not expected is when people write buggy connectors or mess with database tables

Re: Create documents from transformation connector

2018-10-16 Thread Karl Wright
Hi Julien, That is one thing you cannot do with the MCF pipeline. All documents must originate in a RepositoryConnector. The repository connector can create multiple subdocuments itself, if need be, but the rest of the pipeline does not allow further splitting. One way around this: If the

[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-10-16 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651950#comment-16651950 ] Karl Wright commented on CONNECTORS-1546: - I agree with your decision. > Optim

[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-10-16 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651761#comment-16651761 ] Karl Wright commented on CONNECTORS-1546: - Hi [~st...@remcam.net], can you comment

[jira] [Assigned] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-10-16 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1546: --- Assignee: Steph van Schalkwyk > Optimize Elasticsearch performance by remov

[jira] [Resolved] (CONNECTORS-1545) Parentheses in editConfiguration tab labels not supported

2018-10-12 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1545. - Resolution: Fixed Fix Version/s: ManifoldCF 2.12 > Parenthe

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
; > > > > I read other discussion ( > https://lists.apache.org/thread.html/66a3f9780bbcc98e404e25f5a0e56a8a6c007448642c3bc15a366ed2@%3Cuser.manifoldcf.apache.org%3E) > but I don’t understand if they solved the issue > > > > ☹ > > > > Thanks a lot. >

Re: Logging and Document filter transformation connector

2018-10-11 Thread Karl Wright
Hi Olivier, The Repository connector has no knowledge of what the pipeline looks like. It simply asks the framework whether the mime type, length, etc. is acceptable to the downstream pipeline. It's the connector's responsibility to note the reason for the rejection in the simple history, but it

Re: Debug logging properties location

2018-10-11 Thread Karl Wright
Hi Olivier, it sounds like you are using Zookeeper. Certain properties are global and are imported into Zookeeper. Other properties are local and found in each local properties.xml file. The debug properties for logging are, I believe, global. Karl On Thu, Oct 11, 2018 at 8:39 AM Olivier

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
> > > Could it be that, with the flag unchecked, ManifoldCF doesn’t use the mime types > specified? > > > > I am using a snapshot version of ManifoldCF from three months ago. > > > > > > > > > > *From:* Karl Wright > *Sent:* Thursday, 11 October 2018

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
ly and the "use extracting update handler" box is UNCHECKED. Thanks, Karl On Thu, Oct 11, 2018 at 8:16 AM Karl Wright wrote: > When you uncheck the "use extracting update handler" checkbox, the Solr > connection only accepts text/plain, and no binary formats. The
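The effect of the checkbox described in this message can be written down as a small acceptance check. This is a sketch only and not the Solr connector's code: `MimeTypeGate`, `accepts` and `configuredMimeTypes` are hypothetical names. The text/plain-only branch follows the statement in the thread; the branch for the checked state (the configured mime-type list decides) is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the mime-type gate described in the thread; not connector code. */
final class MimeTypeGate {
    private MimeTypeGate() {}

    /**
     * @param useExtractingUpdateHandler state of the checkbox on the Solr connection
     * @param configuredMimeTypes        assumption: lower-case mime types configured on the connection
     * @param mimeType                   the document's mime type, possibly with parameters
     */
    static boolean accepts(boolean useExtractingUpdateHandler,
                           Set<String> configuredMimeTypes,
                           String mimeType) {
        if (mimeType == null) {
            return false;
        }
        // Drop parameters such as ";charset=utf-8" and normalize case.
        int semicolon = mimeType.indexOf(';');
        String base = (semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType)
            .trim()
            .toLowerCase(Locale.ROOT);
        if (!useExtractingUpdateHandler) {
            // From the thread: unchecked means text/plain only, no binary formats.
            return base.equals("text/plain");
        }
        // Assumption: when checked, the configured list decides.
        return configuredMimeTypes.contains(base);
    }

    public static void main(String[] args) {
        Set<String> configured = new HashSet<>(Arrays.asList("application/pdf", "text/plain"));
        System.out.println(accepts(false, configured, "application/pdf"));
        System.out.println(accepts(false, configured, "text/plain; charset=utf-8"));
        System.out.println(accepts(true, configured, "application/pdf"));
    }
}
```

Under these assumptions, a binary document such as a PDF reaches Solr with the box unchecked only if something upstream in the pipeline has already turned it into plain text.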

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
see them? > > > > Perhaps the problem is the “Ignore Tika exception” setting, which I don’t know where to set in > ManifoldCF? > > > > > > > > > > > > *From:* Karl Wright > *Sent:* Thursday, 11 October 2018 12:24 > *To:* user@manifoldcf.apache.org > *Subject:* Re

[jira] [Commented] (CONNECTORS-1545) Parentheses in editConfiguration tab labels not supported

2018-10-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644282#comment-16644282 ] Karl Wright commented on CONNECTORS-1545: - Hi [~julienFL], all strings are escaped, and tabs

[jira] [Assigned] (CONNECTORS-1545) Parentheses in editConfiguration tab labels not supported

2018-10-09 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1545: --- Assignee: Karl Wright > Parentheses in editConfiguration tab lab

Re: Option to skip documents

2018-10-09 Thread Karl Wright
r1843343 adds this condition to the list of caught conditions. In the future it would be better to create a ticket. Karl On Tue, Oct 9, 2018 at 3:06 PM Karl Wright wrote: > I can make it retry then skip if it doesn't succeed in a while. > > Karl > > > On Tue, Oct 9, 2018 a

Re: Option to skip documents

2018-10-09 Thread Karl Wright
e file several times in a row, gives up > after several tries and stops the jobs with a message reporting the smb > Exception encountered. > > Thanks for your answer, > Romaric > > So it is indeed a temporary lock, but we can't tell how long it will last. > > Le 09/10/

Re: Option to skip documents

2018-10-09 Thread Karl Wright
Hi Romaric, If the error is transient, then the right thing to do is *not* to skip the file, but to retry later. What currently happens? Karl On Tue, Oct 9, 2018 at 10:05 AM Romaric Pighetti < romaric.pighe...@francelabs.com> wrote: > Hi Karl, > Along the lines of this ticket >

Re: Sharepoint connector help : site didn't exist or external

2018-10-08 Thread Karl Wright
Excellent news! Thanks for the update. Karl On Mon, Oct 8, 2018 at 1:54 PM Susheel Kumar wrote: > Thank you so much Karl. I was able to crawl the site and index them. > > On Wed, Oct 3, 2018 at 3:31 PM Karl Wright wrote: > >> Please read the user documentation for the sh

[jira] [Commented] (LUCENE-8522) Spatial: Polygon touching the negative boundaries of WGS84 fails on Solr

2018-10-08 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641762#comment-16641762 ] Karl Wright commented on LUCENE-8522: - [~ivera], looks good to me. > Spatial: Polygon touch

Re: Query to get the number of documents processed from PostgreSQL

2018-10-08 Thread Karl Wright
If you want all the documents for a specific job, the query is: select count(*) from jobqueue where jobid= Karl On Mon, Oct 8, 2018 at 4:23 AM Romaric Pighetti < romaric.pighe...@francelabs.com> wrote: > Hi Karl, > > I am currently facing the need of getting the number of documents >

[jira] [Resolved] (CONNECTORS-1541) Documents updated in Google Drive are send with 0 byte to CMIS Output Connector

2018-10-07 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1541. - Resolution: Fixed Assignee: Karl Wright (was: Piergiorgio Lucidi

[jira] [Resolved] (CONNECTORS-1540) When the folder name contains "/", the export api request should replace it with the valid "%2F" before sending it

2018-10-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1540. - Resolution: Cannot Reproduce > When the folder name contains "/", t

[jira] [Comment Edited] (CONNECTORS-1541) Documents updated in Google Drive are send with 0 byte to CMIS Output Connector

2018-10-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640919#comment-16640919 ] Karl Wright edited comment on CONNECTORS-1541 at 10/7/18 12:04 AM

[jira] [Resolved] (CONNECTORS-1543) Folders and documents containing illegal characters in the names break the migration process into CMIS

2018-10-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1543. - Resolution: Fixed Assignee: Karl Wright Fix Version/s: ManifoldCF

[jira] [Commented] (CONNECTORS-1541) Documents updated in Google Drive are send with 0 byte to CMIS Output Connector

2018-10-06 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640919#comment-16640919 ] Karl Wright commented on CONNECTORS-1541: - If you can't find what commit is missing, just
