Build failed in Jenkins: ManifoldCF-ant #692

2019-05-10 Thread Apache Jenkins Server
See 


Changes:

[kwright] Fix for CONNECTORS-1605

--
[...truncated 1.02 MB...]

test-lib:

build:

deliver-framework:
 [copy] Copying 2 files to 

 [copy] Copying 4 files to 

 [copy] Copying 3 files to 

 [copy] Copying 7 files to 

 [copy] Copying 7 files to 

 [copy] Copying 9 files to 

 [copy] Copying 9 files to 

 [copy] Copying 5 files to 

 [copy] Copying 6 files to 

 [copy] Copying 2 files to 

 [copy] Copying 2 files to 

 [copy] Copying 2 files to 


download-connectors-dependencies:

download-dependencies:

download-dependencies:

setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/com/github/maoo/indexer/alfresco-indexer-webscripts-war/0.8.1/alfresco-indexer-webscripts-war-0.8.1.war
  [get] To: 


download-alfresco-ws-client:
  [get] Getting: 
https://artifacts.alfresco.com/nexus/service/local/repositories/releases/content/org/alfresco/alfresco-web-service-client/4.2.c/alfresco-web-service-client-4.2.c.jar
  [get] To: 


download-dependencies:

download-dependencies:

download-dependencies:

download-dependencies:

setup-maven-url:

download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/org/apache/chemistry/opencmis/chemistry-opencmis-server-inmemory/1.1.0/chemistry-opencmis-server-inmemory-1.1.0.war
  [get] To: 


download-dependencies:

download-dependencies:

download-dependencies:

download-dependencies:

download-dependencies:

download-dependencies:
[mkdir] Created dir: 

  [get] Getting: 
http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.1.tar.gz
  [get] To: 

   [gunzip] Expanding 

 to 

[untar] Expanding: 

 into 

[mkdir] Created dir: 

  [get] Getting: 
https://github.com/elasticsearch/elasticsearch-mapper-attachments/archive/v2.0.0.RC1.zip
  [get] To: 

  [get] 
https://github.com/elasticsearch/elasticsearch-mapper-attachments/archive/v2.0.0.RC1.zip
 permanently moved to 
https://github.com/elastic/elasticsearch-mapper-attachments/archive/v2.0.0.RC1.zip
  [get] 
https://github.com/elastic/elasticsearch-mapper-attachments/archive/v2.0.0.RC1.zip
 moved to 
https://codeload.github.com/elastic/elasticsearch-mapper-attachments/zip/v2.0.0.RC1
[unzip] Expanding: 

 into 


download-dependencies:

download-dependencies:

download-dependencies:

download-dependencies:

download-dependencies:

download-dependencies:


Jenkins build is back to normal : ManifoldCF-mvn #708

2019-05-10 Thread Apache Jenkins Server
See 




[jira] [Resolved] (CONNECTORS-1605) Update HTML Extractor connector

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1605.
-
   Resolution: Fixed
Fix Version/s: ManifoldCF 2.14

> Update HTML Extractor connector
> ---
>
> Key: CONNECTORS-1605
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1605
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.13
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.14
>
> Attachments: fix_englobing_tag_selection.txt, global_patch.txt, 
> html_extractor_transformation_connector.txt, 
> patch_HTML_extractor_connector_05_06_19.txt, 
> patch_html_extractor_08_14_18.txt, patch_html_extractor_fix_logs_08_10_18.txt
>
>
> Hi,
> I developed a transformation connector based on Jsoup. The goal of this code 
> is simply to choose an encompassing tag in an HTML document for text 
> extraction. Inside this tag, the connector lets you remove subparts 
> that you do not want: all the tags corresponding to declared types or 
> specific attribute names, for example.
> The code is under the Apache License, Version 2.0, and is attached.
> It needs some work, including code refactoring, renaming classes, and unit 
> tests, which I will be able to do if you are interested in the code.
> The documentation is here:
> [https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector]
>  
> It does not use any libraries other than the ones already present in the MCF 
> project. It is based on the Jsoup library in the lib folder.
> Best regards,
> Olivier



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CONNECTORS-1574) Performance tuning of manifold

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1574.
-
Resolution: Fixed

No response from user; closing.


> Performance tuning of manifold
> --
>
> Key: CONNECTORS-1574
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1574
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: File system connector, JCIFS connector, Solr 6.x 
> component
>Affects Versions: ManifoldCF 2.5
> Environment: Apache manifold installed in Linux machine
> Linux version 3.10.0-327.el7.ppc64le
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
>Reporter: balaji
>Assignee: Karl Wright
>Priority: Critical
>  Labels: performance
>
> My team is using *Apache ManifoldCF 2.5 with Solr Cloud* to index 
> data. We currently have 450-500 jobs that need to run simultaneously. 
> We need to index JSON data, and we are using the *file system* connector 
> type with *Postgres* as the backend database. 
> We are facing several issues:
> 1. Scheduling works for some jobs but not for others. 
> 2. Some jobs complete, while others hang and never complete.
> 3. Earlier, one job indexed 6 documents in 15 minutes, but 
> now even a directory path with 5 documents takes 20 minutes or sometimes 
> does not complete.
> 4. The "list all jobs" or "status and job management" page sometimes fails 
> to load. In pg_stat_activity we see 2 queries in a waiting 
> state, which prevents the page from loading. If we kill those 
> queries or restart ManifoldCF, the issue is resolved and the page loads 
> properly.
> Queries getting stuck:
> 1. SELECT ID,FAILTIME, FAILCOUNT, SEEDINGVERSION, STATUS FROM JOBS WHERE 
> (STATUS=$1 OR STATUS=$2) FOR UPDATE
> 2. UPDATE JOBS SET ERRORTEXT=NULL, ENDTIME=NULL, WINDOWEND=NULL, STATUS=$1 
> WHERE ID=$2
> Note: We have deployed ManifoldCF on *Linux*. Our major requirement is 
> scheduling jobs to run every 15 minutes.
> Please help us fine-tune ManifoldCF so that it runs smoothly and acts as a 
> robust system.





[jira] [Updated] (CONNECTORS-1566) Develop CSWS connector as a replacement for deprecated LiveLink LAPI connector

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1566:

Fix Version/s: (was: ManifoldCF 2.13)
   ManifoldCF 2.14

> Develop CSWS connector as a replacement for deprecated LiveLink LAPI connector
> --
>
> Key: CONNECTORS-1566
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1566
> Project: ManifoldCF
>  Issue Type: Task
>  Components: LiveLink connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.14
>
>
> LAPI is being deprecated.  We need to develop a replacement for it using the 
> ContentServer Web Services API.





[jira] [Updated] (CONNECTORS-1591) RTF comment parsing problem

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1591:

Fix Version/s: (was: ManifoldCF 2.13)
   ManifoldCF 2.14

> RTF comment parsing problem
> ---
>
> Key: CONNECTORS-1591
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1591
> Project: ManifoldCF
>  Issue Type: Bug
>Reporter: Zoltan Farago
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.14
>
> Attachments: comment.rtf, result.txt
>
>
> We have a problem with Manifold/Tika. When a comment is parsed from an RTF 
> file, the result has no separator. See the attachments.





[jira] [Updated] (CONNECTORS-1508) Add support for French Language

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1508:

Fix Version/s: (was: ManifoldCF 2.13)
   ManifoldCF 2.14

> Add support for French Language
> ---
>
> Key: CONNECTORS-1508
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1508
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: ManifoldCF 2.10
>Reporter: Cedric Ulmer
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.14
>
> Attachments: cedricmanifold_fr.zip
>
>
> Some users may need a French version of the resource bundle. I attached a 
> preliminary translation that France Labs made some time ago (probably around 
> summer 2016), but that we halted due to lack of time (and priority). It is 
> probably almost complete, but some quality checking needs to be done. Note 
> also that I forgot to check the version when I did the translations, so 
> anyone interested would need to check any modifications that may have 
> occurred between this version and the current MCF version.





[jira] [Updated] (CONNECTORS-1519) CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1519:

Fix Version/s: (was: ManifoldCF 2.13)
   ManifoldCF 2.14

> CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y
> ---
>
> Key: CONNECTORS-1519
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1519
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector
>Affects Versions: ManifoldCF 2.10
>Reporter: Steph van Schalkwyk
>Assignee: Steph van Schalkwyk
>Priority: Major
> Fix For: ManifoldCF 2.14
>
>
> Investigating CLIENTPROTOCOLEXCEPTION when using 2.10 with ES 6.x.y
> More information to follow.
> Fails when using security, i.e. 
> http://user:password@elasticsearch:9200.
> Remedy:
>  # Disable x-pack security.
>  # Use http://elasticsearch:9200.
>  
>  
> |07-27-2018 17:53:19.010|Indexation 
> (ES)|file:/var/manifoldcf/corpus/14.html|CLIENTPROTOCOLEXCEPTION|38053|23|





[jira] [Updated] (CONNECTORS-1521) Documentum Connector uses ManifoldCF's local time in query constraints against the Documentum server without reference to time zones

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1521:

Fix Version/s: (was: ManifoldCF 2.13)
   ManifoldCF 2.14

> Documentum Connector uses ManifoldCF's local time in query constraints 
> against the Documentum server without reference to time zones
> ---
>
> Key: CONNECTORS-1521
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1521
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.10
>Reporter: James Thomas
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.14
>
>
> I find that the time/date constraints in queries to the Documentum server are 
> based on the "raw" local time of the ManifoldCF server but appear to take no 
> account of the time zones of the two servers.
> This can lead to recently modified files not being transferred to the output 
> repository when you would naturally expect them to be. I'd like the times to 
> be aligned, perhaps by including time zone in the query. In particular, is 
> there a way to use UTC perhaps?
> Here's an example ...
>  * create a folder in Documentum
>  * set up a job to point at the folder and output to the file system
>  * put two documents into a folder in Documentum
>  * Select them, right click and export as CSV (to show the timestamps):
> {noformat}
> 1.png,48489.0,Portable Network Graphics,8/7/2018 9:04 AM,
> 2.png,28620.0,Portable Network Graphics,8/7/2018 9:04 AM,,{noformat}
> Check the local time on the ManifoldCF server machine. Observe that it's 
> reporting consistent time with the DM server:
> {noformat}
> [james@manifold]$ date
> Tue Aug  7 09:07:25 BST 2018{noformat}
> Start the job and look for the query to Documentum in the manifoldcf.log file 
> (line break added for readability):
> {noformat}
> DEBUG 2018-08-07T08:07:47.297Z (Startup thread) - DCTM: About to execute 
> query= (select for READ distinct i_chronicle_id from dm_document where 
> r_modify_date >= date('01/01/1970 00:00:00','mm/dd/ hh:mi:ss') and
> r_modify_date<=date('08/07/2018 08:07:34','mm/dd/ hh:mi:ss') 
> AND (i_is_deleted=TRUE Or (i_is_deleted=FALSE AND a_full_text=TRUE AND 
> r_content_size>0)) AND ( Folder('/Administrator/james', DESCEND) ))
> ^C{noformat}
> Notice that the latest date asked for is *before* the modification date of 
> the files added to DM. (And is an hour out, see footnote.)
>   
>  See whether anything has been output by the File System connector. It hasn't:
> {noformat}
> [james@manifold]$ ls /bigdisc/source/PDFs/timezones/
> [james@manifold]$
> {noformat}
> Now:
>  * change the timezone on the ManifoldCF server machine
>  * restart the ManifoldCF server and the Documentum processes
>  * reseed the job
> Check the local time on the ManifoldCF server machine; it has changed:
> {noformat}
> [james@manifold]$ date
> Tue Aug  7 10:10:29 CEST 2018{noformat}
> Start the job again and notice that the query has changed by an hour, plus 
> the few minutes it took to change the date etc (and is still an hour out, see 
> footnote):
> {noformat}
> r_modify_date<=date('08/07/2018 09:11:02','mm/dd/ hh:mi:ss') 
> {noformat}
> Observe that the range of dates now covers the timestamps on the DM data, and 
> also that some data has now been transferred by the File System connector:
> {noformat}
> [james@manifold]$ ls 
> /bigdisc/source/PDFs/timezones/http/mfserver\:8080/da/component/
> drl?versionLabel=CURRENT=09018000e515
> drl?versionLabel=CURRENT=09018000e516
> {noformat}
>  
>  
> [Footnote] It appears that something is trying to take account of Daylight 
> Saving Time too.
> If I set the server date to a time outside of DST, the query is aligned with 
> the current time:
> {noformat}
> [i2e@i2ehost manifold]$ date
>  Mon Oct 29 00:01:13 CET 2018
> r_modify_date<=date('10/29/2018 00:01:39','mm/dd/ hh:mi:ss') 
> {noformat}
> But if I set the time inside DST, the time is an hour before:
> {noformat}
> [i2e@i2ehost manifold]$ date
>  Sat Oct 27 00:00:06 CEST 2018
> r_modify_date<=date('10/26/2018 23:00:26','mm/dd/ hh:mi:ss') 
> {noformat}
> This is perhaps a Java issue rather than a logic issue in the connector? See 
> e.g. [https://stackoverflow.com/questions/6392/java-time-zone-is-messed-up]
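
The four upper bounds quoted in the report (08:07:34 under BST, 09:11:02 under CEST, 00:01:39 under CET, and 23:00:26 on the previous day under CEST) all appear consistent with one reading: the bound is the current instant rendered at the zone's standard (non-DST) offset, rather than at wall-clock time or in UTC. The Java sketch below only illustrates that arithmetic with `java.time`. It is not the connector's code, and the class name `QueryBoundDemo`, its helper methods, and the zone IDs `Europe/London` and `Europe/Paris` are assumptions chosen to match the BST, CEST and CET abbreviations in the report.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class; not part of ManifoldCF.
class QueryBoundDemo {
    // Same field order as the date literals in the logged query.
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    // Wall-clock time in the zone, DST included (what `date` prints).
    static String wallClock(Instant t, ZoneId zone) {
        return FMT.format(t.atZone(zone));
    }

    // Time at the zone's standard offset, ignoring any DST shift.
    static String standardTime(Instant t, ZoneId zone) {
        ZoneOffset standard = zone.getRules().getStandardOffset(t);
        return FMT.format(t.atOffset(standard));
    }

    // The same instant rendered in UTC.
    static String utc(Instant t) {
        return FMT.format(t.atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        // Assumed zone for the "CEST" server in the footnote.
        ZoneId paris = ZoneId.of("Europe/Paris");
        // Instant corresponding to Sat Oct 27 00:00:26 CEST 2018.
        Instant inDst = Instant.parse("2018-10-26T22:00:26Z");

        System.out.println(wallClock(inDst, paris));    // 10/27/2018 00:00:26
        System.out.println(standardTime(inDst, paris)); // 10/26/2018 23:00:26
        System.out.println(utc(inDst));                 // 10/26/2018 22:00:26
    }
}
```

If this reading is right, formatting both sides of the comparison in UTC, as the report suggests, would make the bound independent of the server's zone and DST state. Whether the Documentum server interprets the literal in UTC is not established by the report.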





[jira] [Updated] (CONNECTORS-1605) Update HTML Extractor connector

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1605:

Affects Version/s: (was: ManifoldCF 2.9.1)
   ManifoldCF 2.13

> Update HTML Extractor connector
> ---
>
> Key: CONNECTORS-1605
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1605
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.13
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.10
>
> Attachments: fix_englobing_tag_selection.txt, global_patch.txt, 
> html_extractor_transformation_connector.txt, 
> patch_HTML_extractor_connector_05_06_19.txt, 
> patch_html_extractor_08_14_18.txt, patch_html_extractor_fix_logs_08_10_18.txt
>
>
> Hi,
> I developed a transformation connector based on Jsoup. The goal of this code 
> is simply to choose an encompassing tag in an HTML document for text 
> extraction. Inside this tag, the connector lets you remove subparts 
> that you do not want: all the tags corresponding to declared types or 
> specific attribute names, for example.
> The code is under the Apache License, Version 2.0, and is attached.
> It needs some work, including code refactoring, renaming classes, and unit 
> tests, which I will be able to do if you are interested in the code.
> The documentation is here:
> [https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector]
>  
> It does not use any libraries other than the ones already present in the MCF 
> project. It is based on the Jsoup library in the lib folder.
> Best regards,
> Olivier





[jira] [Updated] (CONNECTORS-1605) Update HTML Extractor connector

2019-05-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1605:

Fix Version/s: (was: ManifoldCF 2.10)

> Update HTML Extractor connector
> ---
>
> Key: CONNECTORS-1605
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1605
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.13
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Attachments: fix_englobing_tag_selection.txt, global_patch.txt, 
> html_extractor_transformation_connector.txt, 
> patch_HTML_extractor_connector_05_06_19.txt, 
> patch_html_extractor_08_14_18.txt, patch_html_extractor_fix_logs_08_10_18.txt
>
>
> Hi,
> I developed a transformation connector based on Jsoup. The goal of this code 
> is simply to choose an encompassing tag in an HTML document for text 
> extraction. Inside this tag, the connector lets you remove subparts 
> that you do not want: all the tags corresponding to declared types or 
> specific attribute names, for example.
> The code is under the Apache License, Version 2.0, and is attached.
> It needs some work, including code refactoring, renaming classes, and unit 
> tests, which I will be able to do if you are interested in the code.
> The documentation is here:
> [https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector]
>  
> It does not use any libraries other than the ones already present in the MCF 
> project. It is based on the Jsoup library in the lib folder.
> Best regards,
> Olivier





[jira] [Created] (CONNECTORS-1605) Update HTML Extractor connector

2019-05-10 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-1605:
---

 Summary: Update HTML Extractor connector
 Key: CONNECTORS-1605
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1605
 Project: ManifoldCF
  Issue Type: Improvement
Affects Versions: ManifoldCF 2.9.1
Reporter: Olivier Tavard
Assignee: Karl Wright
 Fix For: ManifoldCF 2.10
 Attachments: fix_englobing_tag_selection.txt, global_patch.txt, 
html_extractor_transformation_connector.txt, 
patch_HTML_extractor_connector_05_06_19.txt, patch_html_extractor_08_14_18.txt, 
patch_html_extractor_fix_logs_08_10_18.txt

Hi,

I developed a transformation connector based on Jsoup. The goal of this code is 
simply to choose an encompassing tag in an HTML document for text extraction. 
Inside this tag, the connector lets you remove subparts that you do 
not want: all the tags corresponding to declared types or specific attribute 
names, for example.
The code is under the Apache License, Version 2.0, and is attached.

It needs some work, including code refactoring, renaming classes, and unit 
tests, which I will be able to do if you are interested in the code.
The documentation is here:

[https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector]

 

It does not use any libraries other than the ones already present in the MCF 
project. It is based on the Jsoup library in the lib folder.

Best regards,

Olivier


