[jira] [Commented] (CONNECTORS-1622) Upgrade to Tika 1.23

2020-01-26 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024054#comment-17024054
 ] 

Karl Wright commented on CONNECTORS-1622:
-

In case anyone's curious what the proper process is for this, what I do is 
change the pom.xml version for Tika first, then do the following steps:

{code}
mvn -DskipTests -DskipITs install
mvn dependency:tree
{code}

Then I go through the tika dependency tree to update all the packages, add any 
that are missing, etc., in the main build.xml file.
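
To make the dependency-tree review step a little less manual, something like the following could be used. It is only a sketch, not part of the ManifoldCF build: the class name is hypothetical, the sample lines are invented for illustration, and it assumes the plain five-part coordinate form (groupId:artifactId:type:version:scope) that `mvn dependency:tree` prints by default. Lines that carry a classifier are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls artifact names and versions out of `mvn dependency:tree` text. */
final class DependencyTreeCoordinates {
    // Matches groupId:artifactId:type:version:scope at the end of a tree line,
    // e.g. "[INFO] +- org.apache.tika:tika-core:jar:1.23:compile".
    private static final Pattern COORDINATE =
        Pattern.compile("([\\w.\\-]+):([\\w.\\-]+):([\\w.\\-]+):([\\w.\\-]+):(\\w+)\\s*$");

    /** Returns "artifactId version" for every dependency line that matches. */
    static List<String> artifactsAndVersions(List<String> treeLines) {
        List<String> result = new ArrayList<>();
        for (String line : treeLines) {
            Matcher m = COORDINATE.matcher(line);
            if (m.find()) {
                result.add(m.group(2) + " " + m.group(4));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample input; real output comes from `mvn dependency:tree`.
        List<String> sample = List.of(
            "[INFO] +- org.apache.tika:tika-core:jar:1.23:compile",
            "[INFO] |  \\- commons-io:commons-io:jar:2.6:compile",
            "[INFO] BUILD SUCCESS");
        artifactsAndVersions(sample).forEach(System.out::println);
    }
}
```

The resulting list can then be compared by hand against the jar versions referenced in build.xml.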


> Upgrade to Tika 1.23
> 
>
> Key: CONNECTORS-1622
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1622
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Tika extractor
>Affects Versions: ManifoldCF 2.13
>Reporter: Cihad Guzel
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF next
>
>
> Tika has released 1.23. Changes can be found from here: 
> http://tika.apache.org/1.23/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CONNECTORS-1634) Occurred 501 HTTP Error while downloading 'h2' dependency

2020-01-25 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1634.
-
Resolution: Fixed

r1873157


> Occurred 501 HTTP Error while downloading 'h2' dependency
> 
>
> Key: CONNECTORS-1634
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1634
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.15
>Reporter: Cihad Guzel
>Assignee: Cihad Guzel
>Priority: Major
> Fix For: ManifoldCF 2.16
>
>
> We have to update the h2 dependency download link. I tried this command:
> {noformat}
> ant make-core-deps
> {noformat}
> I got this error:
> {noformat}
> download-h2:
>   [get] Getting: 
> http://repo2.maven.org/maven2/com/h2database/h2/1.3.158/h2-1.3.158.jar
>   [get] To: /Users/cguzel/Projects/apache/svn/mcf-trunk/lib/h2-1.3.158.jar
>   [get] Error opening connection java.io.IOException: Server returned 
> HTTP response code: 501 for URL: 
> http://repo2.maven.org/maven2/com/h2database/h2/1.3.158/h2-1.3.158.jar
>   [get] Error opening connection java.io.IOException: Server returned 
> HTTP response code: 501 for URL: 
> http://repo2.maven.org/maven2/com/h2database/h2/1.3.158/h2-1.3.158.jar
>   [get] Error opening connection java.io.IOException: Server returned 
> HTTP response code: 501 for URL: 
> http://repo2.maven.org/maven2/com/h2database/h2/1.3.158/h2-1.3.158.jar
>   [get] Can't get 
> http://repo2.maven.org/maven2/com/h2database/h2/1.3.158/h2-1.3.158.jar to 
> /Users/cguzel/Projects/apache/svn/mcf-trunk/lib/h2-1.3.158.jar
> BUILD FAILED
> /Users/cguzel/Projects/apache/svn/mcf-trunk/build.xml:1514: Can't get 
> http://repo2.maven.org/maven2/com/h2database/h2/1.3.158/h2-1.3.158.jar to 
> /Users/cguzel/Projects/apache/svn/mcf-trunk/lib/h2-1.3.158.jar
> {noformat}
> I tried the request in my browser and got the following response:
> [http://repo2.maven.org/maven2/com/h2database/h2/1.3.158/h2-1.3.158.jar]
> {noformat}
> 501 HTTPS Required. 
> Use https://repo1.maven.org/maven2/
> More information at https://links.sonatype.com/central/501-https-required
> {noformat}
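
The 501 response above points at the HTTPS endpoint on repo1.maven.org. As a rough illustration of the URL change it asks for (this is not the change committed in r1873157, which is not shown in this thread, and the class name is hypothetical), a plain-HTTP Maven Central URL could be rewritten like this:

```java
import java.net.URI;

/** Hypothetical helper: rewrites plain-HTTP Maven Central URLs to the HTTPS endpoint. */
final class MavenCentralHttps {
    static String toHttps(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        boolean central = "repo1.maven.org".equals(host) || "repo2.maven.org".equals(host);
        if ("http".equals(uri.getScheme()) && central) {
            // Keep the path as-is; only scheme and host change.
            return "https://repo1.maven.org" + uri.getRawPath();
        }
        return url; // anything else is left untouched
    }

    public static void main(String[] args) {
        System.out.println(toHttps(
            "http://repo2.maven.org/maven2/com/h2database/h2/1.3.158/h2-1.3.158.jar"));
    }
}
```

In the Ant build itself the equivalent would be editing the download URL used by the failing target, which is the part this sketch only mimics.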





[jira] [Commented] (CONNECTORS-1634) Occurred 501 HTTP Error while downloading 'h2' dependency

2020-01-25 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023606#comment-17023606
 ] 

Karl Wright commented on CONNECTORS-1634:
-

Let me try this here.  We released in December and there was no issue then.







[jira] [Commented] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-01-25 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023495#comment-17023495
 ] 

Karl Wright commented on CONNECTORS-1629:
-

[~jornfranke], I can add the documentation.  But what I want you to do is 
re-test, since I changed some things around.  Specifically, does it work to 
reference the jaas-config file by using a relative path, e.g. 
"./jaas-client.config"?  I believe it should, but this needs to be confirmed.  
That's simple to add to the instructions.  Also, if you include a quick 
description in your own words (with online references as needed) of how to 
edit jaas-client.config to meet your own needs, I can edit it accordingly.  
Please just include it as a comment in this ticket.
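
For the re-test, it may help to see what a relative path would resolve to. The sketch below is only a diagnostic aid under the assumption that a relative value is interpreted against the JVM's working directory; whether the login configuration is actually picked up that way is exactly what needs confirming. The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic: shows where a relative jaas-config path would point. */
final class JaasConfigPath {
    static final String LOGIN_CONFIG_PROPERTY = "java.security.auth.login.config";

    /** Resolves the configured value against the current working directory. */
    static Path toAbsolute(String configured) {
        return Paths.get(configured).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Falls back to the relative path from the comment above if no -D was given.
        String configured = System.getProperty(LOGIN_CONFIG_PROPERTY, "./jaas-client.config");
        Path resolved = toAbsolute(configured);
        System.out.println("working dir : " + Paths.get("").toAbsolutePath());
        System.out.println("resolves to : " + resolved);
        System.out.println("exists      : " + Files.exists(resolved));
    }
}
```

Running it from the same directory and with the same -D switch as the ManifoldCF process shows whether the file is where the relative path says it is.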


> Support Solr Kerberos Authentication
> 
>
> Key: CONNECTORS-1629
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1629
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.14
>Reporter: Jörn Franke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.16
>
>
> Several enterprise deployments of Solr are leveraging SolrCloud Kerberos 
> authentication.
> The integration seems to be rather simple, and the goal of this Jira is to 
> evaluate the steps potentially needed to eventually contribute the Kerberos 
> integration to the ManifoldCF project.
> The following steps would be needed:
>  * One can pass the JVM parameter java.security.auth.login.config to the 
> ManifoldCF JVM using -Djava.security.auth.login.config=/path/to/jaas.confg, in 
> which Kerberos authentication details, such as the keytab and the principal 
> that has the right access to Solr, are configured
>  * A small adaptation to the SolrCloudClient that is used within Manifold needs 
> to be done to enable Kerberos authentication: 
> HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
> Should this be integrated in Manifold, one may want to consider an input 
> field in the UI configuration where one can select, per flow, which user 
> defined in the Jaas conf (you can define multiple ones) should be chosen. By 
> default one may simply select "client" or "SolrJClient" if Jaas.conf is 
> present in the System properties. This does not mean the user needs to be 
> named like this, but the configuration entry referencing any user should be 
> named like this.
> Having a configuration allows a different user per flow. This might 
> also be needed in case you have multiple Solr clusters. 
> Related discussion 
> [http://mail-archives.apache.org/mod_mbox/manifoldcf-user/201912.mbox/browser]
> SolrJ Kerberos integration: 
> [https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr]
> Jaas conf documentation: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html]
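
The description above relies on a named entry ("SolrJClient") in a JAAS configuration file. To show what such an entry contains, here is a sketch that builds the same kind of entry in memory with the standard javax.security.auth.login API. This is not how the change in this ticket works (that uses a file referenced through the -D switch); the class name, principal and keytab path are hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Hypothetical sketch: an in-memory JAAS configuration with one Kerberos entry. */
final class SolrJaasSketch {
    static Configuration forEntry(String entryName, String principal, String keyTabPath) {
        // Options understood by the JDK's Krb5LoginModule.
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("keyTab", keyTabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        AppConfigurationEntry entry = new AppConfigurationEntry(
            "com.sun.security.auth.module.Krb5LoginModule",
            LoginModuleControlFlag.REQUIRED,
            options);
        return new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                // Only the one named entry is known; every other name is unconfigured.
                return entryName.equals(name) ? new AppConfigurationEntry[] {entry} : null;
            }
        };
    }

    public static void main(String[] args) {
        Configuration config =
            forEntry("SolrJClient", "user@EXAMPLE.COM", "/path/to/user.keytab");
        AppConfigurationEntry[] entries = config.getAppConfigurationEntry("SolrJClient");
        System.out.println(entries[0].getLoginModuleName() + " " + entries[0].getOptions());
    }
}
```

Defining several entries under different names is what would allow a different user per flow, as the description suggests.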





[jira] [Commented] (CONNECTORS-1624) Get ManifoldCF to run under Java 11 or higher

2020-01-25 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023487#comment-17023487
 ] 

Karl Wright commented on CONNECTORS-1624:
-

[~cguzel], this needs to be worked on for this release.  I'd like to know which 
set of packages from the Maven Repository will replace what was removed in JDK 
11 relative to JDK 8.  If you have time to research that and provide that 
information in this ticket, I'd be very grateful.  Thanks in advance!


> Get ManifoldCF to run under Java 11 or higher
> -
>
> Key: CONNECTORS-1624
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1624
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Framework core
>    Reporter: Karl Wright
>    Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.16
>
>
> Java 11 doesn't include a number of classes that Java 8 does.  We need to 
> explicitly include jars that provide these classes or ManifoldCF will not 
> function under higher Java revs.
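
One way to find out which of the removed classes are missing in a given runtime is to probe for them. The sketch below checks one representative class from each of the Java EE and CORBA modules that were removed in Java 11; which of these ManifoldCF actually depends on is not stated in this ticket, and the class name is hypothetical.

```java
/** Hypothetical probe: reports which removed-in-Java-11 APIs are loadable at runtime. */
final class RemovedApiProbe {
    /** Returns true if the class can be found on the current classpath or in the JDK. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, RemovedApiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] probes = {
            "javax.xml.bind.JAXBContext",      // JAXB (java.xml.bind)
            "javax.xml.ws.Service",            // JAX-WS (java.xml.ws)
            "javax.activation.DataHandler",    // JAF (java.activation)
            "javax.annotation.PostConstruct",  // Common Annotations (java.xml.ws.annotation)
            "org.omg.CORBA.ORB"                // CORBA (java.corba)
        };
        for (String name : probes) {
            System.out.println((isLoadable(name) ? "present " : "MISSING ") + name);
        }
    }
}
```

Run under Java 8 and again under Java 11 with the ManifoldCF classpath, the difference between the two reports indicates which replacement jars would be needed.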





[jira] [Updated] (CONNECTORS-1624) Get ManifoldCF to run under Java 11 or higher

2020-01-25 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1624:

Fix Version/s: (was: ManifoldCF next)
   ManifoldCF 2.16






[jira] [Updated] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-01-24 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1629:

Component/s: (was: Solr 7.x component)
 Lucene/SOLR connector






[jira] [Resolved] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-01-24 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1629.
-
Fix Version/s: ManifoldCF 2.16
   Resolution: Fixed

Still need documentation improvement on the "how to build and deploy" page

> Support Solr Kerberos Authentication
> 
>
> Key: CONNECTORS-1629
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1629
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Solr 7.x component
>Affects Versions: ManifoldCF 2.14
>Reporter: Jörn Franke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.16
>
>





[jira] [Updated] (CONNECTORS-1521) Documentum Connector uses ManifoldCF's local time in query constraints against the Documentum server without reference to time zones

2020-01-24 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1521:

Fix Version/s: (was: ManifoldCF 2.15)
   ManifoldCF next

> Documentum Connector uses ManifoldCF's local time in query constraints 
> against the Documentum server without reference to time zones
> ---
>
> Key: CONNECTORS-1521
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1521
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.10
>Reporter: James Thomas
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF next
>
>
> I find that the time/date constraints in queries to the Documentum server are 
> based on the "raw" local time of the ManifoldCF server but appear to take no 
> account of the time zones of the two servers.
> This can lead to recently modified files not being transferred to the output 
> repository when you would naturally expect them to be. I'd like the times to 
> be aligned, perhaps by including time zone in the query. In particular, is 
> there a way to use UTC perhaps?
> Here's an example ...
>  * create a folder in Documentum
>  * set up a job to point at the folder and output to the file system
>  * put two documents into a folder in Documentum
>  * Select them, right click and export as CSV (to show the timestamps):
> {noformat}
> 1.png,48489.0,Portable Network Graphics,8/7/2018 9:04 AM,
> 2.png,28620.0,Portable Network Graphics,8/7/2018 9:04 AM,,{noformat}
> Check the local time on the ManifoldCF server machine. Observe that it 
> reports a time consistent with the DM server:
> {noformat}
> [james@manifold]$ date
> Tue Aug  7 09:07:25 BST 2018{noformat}
> Start the job and look for the query to Documentum in the manifoldcf.log file 
> (line break added for readability):
> {noformat}
> DEBUG 2018-08-07T08:07:47.297Z (Startup thread) - DCTM: About to execute 
> query= (select for READ distinct i_chronicle_id from dm_document where 
> r_modify_date >= date('01/01/1970 00:00:00','mm/dd/yyyy hh:mi:ss') and
> r_modify_date<=date('08/07/2018 08:07:34','mm/dd/yyyy hh:mi:ss') 
> AND (i_is_deleted=TRUE Or (i_is_deleted=FALSE AND a_full_text=TRUE AND 
> r_content_size>0)) AND ( Folder('/Administrator/james', DESCEND) ))
> ^C{noformat}
> Notice that the latest date asked for is *before* the modification date of 
> the files added to DM. (And is an hour out, see footnote.)
>   
>  See whether anything has been output by the File System connector. It hasn't:
> {noformat}
> [james@manifold]$ ls /bigdisc/source/PDFs/timezones/
> [james@manifold]$
> {noformat}
> Now:
>  * change the timezone on the ManifoldCF server machine
>  * restart the ManifoldCF server and the Documentum processes
>  * reseed the job
> Check the local time on the ManifoldCF server machine; it has changed:
> {noformat}
> [james@manifold]$ date
> Tue Aug  7 10:10:29 CEST 2018{noformat}
> Start the job again and notice that the query has changed by an hour, plus 
> the few minutes it took to change the date etc (and is still an hour out, see 
> footnote):
> {noformat}
> r_modify_date<=date('08/07/2018 09:11:02','mm/dd/yyyy hh:mi:ss') 
> {noformat}
> Observe that the range of dates now covers the timestamps on the DM data, and 
> also that some data has now been transferred by the File System connector:
> {noformat}
> [james@manifold]$ ls 
> /bigdisc/source/PDFs/timezones/http/mfserver\:8080/da/component/
> drl?versionLabel=CURRENT=09018000e515
> drl?versionLabel=CURRENT=09018000e516
> {noformat}
>  
>  
> [Footnote] It appears that something is trying to take account of Daylight 
> Saving Time too.
> If I set the server date to a time outside of DST, the query is aligned with 
> the current time:
> {noformat}
> [i2e@i2ehost manifold]$ date
>  Mon Oct 29 00:01:13 CET 2018
> r_modify_date<=date('10/29/2018 00:01:39','mm/dd/yyyy hh:mi:ss') 
> {noformat}
> But if I set the time inside DST, the time is an hour before:
> {noformat}
> [i2e@i2ehost manifold]$ date
>  Sat Oct 27 00:00:06 CEST 2018
> r_modify_date<=date('10/26/2018 23:00:26','mm/dd/yyyy hh:mi:ss') 
> {noformat}
> This is perhaps a Java issue rather than a logic issue in the connector? See 
> e.g. [https://stackoverflow.com/questions/6392/java-time-zone-is-messed-up]





[jira] [Updated] (CONNECTORS-1624) Get ManifoldCF to run under Java 11 or higher

2020-01-24 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1624:

Fix Version/s: (was: ManifoldCF 2.15)
   ManifoldCF next

> Get ManifoldCF to run under Java 11 or higher
> -
>
> Key: CONNECTORS-1624
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1624
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Framework core
>    Reporter: Karl Wright
>    Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF next
>
>
> Java 11 doesn't include a number of classes that Java 8 does.  We need to 
> explicitly include jars that provide these classes or ManifoldCF will not 
> function under higher Java revs.





[jira] [Updated] (CONNECTORS-1508) Add support for French Language

2020-01-24 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1508:

Fix Version/s: (was: ManifoldCF 2.15)
   ManifoldCF next

> Add support for French Language
> ---
>
> Key: CONNECTORS-1508
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1508
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: ManifoldCF 2.10
>Reporter: Cedric Ulmer
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF next
>
> Attachments: cedricmanifold_fr.zip
>
>
> Some users may need a French version of the resource bundle. I attached a 
> preliminary translation that France Labs made some time ago (probably around 
> summer 2016), but that we halted due to lack of time (and priority). It is 
> probably almost complete, but some quality checking needs to be done. Note 
> also that I forgot to check the version when I did the translations, so 
> anyone interested would need to check any modifications that may have 
> occurred between this version and the current MCF version.





[jira] [Updated] (CONNECTORS-1622) Upgrade to Tika 1.22

2020-01-24 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1622:

Fix Version/s: (was: ManifoldCF 2.15)
   ManifoldCF next

> Upgrade to Tika 1.22
> 
>
> Key: CONNECTORS-1622
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1622
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Tika extractor
>Affects Versions: ManifoldCF 2.13
>Reporter: Cihad Guzel
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF next
>
>
> Tika has released 1.22. Changes can be found from here: 
> http://tika.apache.org/1.22/





[jira] [Commented] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-01-24 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023381#comment-17023381
 ] 

Karl Wright commented on CONNECTORS-1629:
-

r1873121 commits this code and includes stub -D switches wherever they are 
needed.
I've included the sample jaas-config file, but please note that the options.env 
files need to be hand-modified to point to the jaas-config file to enable 
Kerberos.  There's a placeholder empty -D that should be completed.  This 
deserves mention in the "how-to-build-and-deploy" page, which I will add as 
soon as we reverify that everything still works as expected when built from 
trunk.


> Support Solr Kerberos Authentication
> 
>
> Key: CONNECTORS-1629
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1629
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Solr 7.x component
>Affects Versions: ManifoldCF 2.14
>Reporter: Jörn Franke
>Assignee: Karl Wright
>Priority: Major
>





[jira] [Commented] (CONNECTORS-1633) Exception tossed: Repeated service interruptions - failure processing document: The process cannot access the file because it is being used by another process.

2020-01-24 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022959#comment-17022959
 ] 

Karl Wright commented on CONNECTORS-1633:
-

Hi,
The connector retries for a specific period of time on this and then gives up 
and aborts the job.  What behavior would you like to see instead?  It could 
skip the file and continue, but I'd worry about that too.
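
To make the two options concrete, here is a small sketch of "retry, then either abort or skip". It is not the connector's code: the names are hypothetical, and a fixed attempt count stands in for the time-based retry window so that the example stays deterministic.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the two behaviors discussed: abort the job or skip the document. */
final class RetryThenDecide {
    enum Outcome { PROCESSED, SKIPPED, ABORTED }
    enum WhenExhausted { ABORT_JOB, SKIP_DOCUMENT }

    /**
     * Calls the attempt up to maxAttempts times; it returns true on success.
     * When all attempts fail, the policy decides between aborting and skipping.
     */
    static Outcome process(BooleanSupplier attempt, int maxAttempts, WhenExhausted policy) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return Outcome.PROCESSED;
            }
            // A real crawler would wait here before the next try; omitted in this sketch.
        }
        return policy == WhenExhausted.SKIP_DOCUMENT ? Outcome.SKIPPED : Outcome.ABORTED;
    }

    public static void main(String[] args) {
        // Simulates a file that stays locked for every attempt.
        System.out.println(process(() -> false, 3, WhenExhausted.ABORT_JOB));
        System.out.println(process(() -> false, 3, WhenExhausted.SKIP_DOCUMENT));
    }
}
```

The worry with the skip policy is visible here: a document that stays locked is dropped from the crawl unless the skip is reported somewhere.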



> Exception tossed: Repeated service interruptions - failure processing 
> document: The process cannot access the file because it is being used by 
> another process.
> ---
>
> Key: CONNECTORS-1633
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1633
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: File system connector
>Affects Versions: ManifoldCF 2.13
>Reporter: Michael Cizmar
>Assignee: Karl Wright
>Priority: Major
>
> Seeing this error occurring and I'm working to address it.  If it's not a 
> bug, a better message should be generated.
>  
> {code:java}
> crawl job fails with the following error due to document being in use by 
> another user: 
>  WARN 2019-08-25T15:02:27,416 (Worker thread '11') - Service interruption 
> reported for job 1565115290083 connection 'fs_vwoaahvp319': Timeout or other 
> service interruption: The process cannot access the file because it is being 
> used by another process.
> ERROR 2019-08-25T15:02:27,424 (Worker thread '11') - Exception tossed: 
> Repeated service interruptions - failure processing document: The process 
> cannot access the file because it is being used by another process.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service 
> interruptions - failure processing document: The process cannot access the 
> file because it is being used by another process.
>         at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) 
> [mcf-pull-agent.jar:?]
> Caused by: jcifs.smb.SmbException: The process cannot access the file because 
> it is being used by another process.
>         at 
> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1457) ~[?:?]
>         at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1568) 
> ~[?:?]
>         at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1023) 
> ~[?:?]
>         at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1539) ~[?:?]
>         at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) ~[?:?]
>         at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) ~[?:?]
>         at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:401) 
> ~[?:?]
>         at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:315) ~[?:?]
>         at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:295) ~[?:?]
>         at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) ~[?:?]
>         at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) ~[?:?]
>         at jcifs.smb.SmbFile.withOpen(SmbFile.java:1741) ~[?:?]
>         at jcifs.smb.SmbFile.withOpen(SmbFile.java:1710) ~[?:?]
>         at jcifs.smb.SmbFile.withOpen(SmbFile.java:1704) ~[?:?]
>         at jcifs.smb.SmbFile.queryPath(SmbFile.java:770) ~[?:?]
>         at jcifs.smb.SmbFile.exists(SmbFile.java:851) ~[?:?]
>         at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2188)
>  ~[?:?]
>         at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
>  ~[?:?]
>         at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
> ~[mcf-pull-agent.jar:?]
> {code}





[jira] [Assigned] (CONNECTORS-1633) Exception tossed: Repeated service interruptions - failure processing document: The process cannot access the file because it is being used by another process.

2020-01-24 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1633:
---

Assignee: Karl Wright

> Exception tossed: Repeated service interruptions - failure processing 
> document: The process cannot access the file because it is being used by 
> another process.
> ---
>
> Key: CONNECTORS-1633
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1633
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: File system connector
>Affects Versions: ManifoldCF 2.13
>Reporter: Michael Cizmar
>Assignee: Karl Wright
>Priority: Major
>
> Seeing this error occurring and I'm working to address it.  If it's not a 
> bug, a better message should be generated.
>  
> {code:java}
> crawl job fails with the following error due to document being in use by 
> another user: 
>  WARN 2019-08-25T15:02:27,416 (Worker thread '11') - Service interruption 
> reported for job 1565115290083 connection 'fs_vwoaahvp319': Timeout or other 
> service interruption: The process cannot access the file because it is being 
> used by another process.
> ERROR 2019-08-25T15:02:27,424 (Worker thread '11') - Exception tossed: 
> Repeated service interruptions - failure processing document: The process 
> cannot access the file because it is being used by another process.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service 
> interruptions - failure processing document: The process cannot access the 
> file because it is being used by another process.
>         at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) 
> [mcf-pull-agent.jar:?]
> Caused by: jcifs.smb.SmbException: The process cannot access the file because 
> it is being used by another process.
>         at 
> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1457) ~[?:?]
>         at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1568) 
> ~[?:?]
>         at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1023) 
> ~[?:?]
>         at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1539) ~[?:?]
>         at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) ~[?:?]
>         at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) ~[?:?]
>         at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:401) 
> ~[?:?]
>         at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:315) ~[?:?]
>         at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:295) ~[?:?]
>         at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) ~[?:?]
>         at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) ~[?:?]
>         at jcifs.smb.SmbFile.withOpen(SmbFile.java:1741) ~[?:?]
>         at jcifs.smb.SmbFile.withOpen(SmbFile.java:1710) ~[?:?]
>         at jcifs.smb.SmbFile.withOpen(SmbFile.java:1704) ~[?:?]
>         at jcifs.smb.SmbFile.queryPath(SmbFile.java:770) ~[?:?]
>         at jcifs.smb.SmbFile.exists(SmbFile.java:851) ~[?:?]
>         at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2188)
>  ~[?:?]
>         at 
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
>  ~[?:?]
>         at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
> ~[mcf-pull-agent.jar:?]
> {code}





[jira] [Commented] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-01-23 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022122#comment-17022122
 ] 

Karl Wright commented on CONNECTORS-1629:
-

Hi [~jornfranke], can you include the URL of the pull request here?  Thanks!


> Support Solr Kerberos Authentication
> 
>
> Key: CONNECTORS-1629
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1629
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Solr 7.x component
>Affects Versions: ManifoldCF 2.14
>Reporter: Jörn Franke
>Assignee: Karl Wright
>Priority: Major
>
> Several enterprise deployments of Solr are leveraging SolrCloud Kerberos 
> authentication.
> The integration seems to be rather simple, and the goal of this Jira is to
> evaluate the steps potentially needed to eventually contribute the Kerberos
> integration to the ManifoldCF project.
> The following steps would be needed:
>  * One can pass the JVM parameter java.security.auth.login.config to the
> ManifoldCF JVM using -Djava.security.auth.login.config=/path/to/jaas.conf, in
> which the Kerberos authentication details, such as the keytab and a principal
> that has the right access to Solr, are configured
>  * A small adaptation to the SolrCloudClient that is used within Manifold needs
> to be done to enable Kerberos authentication:
> HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
> Should this be integrated in Manifold, one may want to consider one input
> field in the configuration in the UI where one can select, per flow, which user
> defined in the JAAS conf (you can define multiple ones) should be chosen. By
> default one may simply select "client" or "SolrJClient" if a JAAS conf is
> present in the System properties. This does not mean the user needs to be
> named like this, but the configuration entry referencing any user should be
> named like this.
> Having a configuration option allows a different user per flow. This might
> also be needed in case you have multiple Solr clusters.
> Related discussion 
> [http://mail-archives.apache.org/mod_mbox/manifoldcf-user/201912.mbox/browser]
> SolrJ Kerberos integration: 
> [https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr]
> Jaas conf documentation: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html]
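
The JAAS side of the proposal above can be sketched with JDK classes alone. The sketch below writes a sample jaas.conf containing a "SolrJClient" entry and reads it back through javax.security.auth.login.Configuration. The keytab path and the principal are hypothetical placeholders, and the file is loaded explicitly instead of through -Djava.security.auth.login.config so that the example does not touch JVM-wide state. It performs no Kerberos login and does not show the SolrJ call, which needs SolrJ on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.NoSuchAlgorithmException;
import java.security.URIParameter;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

final class JaasEntryLookup {
  private JaasEntryLookup() {}

  // Writes a sample JAAS file to a temp location. The keytab path and the
  // principal are placeholders for illustration only.
  static Path writeSampleJaasConf() throws IOException {
    String conf =
        "SolrJClient {\n"
      + "  com.sun.security.auth.module.Krb5LoginModule required\n"
      + "  useKeyTab=true\n"
      + "  keyTab=\"/path/to/mcf.keytab\"\n"
      + "  principal=\"mcf@EXAMPLE.COM\";\n"
      + "};\n";
    Path file = Files.createTempFile("jaas", ".conf");
    Files.write(file, conf.getBytes(StandardCharsets.UTF_8));
    return file;
  }

  // Parses the given JAAS file and returns the login modules configured
  // under entryName, or null when the file has no such entry.
  static AppConfigurationEntry[] lookup(Path jaasConf, String entryName)
      throws NoSuchAlgorithmException {
    Configuration config =
        Configuration.getInstance("JavaLoginConfig", new URIParameter(jaasConf.toUri()));
    return config.getAppConfigurationEntry(entryName);
  }
}
```

A UI field of the kind suggested in the issue would then only need to carry the entry name (for example "SolrJClient"); the principal and keytab stay in the JAAS file.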





[jira] [Assigned] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-01-23 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1629:
---

Assignee: Karl Wright

> Support Solr Kerberos Authentication
> 
>
> Key: CONNECTORS-1629
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1629
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Solr 7.x component
>Affects Versions: ManifoldCF 2.14
>Reporter: Jörn Franke
>Assignee: Karl Wright
>Priority: Major
>
> Several enterprise deployments of Solr are leveraging SolrCloud Kerberos 
> authentication.
> The integration seems to be rather simple, and the goal of this Jira is to
> evaluate the steps potentially needed to eventually contribute the Kerberos
> integration to the ManifoldCF project.
> The following steps would be needed:
>  * One can pass the JVM parameter java.security.auth.login.config to the
> ManifoldCF JVM using -Djava.security.auth.login.config=/path/to/jaas.conf, in
> which the Kerberos authentication details, such as the keytab and a principal
> that has the right access to Solr, are configured
>  * A small adaptation to the SolrCloudClient that is used within Manifold needs
> to be done to enable Kerberos authentication:
> HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
> Should this be integrated in Manifold, one may want to consider one input
> field in the configuration in the UI where one can select, per flow, which user
> defined in the JAAS conf (you can define multiple ones) should be chosen. By
> default one may simply select "client" or "SolrJClient" if a JAAS conf is
> present in the System properties. This does not mean the user needs to be
> named like this, but the configuration entry referencing any user should be
> named like this.
> Having a configuration option allows a different user per flow. This might
> also be needed in case you have multiple Solr clusters.
> Related discussion 
> [http://mail-archives.apache.org/mod_mbox/manifoldcf-user/201912.mbox/browser]
> SolrJ Kerberos integration: 
> [https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr]
> Jaas conf documentation: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html]





Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-22 Thread Karl Wright
The whole web services java + cxf architecture is pretty mysterious.  The
only way I've made progress is by finding code snippets in stackoverflow;
the documentation is not adequate.  BUT there are configuration files that
determine how the WSDL parser resolves references.  I don't know how we
would force that configuration to be in effect but something like that
would need to be done.  I'm just surprised that you're having this problem
when two other installations didn't.  There must be a difference somewhere.

Karl
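
For readers unfamiliar with the kind of configuration file mentioned above: an OASIS XML catalog maps the URL that a WSDL or XSD import names to a different location, such as a copy bundled on disk. The sketch below uses the JDK's own javax.xml.catalog API (Java 9 and later) purely to illustrate that mapping. It is not how CXF or the CSWS connector loads its catalog, and the host name and file paths are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

final class CatalogLookup {
  private CatalogLookup() {}

  // Writes a one-entry catalog that redirects a remote schema URL to a
  // local file. Both URLs are hypothetical.
  static Path writeSampleCatalog() throws IOException {
    String xml =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
      + "  <system systemId=\"https://cs.example.com/cws/Authentication?xsd=2\"\n"
      + "          uri=\"file:///opt/example/wsdls/Authentication_2.xsd\"/>\n"
      + "</catalog>\n";
    Path file = Files.createTempFile("catalog", ".xml");
    Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
    return file;
  }

  // Returns the location the catalog maps systemId to, or null when the
  // catalog has no matching entry (the caller would then fetch the
  // original URL itself).
  static String resolve(Path catalogFile, String systemId) {
    Catalog catalog = CatalogManager.catalog(CatalogFeatures.defaults(), catalogFile.toUri());
    return catalog.matchSystem(systemId);
  }
}
```

If a catalog is picked up but the locations it points to do not exist, imports can fail even though the original URL is reachable, which is one way the symptoms in this thread could arise.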


On Wed, Jan 22, 2020 at 5:11 AM Jörn Franke  wrote:

> Sorry, I did not have much time. My next step is to try to modify the
> catalogue XML to fetch it directly via https. For some reason it can
> fetch the WSDL (after my fix), but not the included XSDs, even though the
> error message shows the correct URL for them.
> Are you aware of any configuration that tries to force file-based access
> to those? In the code I did not find anything suspicious.
>
> Am 22.01.2020 um 10:28 schrieb Karl Wright :
>
> 
> Has there been any news?
> I'd love to get this tied up so that you're able to proceed.
> Karl
>
> On Thu, Jan 16, 2020 at 12:08 PM Jörn Franke  wrote:
>
>> Ok I understand. I will try and let you know. Thanks again very much for
>> your fast and detailed answer. Really appreciated. I hope I can give back
>> with the solution to fetch WSDLs from https and maybe a solution to this
>> problem (maybe other will face this as well).
>>
>> About the connector: the WSDL is successfully fetched via https (not file
>> - no clue why) - after the modification I made. The only problem I see now
>> is that the xsd to which the WSDL is referring are not fetched. The bizarre
>> thing is that the https url that it mention for the xsd is absolutely
>> correct. So I assume it does not understand an http url, maybe that is
>> related to configuration.
>>
>> Am 16.01.2020 um 14:53 schrieb Karl Wright :
>>
>> 
>> The WSDLS are bundled with the jar.  We intended this to be the ONLY way
>> the wsdls were accessed, and made lots of changes to the wsdls accordingly,
>> so that they referenced other wsdls via the "file system".  The wsdls are
>> the fixed up ones that are used to build the java stubs locally, plus a
>> config file that's supposed to tell CXF how to resolve referenced wsdls.
>> That config file may or may not be correct, because we never were able to
>> get CXF to use the local resource wsdls during actual connection.
>>
>> Except now they seem to be both fetched via https AND locally sourced.  I
>> have no idea how that can be.  I had assumed it was done one way or the
>> other but not both.
>>
>> Perhaps the problem is that the configuration file is being read but the
>> resource wsdls are not being found?  Removing the meta-inf from the jar
>> would then force everything to go through https.  Ideally I'd love it if
>> that wasn't needed and we could get the resource fetch working everywhere.
>>
>>
>> Karl
>>
>>
>> On Thu, Jan 16, 2020 at 8:20 AM Jörn Franke  wrote:
>>
>>> Well i am not sure how they solved it - I will share a tested solution
>>> in Jira and everyone can check. Maybe their wsdl is accessible through http?
>>>
>>> What works is doing call through https,  but thee fetching of WSDL did
>>> not - as this is through another mechanism.
>>>
>>> I don’t think that the open text is different, the WSDL look very
>>> similar to the repository.
>>>
>>> The strange thing is that for this error message it tries to access the
>>> xsd through a https url (which is perfectly accessible for the server).
>>> Could it be that the connector restrict itself somehow to local file
>>> system only or similar?
>>> Have you faced this issue before?
>>>
>>>
>>>
>>> Am 16.01.2020 um 12:56 schrieb Karl Wright :
>>>
>>> 
>>> I should say that we have (AFAICT) at least two independent
>>> installations of the csws connector working in the field, at least one of
>>> them using secure connections.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Jan 16, 2020 at 6:54 AM Karl Wright  wrote:
>>>
>>>> We solved the WSDL fetching through HTTPS, or thought we had, by
>>>> restructuring the code according to a number of articles we found.  This
>>>> was supposedly tested and worked in one installation.  Nobody has ever
>>>> reported issues with the wsdls being fetched however; I worry that you may
>>>> have a 

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-22 Thread Karl Wright
Has there been any news?
I'd love to get this tied up so that you're able to proceed.
Karl

On Thu, Jan 16, 2020 at 12:08 PM Jörn Franke  wrote:

> OK, I understand. I will try and let you know. Thanks again very much for
> your fast and detailed answer. Really appreciated. I hope I can give back
> with the solution to fetch WSDLs via https and maybe a solution to this
> problem (maybe others will face this as well).
>
> About the connector: the WSDL is successfully fetched via https (not file
> - no clue why) - after the modification I made. The only problem I see now
> is that the XSDs to which the WSDL refers are not fetched. The bizarre
> thing is that the https URL that it mentions for the XSD is absolutely
> correct. So I assume it does not understand an http URL; maybe that is
> related to configuration.
>
> Am 16.01.2020 um 14:53 schrieb Karl Wright :
>
> 
> The WSDLS are bundled with the jar.  We intended this to be the ONLY way
> the wsdls were accessed, and made lots of changes to the wsdls accordingly,
> so that they referenced other wsdls via the "file system".  The wsdls are
> the fixed up ones that are used to build the java stubs locally, plus a
> config file that's supposed to tell CXF how to resolve referenced wsdls.
> That config file may or may not be correct, because we never were able to
> get CXF to use the local resource wsdls during actual connection.
>
> Except now they seem to be both fetched via https AND locally sourced.  I
> have no idea how that can be.  I had assumed it was done one way or the
> other but not both.
>
> Perhaps the problem is that the configuration file is being read but the
> resource wsdls are not being found?  Removing the meta-inf from the jar
> would then force everything to go through https.  Ideally I'd love it if
> that wasn't needed and we could get the resource fetch working everywhere.
>
>
> Karl
>
>
> On Thu, Jan 16, 2020 at 8:20 AM Jörn Franke  wrote:
>
>> Well i am not sure how they solved it - I will share a tested solution in
>> Jira and everyone can check. Maybe their wsdl is accessible through http?
>>
>> What works is doing call through https,  but thee fetching of WSDL did
>> not - as this is through another mechanism.
>>
>> I don’t think that the open text is different, the WSDL look very similar
>> to the repository.
>>
>> The strange thing is that for this error message it tries to access the
>> xsd through a https url (which is perfectly accessible for the server).
>> Could it be that the connector restrict itself somehow to local file
>> system only or similar?
>> Have you faced this issue before?
>>
>>
>>
>> Am 16.01.2020 um 12:56 schrieb Karl Wright :
>>
>> 
>> I should say that we have (AFAICT) at least two independent installations
>> of the csws connector working in the field, at least one of them using
>> secure connections.
>>
>> Karl
>>
>>
>> On Thu, Jan 16, 2020 at 6:54 AM Karl Wright  wrote:
>>
>>> We solved the WSDL fetching through HTTPS, or thought we had, by
>>> restructuring the code according to a number of articles we found.  This
>>> was supposedly tested and worked in one installation.  Nobody has ever
>>> reported issues with the wsdls being fetched however; I worry that you may
>>> have a different version of OpenText that is incompatible with the one we
>>> developed against.  That's the problem with this kind of architecture;
>>> unless the wsdls are included in the jar there can be issues.  We tried to
>>> do that too but were unable to get it to work.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Jan 16, 2020 at 5:34 AM Jörn Franke 
>>> wrote:
>>>
>>>> Ok i fixed the source to fetch WSDL from https (it is not perfect yet
>>>> as it does not use the truststore yet but this I can fix) - I will share
>>>> later in a Jira.
>>>> It is however now unable to locate the imported document
>>>> /Authentication?xsd=2 relative to Authenticaton?wsdl#types1
>>>>
>>>> I will look into this, but if someone has come cross it then please let
>>>> me know.
>>>>
>>>> Am 16.01.2020 um 10:22 schrieb Jörn Franke :
>>>>
>>>> 
>>>> Coming back to the original topic. I believe SSL was never fully solved
>>>> from what i read in the corresponding issue. Apparently, the fetching of
>>>> the WSDL itself through https was not possible. Do you remember still some
>>>> insights beyond what is written in the issue ?
>>>

Re: Congratulations to the new Lucene/Solr PMC Chair, Anshum Gupta!

2020-01-17 Thread Karl Wright
Congratulations!!
Karl


On Fri, Jan 17, 2020 at 6:37 AM Namgyu Kim  wrote:

> Congratulations Anshum! :D
>
> On Fri, Jan 17, 2020 at 7:32 PM Ignacio Vera  wrote:
>
>> Congrats Anshum!
>>
>> On Fri, Jan 17, 2020 at 3:17 AM Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>>> Congratulations Anshum!
>>>
>>> On Thu, Jan 16, 2020 at 2:45 AM Cassandra Targett 
>>> wrote:
>>>
>>>> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
>>>> President position.
>>>>
>>>> This year we have nominated and elected Anshum Gupta as the Chair, a
>>>> decision that the board approved in its January 2020 meeting.
>>>>
>>>> Congratulations, Anshum!
>>>>
>>>> Cassandra
>>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>


Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-16 Thread Karl Wright
The WSDLs are bundled with the jar.  We intended this to be the ONLY way
the wsdls were accessed, and made lots of changes to the wsdls accordingly,
so that they referenced other wsdls via the "file system".  What's bundled
are the fixed-up wsdls that are used to build the java stubs locally, plus a
config file that's supposed to tell CXF how to resolve referenced wsdls.
That config file may or may not be correct, because we never were able to
get CXF to use the local resource wsdls during actual connection.

Except now they seem to be both fetched via https AND locally sourced.  I
have no idea how that can be.  I had assumed it was done one way or the
other but not both.

Perhaps the problem is that the configuration file is being read but the
resource wsdls are not being found?  Removing the META-INF from the jar
would then force everything to go through https.  Ideally I'd love it if
that wasn't needed and we could get the resource fetch working everywhere.


Karl


On Thu, Jan 16, 2020 at 8:20 AM Jörn Franke  wrote:

> Well, I am not sure how they solved it - I will share a tested solution in
> Jira and everyone can check. Maybe their WSDL is accessible through http?
>
> What works is making calls through https, but the fetching of the WSDL did
> not - as this goes through another mechanism.
>
> I don't think that the OpenText version is different; the WSDLs look very
> similar to the ones in the repository.
>
> The strange thing is that for this error message it tries to access the
> XSD through an https URL (which is perfectly accessible for the server).
> Could it be that the connector restricts itself somehow to the local file
> system only, or similar?
> Have you faced this issue before?
>
>
>
> Am 16.01.2020 um 12:56 schrieb Karl Wright :
>
> 
> I should say that we have (AFAICT) at least two independent installations
> of the csws connector working in the field, at least one of them using
> secure connections.
>
> Karl
>
>
> On Thu, Jan 16, 2020 at 6:54 AM Karl Wright  wrote:
>
>> We solved the WSDL fetching through HTTPS, or thought we had, by
>> restructuring the code according to a number of articles we found.  This
>> was supposedly tested and worked in one installation.  Nobody has ever
>> reported issues with the wsdls being fetched however; I worry that you may
>> have a different version of OpenText that is incompatible with the one we
>> developed against.  That's the problem with this kind of architecture;
>> unless the wsdls are included in the jar there can be issues.  We tried to
>> do that too but were unable to get it to work.
>>
>> Karl
>>
>>
>> On Thu, Jan 16, 2020 at 5:34 AM Jörn Franke  wrote:
>>
>>> Ok i fixed the source to fetch WSDL from https (it is not perfect yet as
>>> it does not use the truststore yet but this I can fix) - I will share later
>>> in a Jira.
>>> It is however now unable to locate the imported document
>>> /Authentication?xsd=2 relative to Authenticaton?wsdl#types1
>>>
>>> I will look into this, but if someone has come cross it then please let
>>> me know.
>>>
>>> Am 16.01.2020 um 10:22 schrieb Jörn Franke :
>>>
>>> 
>>> Coming back to the original topic. I believe SSL was never fully solved
>>> from what i read in the corresponding issue. Apparently, the fetching of
>>> the WSDL itself through https was not possible. Do you remember still some
>>> insights beyond what is written in the issue ?
>>>
>>> Am 16.01.2020 um 00:37 schrieb Karl Wright :
>>>
>>> 
>>> Let me think about that option.
>>>
>>> Karl
>>>
>>>
>>> On Wed, Jan 15, 2020 at 5:38 PM Jörn Franke 
>>> wrote:
>>>
>>>> We could make it configurable, e.g. in properties.xml. Here people
>>>> could set it to SSL, TLS, TLSv1.2 (to restrict it to TLS1.2 => some
>>>> companies may want that!). Is this a viable option? That would be also
>>>> future proof. We can leave it by default to SSL, but we should put in the
>>>> example config files TLS by default (so new starters do not get even the
>>>> idea to use an outdated protocol) AND put a comment with recommendation to
>>>> use/enforce always newest protocols for security reasons. Of course, the
>>>> choice is then with the people using the software.
>>>> Could that be something sensible from your point of view?
>>>>
>>>> On Wed, Jan 15, 2020 at 11:14 PM Karl Wright 
>>>> wrote:
>>>>
>>>>> It's rather immaterial what browsers do here.  What's important is
>>>>> wha

[jira] [Resolved] (CONNECTORS-1631) Sharepoint connction problem

2020-01-16 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1631.
-
Fix Version/s: ManifoldCF 2.15
   Resolution: Fixed

> Sharepoint connction problem
> 
>
> Key: CONNECTORS-1631
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1631
> Project: ManifoldCF
>  Issue Type: Task
>Reporter: Zoltan Farago
>    Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.15
>
> Attachments: Manifold connection.png
>
>
> Hello,
> We are trying to connect to a SharePoint 2016 site which has a default
> installation. The URL is
> [http://precogwin02/sites/UKAEAtestSP2016/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx]
>  and from a browser it is fully operational. The site is installed on our
> local network, so there should be no firewall issues.
> When we try to connect from ManifoldCF we get this error message: "The
> site at
> [http://manifoldsharepoint/sites/UKAEAtestSP2016|http://manifoldsharepoint/sites/UKAEAtestSP2016/Shared%20Documents]
>  did not exist or was external; skipping"
> This Manifold installation is able to connect to a Windows share on the same
> server, so we think there should be no user/password, Active Directory, etc.
> issues here.
> We checked forums and documentation but found no solution.
>  
> Is there any special setting needed in Manifold, SharePoint, etc.?
>  
> Thank you in advance!





[jira] [Commented] (CONNECTORS-1631) Sharepoint connction problem

2020-01-16 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016840#comment-17016840
 ] 

Karl Wright commented on CONNECTORS-1631:
-

Don't know about SharePoint Online.  For SharePoint 2019 a plugin needs to be 
released that's properly linked against the SharePoint dll.  If you are 
intending to use 2019 please let me know if you can supply the DLL so that we 
can set up and release the plugin.


> Sharepoint connction problem
> 
>
> Key: CONNECTORS-1631
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1631
> Project: ManifoldCF
>  Issue Type: Task
>Reporter: Zoltan Farago
>    Assignee: Karl Wright
>Priority: Major
> Attachments: Manifold connection.png
>
>
> Hello,
> We are trying to connect to a SharePoint 2016 site which has a default
> installation. The URL is
> [http://precogwin02/sites/UKAEAtestSP2016/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx]
>  and from a browser it is fully operational. The site is installed on our
> local network, so there should be no firewall issues.
> When we try to connect from ManifoldCF we get this error message: "The
> site at
> [http://manifoldsharepoint/sites/UKAEAtestSP2016|http://manifoldsharepoint/sites/UKAEAtestSP2016/Shared%20Documents]
>  did not exist or was external; skipping"
> This Manifold installation is able to connect to a Windows share on the same
> server, so we think there should be no user/password, Active Directory, etc.
> issues here.
> We checked forums and documentation but found no solution.
>  
> Is there any special setting needed in Manifold, SharePoint, etc.?
>  
> Thank you in advance!





Re: [jira] [Commented] (CONNECTORS-1631) Sharepoint connction problem

2020-01-16 Thread Karl Wright
Don't know about SharePoint Online.  For SharePoint 2019 a plugin needs to
be released that's properly linked against the SharePoint dll.  If you are
intending to use 2019 please let me know if you can supply the DLL so that
we can set up and release the plugin.

Karl


On Thu, Jan 16, 2020 at 6:54 AM Zoltan Farago (Jira) 
wrote:

>
> [
> https://issues.apache.org/jira/browse/CONNECTORS-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016836#comment-17016836
> ]
>
> Zoltan Farago commented on CONNECTORS-1631:
> ---
>
> [~kwri...@metacarta.com] thank you! Our developer installed it, and he is
> able to connect now.
>
> One more question if you don't mind. Will this plugin work with SharePoint
> 2019 and SharePoint Online as well?
>
> If yes, SP Online would be tricky to install; is there any detailed
> step-by-step guide?
>
> > Sharepoint connction problem
> > 
> >
> > Key: CONNECTORS-1631
> > URL:
> https://issues.apache.org/jira/browse/CONNECTORS-1631
> > Project: ManifoldCF
> >  Issue Type: Task
> >Reporter: Zoltan Farago
> >Assignee: Karl Wright
> >Priority: Major
> > Attachments: Manifold connection.png
> >
> >
> > Hello,
> > We are trying to connct to a Sharepoint 2016 site wich has default
> installation. The URL is [
> http://precogwin02/sites/UKAEAtestSP2016/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx]
>  and
> from a browser it is fully operational. The site is installed on our local
> network, no firewall issues could be.
> > When we try to connect from the Manifold CF we get this error message:
> "The site at [
> http://manifoldsharepoint/sites/UKAEAtestSP2016|http://manifoldsharepoint/sites/UKAEAtestSP2016/Shared%20Documents
> <http://manifoldsharepoint/sites/UKAEAtestSP2016%7Chttp://manifoldsharepoint/sites/UKAEAtestSP2016/Shared%20Documents>]
> did not exist or was external; skipping"
> > This Manifold installation is able to connect to a Windows share on the
> same server, so we think no user/pass Active Directory, etc issues could be
> here.
> > We checked forums, documentations but found no solution.
> >
> > Is there any special setting needed in Manifold, Sharepoint, et.?
> >
> > Thank you in advance!
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-16 Thread Karl Wright
I should say that we have (AFAICT) at least two independent installations
of the csws connector working in the field, at least one of them using
secure connections.

Karl


On Thu, Jan 16, 2020 at 6:54 AM Karl Wright  wrote:

> We solved the WSDL fetching through HTTPS, or thought we had, by
> restructuring the code according to a number of articles we found.  This
> was supposedly tested and worked in one installation.  Nobody has ever
> reported issues with the wsdls being fetched however; I worry that you may
> have a different version of OpenText that is incompatible with the one we
> developed against.  That's the problem with this kind of architecture;
> unless the wsdls are included in the jar there can be issues.  We tried to
> do that too but were unable to get it to work.
>
> Karl
>
>
> On Thu, Jan 16, 2020 at 5:34 AM Jörn Franke  wrote:
>
>> Ok i fixed the source to fetch WSDL from https (it is not perfect yet as
>> it does not use the truststore yet but this I can fix) - I will share later
>> in a Jira.
>> It is however now unable to locate the imported document
>> /Authentication?xsd=2 relative to Authenticaton?wsdl#types1
>>
>> I will look into this, but if someone has come cross it then please let
>> me know.
>>
>> Am 16.01.2020 um 10:22 schrieb Jörn Franke :
>>
>> 
>> Coming back to the original topic. I believe SSL was never fully solved
>> from what i read in the corresponding issue. Apparently, the fetching of
>> the WSDL itself through https was not possible. Do you remember still some
>> insights beyond what is written in the issue ?
>>
>> Am 16.01.2020 um 00:37 schrieb Karl Wright :
>>
>> 
>> Let me think about that option.
>>
>> Karl
>>
>>
>> On Wed, Jan 15, 2020 at 5:38 PM Jörn Franke  wrote:
>>
>>> We could make it configurable, e.g. in properties.xml. Here people could
>>> set it to SSL, TLS, TLSv1.2 (to restrict it to TLS1.2 => some companies may
>>> want that!). Is this a viable option? That would be also future proof. We
>>> can leave it by default to SSL, but we should put in the example config
>>> files TLS by default (so new starters do not get even the idea to use an
>>> outdated protocol) AND put a comment with recommendation to use/enforce
>>> always newest protocols for security reasons. Of course, the choice is then
>>> with the people using the software.
>>> Could that be something sensible from your point of view?
>>>
>>> On Wed, Jan 15, 2020 at 11:14 PM Karl Wright  wrote:
>>>
>>>> It's rather immaterial what browsers do here.  What's important is
>>>> what  *existing servers* support, since that is what we're connecting with.
>>>>
>>>> I tend to agree that *most* people have probably upgraded to web
>>>> servers that support TLS.  But we can't guarantee it, nor can we assume
>>>> that people have upgraded to the most modern version of TLS exclusively.
>>>> In fact I think we can assume they have *not*.  When the SSL issues were
>>>> discovered a couple of years back, the standard recommendation was simply
>>>> to *disable* SSLv1 and SSLv2, not to upgrade to Java 11 or some such.  We
>>>> still support (and have people using!!) early forms of NTLM (v1 to be
>>>> specific), for instance.  We're not going to be able to wag the dog here.
>>>> Breaking changes of this kind usually mean we go to a whole new major
>>>> version of MCF.
>>>>
>>>> However, if you can show that SSLContext.getInstance("TLS") produces
>>>> a context whose SSLSocketFactory works with all versions of TLS and SSL
>>>> that do not have known security holes, I would support changing over to
>>>> that.  If it turns out we need much more specificity about the kind of
>>>> SSLSocketFactory we produce, then we need a better solution anyhow for
>>>> handling multiple protocols in one socket factory.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Wed, Jan 15, 2020 at 5:17 AM Jörn Franke 
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> No it does not. I can look into that further, but current browsers
>>>>> will stop supporting anything below TLSv1.2 in March 2020.
>>>>> Also, TLS has existed for more than ten years. I expect any server running
>>>>> nowadays will always have TLS support.
>>>>> SSL itself has not been supported for some time now. From a security
>>>>> perspe
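
The configurable-protocol idea discussed in this thread can be illustrated with the standard JSSE API. In the sketch below the protocol name is simply a method argument; where ManifoldCF would actually read it from (for example a properties.xml entry) is not settled in the thread and is not shown. Which protocol versions a given name enables depends on the JDK and its security settings, so the helper reports them rather than assuming them.

```java
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

final class ConfigurableTls {
  private ConfigurableTls() {}

  // Builds an SSLContext for the configured protocol name, e.g. "TLS" or
  // "TLSv1.2". Unknown names fail with NoSuchAlgorithmException.
  static SSLContext contextFor(String protocol)
      throws NoSuchAlgorithmException, KeyManagementException {
    SSLContext context = SSLContext.getInstance(protocol);
    // Null arguments select the JDK's default key managers, trust
    // managers and random source.
    context.init(null, null, null);
    return context;
  }

  // Convenience wrapper for callers that only need the socket factory.
  static SSLSocketFactory socketFactoryFor(String protocol)
      throws NoSuchAlgorithmException, KeyManagementException {
    return contextFor(protocol).getSocketFactory();
  }

  // Lists the protocol versions the context enables by default on this JDK.
  static List<String> defaultProtocols(SSLContext context) {
    return Arrays.asList(context.getDefaultSSLParameters().getProtocols());
  }
}
```

Printing defaultProtocols(contextFor("TLS")) on the target JDK is a quick way to check the question raised above, namely which TLS and SSL versions a "TLS" context would actually negotiate.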

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-16 Thread Karl Wright
We solved the WSDL fetching through HTTPS, or thought we had, by
restructuring the code according to a number of articles we found.  This
was supposedly tested and worked in one installation.  Nobody has ever
reported issues with the WSDLs being fetched, however; I worry that you may
have a different version of OpenText that is incompatible with the one we
developed against.  That's the problem with this kind of architecture:
unless the WSDLs are included in the jar, there can be issues.  We tried to
do that too but were unable to get it to work.

Karl


On Thu, Jan 16, 2020 at 5:34 AM Jörn Franke  wrote:

> OK, I fixed the source to fetch the WSDL over HTTPS (it is not perfect yet,
> as it does not use the truststore, but this I can fix) - I will share it
> later in a Jira.
> However, it is now unable to locate the imported document
> /Authentication?xsd=2 relative to Authenticaton?wsdl#types1
>
> I will look into this, but if someone has come across it, then please let me
> know.
>
> Am 16.01.2020 um 10:22 schrieb Jörn Franke :
>
> 
> Coming back to the original topic: I believe SSL was never fully solved,
> from what I read in the corresponding issue. Apparently, fetching the
> WSDL itself through HTTPS was not possible. Do you still remember any
> insights beyond what is written in the issue?
>
> Am 16.01.2020 um 00:37 schrieb Karl Wright :
>
> 
> Let me think about that option.
>
> Karl
>
>
> On Wed, Jan 15, 2020 at 5:38 PM Jörn Franke  wrote:
>
>> We could make it configurable, e.g. in properties.xml. There people could
>> set it to SSL, TLS, or TLSv1.2 (to restrict it to TLS 1.2; some companies
>> may want that!). Is this a viable option? That would also be future-proof.
>> We can leave the default as SSL, but the example config files should use
>> TLS by default (so newcomers do not even get the idea to use an outdated
>> protocol) AND carry a comment recommending to always use/enforce the
>> newest protocols for security reasons. Of course, the choice is then
>> with the people using the software.
>> Could that be something sensible from your point of view?
>>
>> On Wed, Jan 15, 2020 at 11:14 PM Karl Wright  wrote:
>>
>>> It's rather immaterial what browsers do here.  What's important is what
>>> *existing servers* support, since that is what we're connecting with.
>>>
>>> I tend to agree that *most* people have probably upgraded to web servers
>>> that support TLS.  But we can't guarantee it, nor can we assume that people
>>> have upgraded to the most modern version of TLS exclusively.  In fact I
>>> think we can assume they have *not*.  When the SSL issues were discovered a
>>> couple of years back, the standard recommendation was simply to *disable*
>>> SSLv2 and SSLv3, not to upgrade to Java 11 or some such.  We still support
>>> (and have people using!!) early forms of NTLM (v1 to be specific), for
>>> instance.  We're not going to be able to wag the dog here.  Breaking
>>> changes of this kind usually mean we go to a whole new major version of MCF.
>>>
>>> However, if you can show that SSLContext.getInstance("TLS") yields an
>>> SSLSocketFactory that works with all versions of TLS and SSL that do not
>>> have known security holes, I would support changing over to that.  If it
>>> turns out we need much more specificity about the kind of SSLSocketFactory
>>> we produce, then we need a better solution anyhow for handling multiple
>>> protocols in one socket factory.
>>>
>>> Karl
>>>
>>>
>>> On Wed, Jan 15, 2020 at 5:17 AM Jörn Franke 
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> No, it does not. I can look into that further, but current browsers will
>>>> stop supporting anything below TLSv1.2 in March 2020.
>>>> Also, TLS has existed for more than ten years. I expect any server running
>>>> nowadays to have TLS support.
>>>> SSL itself has not been supported for some time now. From a security
>>>> perspective, it should even break servers that run only SSL, as they are
>>>> inherently insecure, and clients that support only SSL add to
>>>> this.
>>>> However, if you have an idea how this should be made configurable, then I
>>>> can look into this.
>>>>
>>>> Best regards
>>>>
>>>> Am 15.01.2020 um 10:52 schrieb Karl Wright :
>>>>
>>>> 
>>>> Hi,
>>>>
>>>> Mcf currently requires jdk8.  Jdk11 is non trivial to support because
>>>> o

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-15 Thread Karl Wright
Let me think about that option.

Karl


On Wed, Jan 15, 2020 at 5:38 PM Jörn Franke  wrote:

> We could make it configurable, e.g. in properties.xml. There people could
> set it to SSL, TLS, or TLSv1.2 (to restrict it to TLS 1.2; some companies
> may want that!). Is this a viable option? That would also be future-proof.
> We can leave the default as SSL, but the example config files should use
> TLS by default (so newcomers do not even get the idea to use an outdated
> protocol) AND carry a comment recommending to always use/enforce the
> newest protocols for security reasons. Of course, the choice is then
> with the people using the software.
> Could that be something sensible from your point of view?
>
> On Wed, Jan 15, 2020 at 11:14 PM Karl Wright  wrote:
>
>> It's rather immaterial what browsers do here.  What's important is what
>> *existing servers* support, since that is what we're connecting with.
>>
>> I tend to agree that *most* people have probably upgraded to web servers
>> that support TLS.  But we can't guarantee it, nor can we assume that people
>> have upgraded to the most modern version of TLS exclusively.  In fact I
>> think we can assume they have *not*.  When the SSL issues were discovered a
>> couple of years back, the standard recommendation was simply to *disable*
>> SSLv2 and SSLv3, not to upgrade to Java 11 or some such.  We still support
>> (and have people using!!) early forms of NTLM (v1 to be specific), for
>> instance.  We're not going to be able to wag the dog here.  Breaking
>> changes of this kind usually mean we go to a whole new major version of MCF.
>>
>> However, if you can show that SSLContext.getInstance("TLS") yields an
>> SSLSocketFactory that works with all versions of TLS and SSL that do not
>> have known security holes, I would support changing over to that.  If it
>> turns out we need much more specificity about the kind of SSLSocketFactory
>> we produce, then we need a better solution anyhow for handling multiple
>> protocols in one socket factory.
>>
>> Karl
>>
>>
>> On Wed, Jan 15, 2020 at 5:17 AM Jörn Franke  wrote:
>>
>>> Hi Karl,
>>>
>>> No, it does not. I can look into that further, but current browsers will
>>> stop supporting anything below TLSv1.2 in March 2020.
>>> Also, TLS has existed for more than ten years. I expect any server running
>>> nowadays to have TLS support.
>>> SSL itself has not been supported for some time now. From a security
>>> perspective, it should even break servers that run only SSL, as they are
>>> inherently insecure, and clients that support only SSL add to
>>> this.
>>> However, if you have an idea how this should be made configurable, then I
>>> can look into this.
>>>
>>> Best regards
>>>
>>> Am 15.01.2020 um 10:52 schrieb Karl Wright :
>>>
>>> 
>>> Hi,
>>>
>>> MCF currently requires JDK 8.  JDK 11 is nontrivial to support because of
>>> the removal of many JDK classes that connectors need.  MCF will be ported
>>> at some point, but not lightly.
>>>
>>> Similarly, disabling SSL would certainly break many installations upon
>>> upgrade, and we do not do that lightly.
>>>
>>> The core methods that MCF supplies to its connectors should therefore be
>>> updated to support but not mandate TLS.  The protocol specification one
>>> gives to SSLContext is not a detailed one but rather a major version.  What
>>> I don't know is whether "TLSv1" also allows for older protocols etc.
>>>
>>> Karl
>>>
>>> On Wed, Jan 15, 2020, 1:19 AM Jörn Franke  wrote:
>>>
>>>> Yes, I am doing that, but I will need to rebuild.
>>>> I don't recommend TLSv1 - it is already being phased out and will lock out
>>>> TLSv1.2. I will try plain "TLS", as it includes all TLS protocols
>>>> (depending on the JDK).
>>>>
>>>> SSL will not be supported by this (however, as I said, there are other
>>>> parts of the code that call getInstance("TLS")). Some caveats: on
>>>> JDK 6 and 7, "TLS" means only TLSv1 (newer TLS protocols are deactivated);
>>>> on JDK 8 it means that newer TLS protocols are enabled as well.
>>>> To be honest, in my opinion an SSL-only server is a significant security
>>>> hole, and given how long the JDK has supported TLS, I would be surprised if
>>>> anyone is using such a server (most organisations should switch to TLSv1.2
>>>> in any case, as all protocols below it have been broken).
>>>> While 

[jira] [Created] (CONNECTORS-1632) Deprecate SSL and use TLS socket factories everywhere instead

2020-01-15 Thread Karl Wright (Jira)
Karl Wright created CONNECTORS-1632:
---

 Summary: Deprecate SSL and use TLS socket factories everywhere 
instead
 Key: CONNECTORS-1632
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1632
 Project: ManifoldCF
  Issue Type: Task
  Components: Framework core, Lucene/SOLR connector
Reporter: Karl Wright
Assignee: Karl Wright


Servers that serve only TLS apparently no longer work with ManifoldCF's various 
connectors.  Changing the socket factory so that it supports the more modern 
protocols seems indicated.






Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-15 Thread Karl Wright
It's rather immaterial what browsers do here.  What's important is what
*existing servers* support, since that is what we're connecting with.

I tend to agree that *most* people have probably upgraded to web servers
that support TLS.  But we can't guarantee it, nor can we assume that people
have upgraded to the most modern version of TLS exclusively.  In fact I
think we can assume they have *not*.  When the SSL issues were discovered a
couple of years back, the standard recommendation was simply to *disable*
SSLv2 and SSLv3, not to upgrade to Java 11 or some such.  We still support
(and have people using!!) early forms of NTLM (v1 to be specific), for
instance.  We're not going to be able to wag the dog here.  Breaking
changes of this kind usually mean we go to a whole new major version of MCF.

However, if you can show that SSLContext.getInstance("TLS") yields an
SSLSocketFactory that works with all versions of TLS and SSL that do not
have known security holes, I would support changing over to that.  If it
turns out we need much more specificity about the kind of SSLSocketFactory
we produce, then we need a better solution anyhow for handling multiple
protocols in one socket factory.
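
One way to check this empirically is to ask the JDK which protocol versions a context created under a given name enables by default. The following is only a minimal sketch and not ManifoldCF code: the class name `ProtocolProbe` and its helper method are made up for illustration, and the result depends on the JDK version and its security configuration, so no expected output is shown.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper for illustration; not part of ManifoldCF.
final class ProtocolProbe {

  /** Returns the protocol versions that a context created under the given name enables by default. */
  static List<String> enabledProtocols(String contextName) {
    try {
      SSLContext ctx = SSLContext.getInstance(contextName);
      // null arguments select the JDK's default key managers, trust managers and randomness
      ctx.init(null, null, null);
      return Arrays.asList(ctx.getDefaultSSLParameters().getProtocols());
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Cannot create SSLContext for " + contextName, e);
    }
  }

  public static void main(String[] args) {
    // Compare the context names discussed in this thread on the JDK at hand.
    for (String name : new String[] {"SSL", "TLS", "TLSv1", "TLSv1.2"}) {
      System.out.println(name + " -> " + enabledProtocols(name));
    }
  }
}
```

Running this on each JDK that MCF is expected to support would show whether "TLS" covers everything the current setting does.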

Karl


On Wed, Jan 15, 2020 at 5:17 AM Jörn Franke  wrote:

> Hi Karl,
>
> No, it does not. I can look into that further, but current browsers will
> stop supporting anything below TLSv1.2 in March 2020.
> Also, TLS has existed for more than ten years. I expect any server running
> nowadays to have TLS support.
> SSL itself has not been supported for some time now. From a security
> perspective, it should even break servers that run only SSL, as they are
> inherently insecure, and clients that support only SSL add to
> this.
> However, if you have an idea how this should be made configurable, then I
> can look into this.
>
> Best regards
>
> Am 15.01.2020 um 10:52 schrieb Karl Wright :
>
> 
> Hi,
>
> MCF currently requires JDK 8.  JDK 11 is nontrivial to support because of
> the removal of many JDK classes that connectors need.  MCF will be ported
> at some point, but not lightly.
>
> Similarly, disabling SSL would certainly break many installations upon
> upgrade, and we do not do that lightly.
>
> The core methods that MCF supplies to its connectors should therefore be
> updated to support but not mandate TLS.  The protocol specification one
> gives to SSLContext is not a detailed one but rather a major version.  What
> I don't know is whether "TLSv1" also allows for older protocols etc.
>
> Karl
>
> On Wed, Jan 15, 2020, 1:19 AM Jörn Franke  wrote:
>
>> Yes, I am doing that, but I will need to rebuild.
>> I don't recommend TLSv1 - it is already being phased out and will lock out
>> TLSv1.2. I will try plain "TLS", as it includes all TLS protocols
>> (depending on the JDK).
>>
>> SSL will not be supported by this (however, as I said, there are other
>> parts of the code that call getInstance("TLS")). Some caveats: on
>> JDK 6 and 7, "TLS" means only TLSv1 (newer TLS protocols are deactivated);
>> on JDK 8 it means that newer TLS protocols are enabled as well.
>> To be honest, in my opinion an SSL-only server is a significant security
>> hole, and given how long the JDK has supported TLS, I would be surprised if
>> anyone is using such a server (most organisations should switch to TLSv1.2
>> in any case, as all protocols below it have been broken).
>> While it works for all JDKs, JDK 8 should probably be recommended, as it
>> seems to have all TLS protocols activated when using "TLS". Older JDKs seem
>> to deactivate TLSv1.1 and TLSv1.2 when using "TLS". I will write more about
>> this in the Jira once I have verified that this solves the problem.
>> Also, TLSv1.3 is JDK 11 only - I will investigate what that implies.
>> Does ManifoldCF support JDK 11?
>>
>> Am 15.01.2020 um 00:08 schrieb Karl Wright :
>>
>> 
>> I think you can just change the code to read as follows when it creates
>> the SSLContext:
>>
>> SSLContext ctx = SSLContext.getInstance("TLSv1");
>> I don't know if TLS will downgrade to SSL if that's all that's available.
>>
>>
>> Karl
>>
>>
>>
>> On Tue, Jan 14, 2020 at 6:02 PM Jörn Franke  wrote:
>>
>>> Yes, if you do not change this setting, which is what I suspect happens
>>> here. See my previous mail for details.
>>>
>>> Am 14.01.2020 um 23:51 schrieb Karl Wright :
>>>
>>> 
>>> It looks like TLS is actually enabled in the SSLSocketFactory framework
>>> based on how you create the SSLContext.  See:
>>>
>>> https://docs.oracle.com/cd/E19698-01/816-7609/security-83/index.html
>>>
>>> Karl
>>>
>>>

[jira] [Commented] (CONNECTORS-1631) Sharepoint connction problem

2020-01-15 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016350#comment-17016350
 ] 

Karl Wright commented on CONNECTORS-1631:
-

So did you install the MCF plugin for Sharepoint 2016 on the SharePoint server? 
 If not, remember that this is mandatory.



> Sharepoint connction problem
> 
>
> Key: CONNECTORS-1631
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1631
> Project: ManifoldCF
>  Issue Type: Task
>Reporter: Zoltan Farago
>    Assignee: Karl Wright
>Priority: Major
> Attachments: Manifold connection.png
>
>
> Hello,
> We are trying to connect to a SharePoint 2016 site which has a default 
> installation. The URL is 
> [http://precogwin02/sites/UKAEAtestSP2016/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx]
>  and from a browser it is fully operational. The site is installed on our 
> local network, so there should be no firewall issues. 
> When we try to connect from the Manifold CF we get this error message: "The 
> site at 
> [http://manifoldsharepoint/sites/UKAEAtestSP2016|http://manifoldsharepoint/sites/UKAEAtestSP2016/Shared%20Documents]
>  did not exist or was external; skipping"
> This Manifold installation is able to connect to a Windows share on the same 
> server, so we think there are no user/password, Active Directory, etc. issues here. 
> We checked forums and documentation but found no solution. 
>  
> Is there any special setting needed in Manifold, SharePoint, etc.? 
>  
> Thank you in advance!





[jira] [Assigned] (CONNECTORS-1631) Sharepoint connction problem

2020-01-15 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1631:
---

Assignee: Karl Wright

> Sharepoint connction problem
> 
>
> Key: CONNECTORS-1631
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1631
> Project: ManifoldCF
>  Issue Type: Task
>Reporter: Zoltan Farago
>    Assignee: Karl Wright
>Priority: Major
> Attachments: Manifold connection.png
>
>
> Hello,
> We are trying to connect to a SharePoint 2016 site which has a default 
> installation. The URL is 
> [http://precogwin02/sites/UKAEAtestSP2016/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx]
>  and from a browser it is fully operational. The site is installed on our 
> local network, so there should be no firewall issues. 
> When we try to connect from the Manifold CF we get this error message: "The 
> site at 
> [http://manifoldsharepoint/sites/UKAEAtestSP2016|http://manifoldsharepoint/sites/UKAEAtestSP2016/Shared%20Documents]
>  did not exist or was external; skipping"
> This Manifold installation is able to connect to a Windows share on the same 
> server, so we think there are no user/password, Active Directory, etc. issues here. 
> We checked forums and documentation but found no solution. 
>  
> Is there any special setting needed in Manifold, SharePoint, etc.? 
>  
> Thank you in advance!





Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-15 Thread Karl Wright
Hi,

MCF currently requires JDK 8.  JDK 11 is nontrivial to support because of
the removal of many JDK classes that connectors need.  MCF will be ported
at some point, but not lightly.

Similarly, disabling SSL would certainly break many installations upon
upgrade, and we do not do that lightly.

The core methods that MCF supplies to its connectors should therefore be
updated to support but not mandate TLS.  The protocol specification one
gives to SSLContext is not a detailed one but rather a major version.  What
I don't know is whether "TLSv1" also allows for older protocols etc.
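
As an illustration of "support but not mandate", the protocol name handed to `SSLContext.getInstance` could be read from configuration, with the existing value as the fallback. This is a minimal sketch under stated assumptions, not MCF's actual code: the class `ConfigurableProtocol` and the property key `example.ssl.protocol` are hypothetical, and a plain `java.util.Properties` object stands in for whatever configuration mechanism MCF would really use.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Properties;
import javax.net.ssl.SSLContext;

// Hypothetical sketch; class name and property key are assumptions, not MCF names.
final class ConfigurableProtocol {

  /** Hypothetical configuration key holding the SSLContext protocol name. */
  static final String PROTOCOL_KEY = "example.ssl.protocol";

  /** Fallback that keeps the behaviour unchanged when nothing is configured. */
  static final String DEFAULT_PROTOCOL = "SSL";

  /** Returns the configured protocol name, or the default if none is set. */
  static String protocolName(Properties config) {
    String value = config.getProperty(PROTOCOL_KEY);
    return (value == null || value.trim().isEmpty()) ? DEFAULT_PROTOCOL : value.trim();
  }

  /** Creates an (uninitialised) SSLContext for the configured protocol name. */
  static SSLContext createContext(Properties config) {
    String name = protocolName(config);
    try {
      return SSLContext.getInstance(name);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalArgumentException("Unsupported SSLContext protocol: " + name, e);
    }
  }
}
```

With this shape, an installation that sets nothing keeps its current behaviour, while a site that wants TLS (or a specific version such as TLSv1.2) opts in explicitly.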

Karl

On Wed, Jan 15, 2020, 1:19 AM Jörn Franke  wrote:

> Yes, I am doing that, but I will need to rebuild.
> I don't recommend TLSv1 - it is already being phased out and will lock out
> TLSv1.2. I will try plain "TLS", as it includes all TLS protocols
> (depending on the JDK).
>
> SSL will not be supported by this (however, as I said, there are other parts
> of the code that call getInstance("TLS")). Some caveats: on JDK 6 and 7,
> "TLS" means only TLSv1 (newer TLS protocols are deactivated); on JDK 8 it
> means that newer TLS protocols are enabled as well.
> To be honest, in my opinion an SSL-only server is a significant security
> hole, and given how long the JDK has supported TLS, I would be surprised if
> anyone is using such a server (most organisations should switch to TLSv1.2
> in any case, as all protocols below it have been broken).
> While it works for all JDKs, JDK 8 should probably be recommended, as it
> seems to have all TLS protocols activated when using "TLS". Older JDKs seem
> to deactivate TLSv1.1 and TLSv1.2 when using "TLS". I will write more about
> this in the Jira once I have verified that this solves the problem.
> Also, TLSv1.3 is JDK 11 only - I will investigate what that implies.
> Does ManifoldCF support JDK 11?
>
> Am 15.01.2020 um 00:08 schrieb Karl Wright :
>
> 
> I think you can just change the code to read as follows when it creates
> the SSLContext:
>
> SSLContext ctx = SSLContext.getInstance("TLSv1");
> I don't know if TLS will downgrade to SSL if that's all that's available.
>
>
> Karl
>
>
>
> On Tue, Jan 14, 2020 at 6:02 PM Jörn Franke  wrote:
>
>> Yes, if you do not change this setting, which is what I suspect happens
>> here. See my previous mail for details.
>>
>> Am 14.01.2020 um 23:51 schrieb Karl Wright :
>>
>> 
>> It looks like TLS is actually enabled in the SSLSocketFactory framework
>> based on how you create the SSLContext.  See:
>>
>> https://docs.oracle.com/cd/E19698-01/816-7609/security-83/index.html
>>
>> Karl
>>
>>
>> On Tue, Jan 14, 2020 at 5:48 PM Karl Wright  wrote:
>>
>>> The design of ManifoldCF deliberately manages keystores on a connection
>>> by connection basis, not globally.  If you think the only way to implement
>>> TLS is via global keystore I very much doubt it.
>>>
>>> I am on the road until late tomorrow but somewhere along the line I can
>>> do some research into why TLS won't work as we are currently doing it.
>>>
>>> Karl
>>>
>>>
>>> On Tue, Jan 14, 2020 at 12:56 PM Jörn Franke 
>>> wrote:
>>>
>>>> These are TLS only. So maybe you have other servers where TLS and SSL
>>>> are both possible and it downgrades to SSL. However, this is speculation
>>>> and I need to verify it. I have to rebuild ManifoldCF for that. Probably I
>>>> have to reinstall everything, as the keystorefactory is a dependency of
>>>> the connector.
>>>>
>>>> Am 14.01.2020 um 18:34 schrieb Karl Wright :
>>>>
>>>> 
>>>> If you can recommend changes to support TLS, that would be great.  The
>>>> basic infrastructure should still work; it is just a custom keystore and
>>>> associated SSLSocketFactory, which I think is also used for TLS
>>>> connections, unless I am missing something.
>>>>
>>>> On Tue, Jan 14, 2020, 9:38 AM Jörn Franke  wrote:
>>>>
>>>>> Yes this works fine. I believe the error comes from the fact that TLS
>>>>> connections are not supported.
>>>>>
>>>>> Am 14.01.2020 um 15:31 schrieb Michael Cizmar <
>>>>> michael.ciz...@mcplusa.com>:
>>>>>
>>>>> 
>>>>>
>>>>> If you want to test the URL and the SSL setup, I would recommend
>>>>> using SSLPoke to confirm that the keystore is set up properly:
>>>>>
>>>>>
>>>>>
>>>>> https://github.com/MichalHecko/SSLPoke
>>>>>
>

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-14 Thread Karl Wright
I think you can just change the code to read as follows when it creates the
SSLContext:

SSLContext ctx = SSLContext.getInstance("TLSv1");
I don't know if TLS will downgrade to SSL if that's all that's available.


Karl



On Tue, Jan 14, 2020 at 6:02 PM Jörn Franke  wrote:

> Yes, if you do not change this setting, which is what I suspect happens
> here. See my previous mail for details.
>
> Am 14.01.2020 um 23:51 schrieb Karl Wright :
>
> 
> It looks like TLS is actually enabled in the SSLSocketFactory framework
> based on how you create the SSLContext.  See:
>
> https://docs.oracle.com/cd/E19698-01/816-7609/security-83/index.html
>
> Karl
>
>
> On Tue, Jan 14, 2020 at 5:48 PM Karl Wright  wrote:
>
>> The design of ManifoldCF deliberately manages keystores on a connection
>> by connection basis, not globally.  If you think the only way to implement
>> TLS is via global keystore I very much doubt it.
>>
>> I am on the road until late tomorrow but somewhere along the line I can
>> do some research into why TLS won't work as we are currently doing it.
>>
>> Karl
>>
>>
>> On Tue, Jan 14, 2020 at 12:56 PM Jörn Franke 
>> wrote:
>>
>>> These are TLS only. So maybe you have other servers where TLS and SSL
>>> are both possible and it downgrades to SSL. However, this is speculation
>>> and I need to verify it. I have to rebuild ManifoldCF for that. Probably I
>>> have to reinstall everything, as the keystorefactory is a dependency of
>>> the connector.
>>>
>>> Am 14.01.2020 um 18:34 schrieb Karl Wright :
>>>
>>> 
>>> If you can recommend changes to support TLS, that would be great.  The
>>> basic infrastructure should still work; it is just a custom keystore and
>>> associated SSLSocketFactory, which I think is also used for TLS
>>> connections, unless I am missing something.
>>>
>>> On Tue, Jan 14, 2020, 9:38 AM Jörn Franke  wrote:
>>>
>>>> Yes this works fine. I believe the error comes from the fact that TLS
>>>> connections are not supported.
>>>>
>>>> Am 14.01.2020 um 15:31 schrieb Michael Cizmar <
>>>> michael.ciz...@mcplusa.com>:
>>>>
>>>> 
>>>>
>>>> If you want to test the URL and the SSL setup, I would recommend
>>>> using SSLPoke to confirm that the keystore is set up properly:
>>>>
>>>>
>>>>
>>>> https://github.com/MichalHecko/SSLPoke
>>>>
>>>>
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> *From: *Karl Wright 
>>>> *Reply-To: *"user@manifoldcf.apache.org" 
>>>> *Date: *Tuesday, January 14, 2020 at 7:21 AM
>>>> *To: *"user@manifoldcf.apache.org" 
>>>> *Subject: *Re: CSWS Connector : ServiceConstructionException: Failed
>>>> to create service
>>>>
>>>>
>>>>
>>>> Hmm, others have succeeded setting up SSL connections with the current
>>>> code.  Hoping they chime in here.
>>>>
>>>>
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Tue, Jan 14, 2020, 8:19 AM Jörn Franke  wrote:
>>>>
>>>> It seems that it is indeed a certificate issue, as it cannot find a
>>>> valid certification path to the target. The thing is: I added those
>>>> certificates in the UI, so it should not happen.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Am 10.01.2020 um 20:51 schrieb Jörn Franke :
>>>>
>>>> 2.15 ...
>>>>
>>>> I will try on the weekend to see if I can get some logs out of it.
>>>>
>>>>
>>>>
>>>> Am 10.01.2020 um 19:02 schrieb Karl Wright :
>>>>
>>>> Can I ask what version of MCF you are using?  There were issues with
>>>> SSL in the first release of the csws connector if I recall correctly, that
>>>> were fixed for the second release.
>>>>
>>>>
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jan 10, 2020 at 11:42 AM Jörn Franke 
>>>> wrote:
>>>>
>>>> I added root, intermediate and server certificate (in base64 cer, it
>>>> seems to be recognized by manifoldcf), but I still get the same message. I
>>>> will t

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-14 Thread Karl Wright
The design of ManifoldCF deliberately manages keystores on a connection by
connection basis, not globally.  If you think the only way to implement TLS
is via global keystore I very much doubt it.
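
To illustrate the per-connection idea in plain JSSE terms: a socket factory can be built from a trust store that belongs to a single connection, without touching the JVM-wide default. This is only a sketch under stated assumptions; `PerConnectionTrust` and its methods are hypothetical names, not ManifoldCF's keystore classes, and the protocol name is passed in rather than fixed.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical sketch; not ManifoldCF's actual keystore handling.
final class PerConnectionTrust {

  /** Creates an empty in-memory store to which one connection's certificates could be added. */
  static KeyStore emptyStore() {
    try {
      KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
      store.load(null, null); // null stream initialises an empty store
      return store;
    } catch (GeneralSecurityException | IOException e) {
      throw new IllegalStateException("Cannot create in-memory keystore", e);
    }
  }

  /** Builds a socket factory that trusts only the certificates in the given store. */
  static SSLSocketFactory socketFactoryFor(KeyStore connectionStore, String protocol) {
    try {
      TrustManagerFactory tmf =
          TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
      tmf.init(connectionStore);
      SSLContext ctx = SSLContext.getInstance(protocol); // e.g. "SSL" or "TLS"
      // No client key managers here; default source of randomness
      ctx.init(null, tmf.getTrustManagers(), null);
      return ctx.getSocketFactory();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Cannot build socket factory for " + protocol, e);
    }
  }
}
```

Nothing in this pattern depends on a global keystore, whichever protocol name is used.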

I am on the road until late tomorrow but somewhere along the line I can do
some research into why TLS won't work as we are currently doing it.

Karl


On Tue, Jan 14, 2020 at 12:56 PM Jörn Franke  wrote:

> These are TLS only. So maybe you have other servers where TLS and SSL are
> both possible and it downgrades to SSL. However, this is speculation and I
> need to verify it. I have to rebuild ManifoldCF for that. Probably I have to
> reinstall everything, as the keystorefactory is a dependency of the
> connector.
>
> Am 14.01.2020 um 18:34 schrieb Karl Wright :
>
> 
> If you can recommend changes to support TLS, that would be great.  The
> basic infrastructure should still work; it is just a custom keystore and
> associated SSLSocketFactory, which I think is also used for TLS
> connections, unless I am missing something.
>
> On Tue, Jan 14, 2020, 9:38 AM Jörn Franke  wrote:
>
>> Yes this works fine. I believe the error comes from the fact that TLS
>> connections are not supported.
>>
>> Am 14.01.2020 um 15:31 schrieb Michael Cizmar :
>>
>> 
>>
>> If you want to test the URL and the SSL setup, I would recommend
>> using SSLPoke to confirm that the keystore is set up properly:
>>
>>
>>
>> https://github.com/MichalHecko/SSLPoke
>>
>>
>>
>> Michael
>>
>>
>>
>> *From: *Karl Wright 
>> *Reply-To: *"user@manifoldcf.apache.org" 
>> *Date: *Tuesday, January 14, 2020 at 7:21 AM
>> *To: *"user@manifoldcf.apache.org" 
>> *Subject: *Re: CSWS Connector : ServiceConstructionException: Failed to
>> create service
>>
>>
>>
>> Hmm, others have succeeded setting up SSL connections with the current
>> code.  Hoping they chime in here.
>>
>>
>>
>> Karl
>>
>>
>>
>> On Tue, Jan 14, 2020, 8:19 AM Jörn Franke  wrote:
>>
>> It seems that it is indeed a certificate issue, as it cannot find a valid
>> certification path to the target. The thing is: I added those certificates
>> in the UI, so it should not happen.
>>
>>
>>
>>
>>
>>
>>
>> Am 10.01.2020 um 20:51 schrieb Jörn Franke :
>>
>> 2.15 ...
>>
>> I will try on the weekend to see if I can get some logs out of it.
>>
>>
>>
>> Am 10.01.2020 um 19:02 schrieb Karl Wright :
>>
>> Can I ask what version of MCF you are using?  There were issues with SSL
>> in the first release of the csws connector if I recall correctly, that were
>> fixed for the second release.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Fri, Jan 10, 2020 at 11:42 AM Jörn Franke 
>> wrote:
>>
>> I added root, intermediate and server certificate (in base64 cer, it
>> seems to be recognized by manifoldcf), but I still get the same message. I
>> will try to get somehow the full stacktrace
>>
>>
>>
>> Am 10.01.2020 um 17:21 schrieb Karl Wright :
>>
>> If you are using SSL you need to have the proper certificate saved in the
>> connection's keystore.
>>
>> Karl
>>
>>
>>
>>
>>
>> On Fri, Jan 10, 2020 at 11:20 AM Jörn Franke 
>> wrote:
>>
>> It is actually a server using the configuration for the command-driven
>> multi-process model (but with the agents run as a service and the war on a
>> Tomcat run as a service) under Linux.
>>
>>
>>
>> I thought as well that it cannot reach the web services; the question is
>> why. On the same server I can reach the web services and fetch the WSDL
>> without issues.
>>
>> Maybe something related to SSL?
>>
>>
>>
>> Am 10.01.2020 um 14:59 schrieb Karl Wright :
>>
>> How are you running manifoldcf?  Single process example, or a custom
>> setup of some kind?
>>
>> This exception is a "catch all" exception generated far below anything in
>> ManifoldCF, but usually means it cannot download the WSDLs from the
>> service.  Getting the full exception dumped in the log requires a "hack" to
>> the check() method of the connector, but I'm pretty sure that's what's
>> happening anyway.
>>
>> Karl
>>
>>
>>
>>
>>
>> On Fri, Jan 10, 2020 at 8:50 AM Jörn Franke  wrote:
>>
>> Hi,
>>
>> I tried to use the CSWS connector, but already for the Authority
>> connection I receive a
>> org.apache.cxf.service.factory.ServiceConstructionException: Failed to
>> create service.
>>
>> Unfortunately, I don't see more details, not even in the log (debug is
>> activated). I am trying to get a little more output by modifying the
>> connector, but maybe someone already has an idea why this can happen?
>>
>> Are there any special instructions for using it? The pointers to the
>> web services are correct; I tested them via curl and SoapUI.
>>
>>
>> Thank you.
>> Best regards
>>
>>


Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-14 Thread Karl Wright
It looks like TLS is actually enabled in the SSLSocketFactory framework
based on how you create the SSLContext.  See:

https://docs.oracle.com/cd/E19698-01/816-7609/security-83/index.html

Karl


On Tue, Jan 14, 2020 at 5:48 PM Karl Wright  wrote:

> The design of ManifoldCF deliberately manages keystores on a connection by
> connection basis, not globally.  If you think the only way to implement TLS
> is via global keystore I very much doubt it.
>
> I am on the road until late tomorrow but somewhere along the line I can do
> some research into why TLS won't work as we are currently doing it.
>
> Karl
>
>
> On Tue, Jan 14, 2020 at 12:56 PM Jörn Franke  wrote:
>
>> These are TLS only. So maybe you have other servers where TLS and SSL are
>> both possible and it downgrades to SSL. However, this is speculation and I
>> need to verify it. I have to rebuild ManifoldCF for that. Probably I have to
>> reinstall everything, as the keystorefactory is a dependency of the
>> connector.
>>
>> Am 14.01.2020 um 18:34 schrieb Karl Wright :
>>
>> 
>> If you can recommend changes to support TLS, that would be great.  The
>> basic infrastructure should still work; it is just a custom keystore and
>> associated SSLSocketFactory, which I think is also used for TLS
>> connections, unless I am missing something.
>>
>> On Tue, Jan 14, 2020, 9:38 AM Jörn Franke  wrote:
>>
>>> Yes this works fine. I believe the error comes from the fact that TLS
>>> connections are not supported.
>>>
>>> Am 14.01.2020 um 15:31 schrieb Michael Cizmar <
>>> michael.ciz...@mcplusa.com>:
>>>
>>> 
>>>
>>> If you want to test the URL and the SSL setup, I would recommend
>>> using SSLPoke to confirm that the keystore is set up properly:
>>>
>>>
>>>
>>> https://github.com/MichalHecko/SSLPoke
>>>
>>>
>>>
>>> Michael
>>>
>>>
>>>
>>> *From: *Karl Wright 
>>> *Reply-To: *"user@manifoldcf.apache.org" 
>>> *Date: *Tuesday, January 14, 2020 at 7:21 AM
>>> *To: *"user@manifoldcf.apache.org" 
>>> *Subject: *Re: CSWS Connector : ServiceConstructionException: Failed to
>>> create service
>>>
>>>
>>>
>>> Hmm, others have succeeded setting up SSL connections with the current
>>> code.  Hoping they chime in here.
>>>
>>>
>>>
>>> Karl
>>>
>>>
>>>
>>> On Tue, Jan 14, 2020, 8:19 AM Jörn Franke  wrote:
>>>
>>> It seems that it is indeed a certificate issue, as it cannot find a
>>> valid certification path to the target. The thing is: I added those
>>> certificates in the UI, so it should not happen.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 10.01.2020 at 20:51, Jörn Franke wrote:
>>>
>>> 2.15 ...
>>>
>>> I will try on the weekend to see if I can get some logs out of it.
>>>
>>>
>>>
>>> On 10.01.2020 at 19:02, Karl Wright wrote:
>>>
>>> Can I ask what version of MCF you are using?  There were issues with SSL
>>> in the first release of the csws connector if I recall correctly, that were
>>> fixed for the second release.
>>>
>>>
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jan 10, 2020 at 11:42 AM Jörn Franke 
>>> wrote:
>>>
>>> I added root, intermediate and server certificate (in base64 cer, it
>>> seems to be recognized by manifoldcf), but I still get the same message. I
>>> will try to get somehow the full stacktrace
>>>
>>>
>>>
>>> On 10.01.2020 at 17:21, Karl Wright wrote:
>>>
>>> If you are using SSL you need to have the proper certificate saved in
>>> the connection's keystore.
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jan 10, 2020 at 11:20 AM Jörn Franke 
>>> wrote:
>>>
>>> It is actually a server using configuration of the command - driven
>>> multi-process model (but the agents executed as a service and the war on a
>>> tomcat executed as a service) under Linux.
>>>
>>>
>>>
>>> I thought as well that it cannot reach the webservices, the question is
>>> why. On the same server I can reach the webservices and fetch the WSDL
>>> without issues.
>>>
>

Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-14 Thread Karl Wright
If you can recommend changes to support TLS, that would be great.  The
basic infrastructure should still work; it is just a custom keystore and
associated SSLSocketFactory, which I think also is used for TLS
connections, unless I am missing something.

On Tue, Jan 14, 2020, 9:38 AM Jörn Franke  wrote:

> Yes this works fine. I believe the error comes from the fact that TLS
> connections are not supported.
>
> On 14.01.2020 at 15:31, Michael Cizmar wrote:
>
> 
>
> If you want to test the URL and the SSL setup, I would recommend using
> SSLPoke to confirm that the keystore is set up properly:
>
>
>
> https://github.com/MichalHecko/SSLPoke
>
>
>
> Michael
>
>
>
> *From: *Karl Wright 
> *Reply-To: *"user@manifoldcf.apache.org" 
> *Date: *Tuesday, January 14, 2020 at 7:21 AM
> *To: *"user@manifoldcf.apache.org" 
> *Subject: *Re: CSWS Connector : ServiceConstructionException: Failed to
> create service
>
>
>
> Hmm, others have succeeded setting up SSL connections with the current
> code.  Hoping they chime in here.
>
>
>
> Karl
>
>
>
> On Tue, Jan 14, 2020, 8:19 AM Jörn Franke  wrote:
>
> It seems that it is indeed a certificate issue, as it cannot find a valid
> certification path to the target. The thing is: I added those certificates
> in the UI, so it should not happen.
>
>
>
>
>
>
>
> On 10.01.2020 at 20:51, Jörn Franke wrote:
>
> 2.15 ...
>
> I will try on the weekend to see if I can get some logs out of it.
>
>
>
> On 10.01.2020 at 19:02, Karl Wright wrote:
>
> Can I ask what version of MCF you are using?  There were issues with SSL
> in the first release of the csws connector if I recall correctly, that were
> fixed for the second release.
>
>
>
> Karl
>
>
>
>
>
> On Fri, Jan 10, 2020 at 11:42 AM Jörn Franke  wrote:
>
> I added root, intermediate and server certificate (in base64 cer, it seems
> to be recognized by manifoldcf), but I still get the same message. I will
> try to get somehow the full stacktrace
>
>
>
> On 10.01.2020 at 17:21, Karl Wright wrote:
>
> If you are using SSL you need to have the proper certificate saved in the
> connection's keystore.
>
> Karl
>
>
>
>
>
> On Fri, Jan 10, 2020 at 11:20 AM Jörn Franke  wrote:
>
> It is actually a server using configuration of the command - driven
> multi-process model (but the agents executed as a service and the war on a
> tomcat executed as a service) under Linux.
>
>
>
> I thought as well that it cannot reach the webservices, the question is
> why. On the same server I can reach the webservices and fetch the WSDL
> without issues.
>
> Maybe something related to SSL?
>
>
>
> On 10.01.2020 at 14:59, Karl Wright wrote:
>
> How are you running manifoldcf?  Single process example, or a custom setup
> of some kind?
>
> This exception is a "catch all" exception generated far below anything in
> ManifoldCF, but usually means it cannot download the WSDLs from the
> service.  Getting the full exception dumped in the log requires a "hack" to
> the check() method of the connector, but I'm pretty sure that's what's
> happening anyway.
>
> Karl
>
>
>
>
>
> On Fri, Jan 10, 2020 at 8:50 AM Jörn Franke  wrote:
>
> Hi,
>
> I tried to use the CSWS connector, but already for the Authority
> connection I receive a
> org.apache.cxf.service.factory.ServiceConstructionException: Failed to
> create service.
>
> Unfortunately I don’t see more details , also not in the log (debug is
> activated). I try to get a little bit more output by modifying the
> connector, but maybe someone has already an idea why this can happen?
>
> Are there some special instructions to use it? The pointers to the
> webservices are correct, I tested via Curl and SOAPUI.
>
>
> Thank you.
> Best regards
>
>


Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-14 Thread Karl Wright
Hmm, others have succeeded setting up SSL connections with the current
code.  Hoping they chime in here.

Karl

On Tue, Jan 14, 2020, 8:19 AM Jörn Franke  wrote:

> It seems that it is indeed a certificate issue, as it cannot find a valid
> certification path to the target. The thing is: I added those certificates
> in the UI, so it should not happen.
>
>
>
> On 10.01.2020 at 20:51, Jörn Franke wrote:
>
> 
> 2.15 ...
> I will try on the weekend to see if I can get some logs out of it.
>
> On 10.01.2020 at 19:02, Karl Wright wrote:
>
> 
> Can I ask what version of MCF you are using?  There were issues with SSL
> in the first release of the csws connector if I recall correctly, that were
> fixed for the second release.
>
> Karl
>
>
> On Fri, Jan 10, 2020 at 11:42 AM Jörn Franke  wrote:
>
>> I added root, intermediate and server certificate (in base64 cer, it
>> seems to be recognized by manifoldcf), but I still get the same message. I
>> will try to get somehow the full stacktrace
>>
>> On 10.01.2020 at 17:21, Karl Wright wrote:
>>
>> 
>> If you are using SSL you need to have the proper certificate saved in the
>> connection's keystore.
>> Karl
>>
>>
>> On Fri, Jan 10, 2020 at 11:20 AM Jörn Franke 
>> wrote:
>>
>>> It is actually a server using configuration of the command - driven
>>> multi-process model (but the agents executed as a service and the war on a
>>> tomcat executed as a service) under Linux.
>>>
>>> I thought as well that it cannot reach the webservices, the question is
>>> why. On the same server I can reach the webservices and fetch the WSDL
>>> without issues.
>>> Maybe something related to SSL?
>>>
>>> On 10.01.2020 at 14:59, Karl Wright wrote:
>>>
>>> 
>>> How are you running manifoldcf?  Single process example, or a custom
>>> setup of some kind?
>>>
>>> This exception is a "catch all" exception generated far below anything
>>> in ManifoldCF, but usually means it cannot download the WSDLs from the
>>> service.  Getting the full exception dumped in the log requires a "hack" to
>>> the check() method of the connector, but I'm pretty sure that's what's
>>> happening anyway.
>>>
>>> Karl
>>>
>>>
>>> On Fri, Jan 10, 2020 at 8:50 AM Jörn Franke 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I tried to use the CSWS connector, but already for the Authority
>>>> connection I receive a
>>>> org.apache.cxf.service.factory.ServiceConstructionException: Failed to
>>>> create service.
>>>>
>>>> Unfortunately I don’t see more details , also not in the log (debug is
>>>> activated). I try to get a little bit more output by modifying the
>>>> connector, but maybe someone has already an idea why this can happen?
>>>>
>>>> Are there some special instructions to use it? The pointers to the
>>>> webservices are correct, I tested via Curl and SOAPUI.
>>>>
>>>>
>>>> Thank you.
>>>> Best regards
>>>
>>>


Re: performInsert() postgres json

2020-01-13 Thread Karl Wright
If you can just put JSON in a string that would work fine.

Karl


On Mon, Jan 13, 2020 at 3:36 AM SREEJITH va  wrote:

> Thanks Karl.
>
> I am trying to use WrappedConnection which I am getting through below API
> and using it for database operations in my connector.
>
>
> *ConnectionFactory.getConnection(jdbcURL, _driver,
> ManifoldCF.getMasterDatabaseName(),
> ManifoldCF.getMasterDatabaseUsername(), 
> ManifoldCF.getMasterDatabasePassword(),
> maxDBConnections, false);*
>
> In this way I can run the query directly and work around the issue with the
> json datatype. Is it OK to proceed with this, or should I avoid using
> wrapped connections directly? Are there any performance concerns with using
> a WrappedConnection manually in a connector instead of extending BaseTable?
>
>
> On Thu, Jan 2, 2020 at 7:26 PM Karl Wright  wrote:
>
>> The Basetable abstraction doesn't recognize specialty column types like
>> JSON; it's got a limited set of types it knows about, and that is by design
>> so multiple implementations can be written for different databases.
>>
>> Karl
>>
>>
>> On Thu, Jan 2, 2020 at 8:49 AM SREEJITH va  wrote:
>>
>>> Hi Karl and Team,
>>>
>>> I have a situation  where I have to call *performInsert(parameterMap,
>>> null)*  on a postgres database table with json column. I am getting
>>> below error during the insert.
>>>
>>>
>>> *column "X" is of type json but expression is of type character
>>> varying  Hint: You will need to rewrite or cast the expression.*
>>>
>>> Is there any way I can achieve this using Basetable api?
>>>
>>>
>>> --
>>> Regards
>>> -Sreejith
>>>
>>
>
> --
> Regards
> -Sreejith
>


Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-10 Thread Karl Wright
Can I ask what version of MCF you are using?  There were issues with SSL in
the first release of the csws connector if I recall correctly, that were
fixed for the second release.

Karl


On Fri, Jan 10, 2020 at 11:42 AM Jörn Franke  wrote:

> I added root, intermediate and server certificate (in base64 cer, it seems
> to be recognized by manifoldcf), but I still get the same message. I will
> try to get somehow the full stacktrace
>
> On 10.01.2020 at 17:21, Karl Wright wrote:
>
> 
> If you are using SSL you need to have the proper certificate saved in the
> connection's keystore.
> Karl
>
>
> On Fri, Jan 10, 2020 at 11:20 AM Jörn Franke  wrote:
>
>> It is actually a server using configuration of the command - driven
>> multi-process model (but the agents executed as a service and the war on a
>> tomcat executed as a service) under Linux.
>>
>> I thought as well that it cannot reach the webservices, the question is
>> why. On the same server I can reach the webservices and fetch the WSDL
>> without issues.
>> Maybe something related to SSL?
>>
>> On 10.01.2020 at 14:59, Karl Wright wrote:
>>
>> 
>> How are you running manifoldcf?  Single process example, or a custom
>> setup of some kind?
>>
>> This exception is a "catch all" exception generated far below anything in
>> ManifoldCF, but usually means it cannot download the WSDLs from the
>> service.  Getting the full exception dumped in the log requires a "hack" to
>> the check() method of the connector, but I'm pretty sure that's what's
>> happening anyway.
>>
>> Karl
>>
>>
>> On Fri, Jan 10, 2020 at 8:50 AM Jörn Franke  wrote:
>>
>>> Hi,
>>>
>>> I tried to use the CSWS connector, but already for the Authority
>>> connection I receive a
>>> org.apache.cxf.service.factory.ServiceConstructionException: Failed to
>>> create service.
>>>
>>> Unfortunately I don’t see more details , also not in the log (debug is
>>> activated). I try to get a little bit more output by modifying the
>>> connector, but maybe someone has already an idea why this can happen?
>>>
>>> Are there some special instructions to use it? The pointers to the
>>> webservices are correct, I tested via Curl and SOAPUI.
>>>
>>>
>>> Thank you.
>>> Best regards
>>
>>


Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-10 Thread Karl Wright
If you are using SSL you need to have the proper certificate saved in the
connection's keystore.
Karl


On Fri, Jan 10, 2020 at 11:20 AM Jörn Franke  wrote:

> It is actually a server using configuration of the command - driven
> multi-process model (but the agents executed as a service and the war on a
> tomcat executed as a service) under Linux.
>
> I thought as well that it cannot reach the webservices, the question is
> why. On the same server I can reach the webservices and fetch the WSDL
> without issues.
> Maybe something related to SSL?
>
> On 10.01.2020 at 14:59, Karl Wright wrote:
>
> 
> How are you running manifoldcf?  Single process example, or a custom setup
> of some kind?
>
> This exception is a "catch all" exception generated far below anything in
> ManifoldCF, but usually means it cannot download the WSDLs from the
> service.  Getting the full exception dumped in the log requires a "hack" to
> the check() method of the connector, but I'm pretty sure that's what's
> happening anyway.
>
> Karl
>
>
> On Fri, Jan 10, 2020 at 8:50 AM Jörn Franke  wrote:
>
>> Hi,
>>
>> I tried to use the CSWS connector, but already for the Authority
>> connection I receive a
>> org.apache.cxf.service.factory.ServiceConstructionException: Failed to
>> create service.
>>
>> Unfortunately I don’t see more details , also not in the log (debug is
>> activated). I try to get a little bit more output by modifying the
>> connector, but maybe someone has already an idea why this can happen?
>>
>> Are there some special instructions to use it? The pointers to the
>> webservices are correct, I tested via Curl and SOAPUI.
>>
>>
>> Thank you.
>> Best regards
>
>


Re: CSWS Connector : ServiceConstructionException: Failed to create service

2020-01-10 Thread Karl Wright
How are you running manifoldcf?  Single process example, or a custom setup
of some kind?

This exception is a "catch all" exception generated far below anything in
ManifoldCF, but usually means it cannot download the WSDLs from the
service.  Getting the full exception dumped in the log requires a "hack" to
the check() method of the connector, but I'm pretty sure that's what's
happening anyway.

Karl
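
As an illustration of what such a "hack" might log, here is a small sketch
that walks an exception's cause chain, so the root cause below the catch-all
becomes visible. It is plain Java with no CXF or ManifoldCF dependency. The
class name CauseChain and the idea of calling it from check() are assumptions
made for the example, not the connector's actual code.

```java
// Hypothetical helper, not part of ManifoldCF.
final class CauseChain {
  /** Returns one line per exception in the cause chain, outermost first. */
  static String describe(Throwable t) {
    StringBuilder sb = new StringBuilder();
    int depth = 0;
    // Cap the depth as a guard against unusually long or cyclic chains.
    while (t != null && depth < 32) {
      if (depth > 0) {
        sb.append("caused by: ");
      }
      sb.append(t.getClass().getName())
        .append(": ")
        .append(t.getMessage())
        .append('\n');
      t = t.getCause();
      depth++;
    }
    return sb.toString();
  }
}
```

Logging CauseChain.describe(e) where the ServiceConstructionException is
caught would show whether the innermost cause is, for example, a failed WSDL
download or a certificate path problem.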


On Fri, Jan 10, 2020 at 8:50 AM Jörn Franke  wrote:

> Hi,
>
> I tried to use the CSWS connector, but already for the Authority
> connection I receive a
> org.apache.cxf.service.factory.ServiceConstructionException: Failed to
> create service.
>
> Unfortunately I don’t see more details , also not in the log (debug is
> activated). I try to get a little bit more output by modifying the
> connector, but maybe someone has already an idea why this can happen?
>
> Are there some special instructions to use it? The pointers to the
> webservices are correct, I tested via Curl and SOAPUI.
>
>
> Thank you.
> Best regards


Re: Oracle JDBC Job Error

2020-01-06 Thread Karl Wright
Hi Cihad,

You need to change the query; the code is fine.  Follow the instructions in
the error message: put double quotes around the $(IDCOLUMN) variable, i.e.
"$(IDCOLUMN)", so that Oracle does not fold the alias to uppercase.
Your query should look like this:

SELECT PERSONID AS "$(IDCOLUMN)" FROM PERSON

Karl
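
A small sketch of why the quotes matter. It assumes, as Cihad's debugging
below suggests, that $(IDCOLUMN) is replaced by the text lcf__id before the
query is sent. It also relies on Oracle folding unquoted identifiers to upper
case while keeping quoted ones as written. The class SeedQueryAlias and its
methods are hypothetical and only model that behaviour; they are not
connector code.

```java
import java.util.Locale;

// Hypothetical model of the alias handling, not part of ManifoldCF.
final class SeedQueryAlias {
  // Value of JDBCConstants.idReturnColumnName as reported in this thread.
  static final String ID_RETURN_COLUMN = "lcf__id";

  /** Expands the $(IDCOLUMN) placeholder by plain text substitution. */
  static String expand(String query) {
    return query.replace("$(IDCOLUMN)", ID_RETURN_COLUMN);
  }

  /** Column label a database that upper-cases unquoted identifiers would
   *  report for the given alias. */
  static String foldedLabel(String alias) {
    if (alias.length() >= 2 && alias.startsWith("\"") && alias.endsWith("\"")) {
      return alias.substring(1, alias.length() - 1); // quoted: case preserved
    }
    return alias.toUpperCase(Locale.ROOT);           // unquoted: folded
  }
}
```

With the unquoted alias the label comes back as LCF__ID, so a lookup by
lcf__id finds nothing. With the quoted alias the label stays lcf__id and the
lookup succeeds.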

On Mon, Jan 6, 2020 at 7:50 PM Cihad Guzel  wrote:

> Hi,
>
> I have debugged the MCF 2.15 code and found the problem.
>
> JDBCConnector.java line:270
>
> Object o = row.getValue(JDBCConstants.idReturnColumnName);
>
> if (o == null)
>   throw new ManifoldCFException("Bad seed query; doesn't return $(IDCOLUMN) 
> column.  Try using quotes around $(IDCOLUMN) variable, e.g. \"$(IDCOLUMN)\", 
> or, for MySQL, select \"by label\" in your repository connection.");
>
>
> The "row" object's value is "LCF__ID" -> this is a uppercase string
>
> "JDBCConstants.idReturnColumnName" is "lcf__id" -> this is a lowercase string
>
> So "o" object is null.
>
> I think that Oracle returns the uppercase column name. It is not a bug. How 
> can I fix it? Should I update the seed query in the Query tab? Should we 
> change the code lines?
>
> Regards,
> Cihad Guzel
>
>
> On Sun, Jan 5, 2020 at 8:14 PM Cihad Guzel  wrote:
>
>> Hi,
>>
>> I try JDBC connector with Oracle (version: 11.2.0.4). I added to
>> classpath ojdbc6.jar. My seed query as follows:
>>
>> "SELECT PERSONID AS $(IDCOLUMN) FROM PERSON"
>>
>> and I have an error as follow:
>>
>> "Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using
>> quotes around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)", or, for MySQL,
>> select "by label" in your repository connection."
>>
>> I have tried JDBC connector with MsSQL and Mysql. It has run successfully.
>>
>> How can I fix it?
>>
>> Regards,
>> Cihad Guzel
>>
>


Re: Tika Extractor - extract document as (X)HTML not as textonly

2020-01-03 Thread Karl Wright
The reason plain text is used is because otherwise standard text processing
inside Lucene will index tags as terms, which is definitely not what you
usually want.

If you want the Tika Extractor to be able to optionally generate an XHTML
format, that sounds like an additional operating mode for the Tika
Extractor.  To do that you'd need to add a flag, probably to the Output
Specification, with associated UI components, and be sure to maintain
backwards compatibility.

Karl
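
To illustrate the first point, here is a deliberately naive tokenizer sketch.
It is not Lucene's analysis chain, only a stand-in that splits on anything
that is not a letter or digit, and the class name NaiveTokenizer is invented
for the example. Fed XHTML, it emits element names as terms alongside the
real words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only; a real index would use a Lucene analyzer.
final class NaiveTokenizer {
  /** Splits on runs of non-letter, non-digit characters and lower-cases. */
  static List<String> tokens(String text) {
    List<String> out = new ArrayList<>();
    for (String t : text.split("[^\\p{L}\\p{N}]+")) {
      if (!t.isEmpty()) {
        out.add(t.toLowerCase(Locale.ROOT));
      }
    }
    return out;
  }
}
```

For the input "<h1>Chapter</h1><p>Scope</p>" this yields h1, chapter, h1, p,
scope, p, whereas the plain text "Chapter Scope" yields only chapter, scope.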


On Thu, Jan 2, 2020 at 4:30 PM Jörn Franke  wrote:

> Hi,
>
> Is there a possibility to have instead of the text output in the Tika
> Extractor (Manifold version, not the extract handler) the (X)HTML output?
> How one can achieve this in Tika is pretty clear:
> https://tika.apache.org/1.8/examples.html#Picking_different_output_formats
>
> Reason: We need to extract very specific chapters from a word document and
> index them as dedicated Solr documents (the latter part is probably still
> to be done in an update chain).  There we currently already extract from
> the HTML version created by Tika of the word document the (sub-)chapters we
> need.
>
> thank you.
>
> best regards
>


[jira] [Commented] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-01-03 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007319#comment-17007319
 ] 

Karl Wright commented on CONNECTORS-1629:
-

Hi,

{quote}
About the ModifiedSolrClient - do I understand you correctly that you would 
prefer to make the ModifiedSolrClient working in this setting as well? Ie by 
creating a new ModifiedSolrClientKerberos and ModifiedLBSolrClientKerberos (not 
touching the ones already in Manifold)? I can look at this, but I wonder if 
this would still be needed as I did not observe any errors. Maybe the multipart 
bit is fixed in higher Solr versions?
{quote}

I wish the multipart code was fixed but I fear it is not; I tried to get the 
HttpClient team to agree to it but there was disagreement and I didn't get past 
that.  It's so long ago now that I don't even remember the discussion well, but 
some team members thought that it was not the client's responsibility to 
properly escape argument names when they were encoded in some cases but not in 
others.  If you are including metadata names and values that would require 
encoding and this is working OK, then maybe this was resolved.  But we should 
evaluate that independently.

The multipart fix was only PART of the reason for ModifiedSolrHttpClient, 
however.  The other reason was that the Solr team essentially deprecated and 
removed support for multipart posts entirely, which meant that streaming of 
large documents to solr was not possible.  I've kept that working and called 
for them to rethink that problem, at which point I was told that nobody should 
be using Solr Cell at all (!)  So that stays until the Solr team figures this 
out.  The conversation there was at least relatively recent.

A GitHub pull request is fine.  A diff can be generated by appending ".diff" 
to the pull request URL, and then I can apply it as a patch in svn.






> Support Solr Kerberos Authentication
> 
>
> Key: CONNECTORS-1629
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1629
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Solr 7.x component
>Affects Versions: ManifoldCF 2.14
>Reporter: Jörn Franke
>Priority: Major
>
> Several enterprise deployments of Solr are leveraging SolrCloud Kerberos 
> authentication.
> The integration seems to be rather simple and the goal of this Jira is to 
> evaluate the potentially needed steps to eventually contribute the Kerberos 
> integration to the ManifoldCF project.
> The following steps would be needed:
>  * One can pass the JVM parameter java.security.auth.login.config to the 
> ManifoldCF JVM using -Djava.security.auth.login.config=/path/to/jaas.conf in 
> which Kerberos authentication details, such as keytab and principal that has 
> the right access to Solr is configured
>  * A small adaption to the SolrCloudClient that is used within Manifold needs 
> to be done to enable Kerberos authentication: 
> HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
> Should this be integrated in Manifold, one may want to consider an input 
> field in the UI configuration where one can select, per flow, which user 
> defined in the Jaas conf (you can define multiple) should be chosen. By 
> default one may simply select "client" or "SolrJClient" if Jaas.conf is 
> present in the System properties. This does not mean the user needs to be 
> named like this, but the configuration entry referencing any user should be 
> named like this.
> Having a configuration option allows a different user per flow. This might 
> also be needed in case you have multiple Solr clusters. 
> Related discussion 
> [http://mail-archives.apache.org/mod_mbox/manifoldcf-user/201912.mbox/browser]
> SolrJ Kerberos integration: 
> [https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr]
> Jaas conf documentation: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html]





[jira] [Commented] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-01-02 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007086#comment-17007086
 ] 

Karl Wright commented on CONNECTORS-1629:
-

Hi,

I suggest we make changes piecemeal.  First, updating the Jetty version, and 
the jars that are included, as described here:

{quote}
You need jetty-client-9.4.25.v20191220.jar (maybe a slightly older 9.4.x 
version will do as well, the current manifold version not). Reason is that you 
will get otherwise a java.lang.ClassNotFoundException: 
org.eclipse.jetty.client.util.SPNEGOAuthentication error.

I was not exactly sure how to add this jar to the finally generated 
distribution of ManifoldCF so i copied it in collector-lib and added it to the 
classpath.
{quote}

To do this, we'd want to update the version of jetty specified in build.xml and 
pom.xml, and add the new jar to the jetty jar list in build.xml.  Then, in 
framework/build.xml, the new jar should be added wherever jetty jars are found.

{quote}
I had to also deactivate the ModifiedLbSolrClient (commented out below) 
otherwise you get an auth error 401. I believe the reason is that the default 
SPNEGO Protocol for HTTP Kerberos always returns 401 not auth and THEN you are 
supposed to do the Kerberos authentication, which is what SolrJ does
{quote}

The modified client is present because we need to be sure that the correct 
(overridden) version of the SolrHttpClient class is used, not the default one.  
So in this case you'd want to create a fresh copy of LBSolrClient and modify it 
accordingly.

{quote}
Finally, you need to add to options.env.unix or options.env.win:

-Djava.security.auth.login.config=/path/to/jaas-client.conf
{quote}

I would suggest adding both the config file and the -D switch to all the 
examples, but leave kerberos disabled unless somebody modifies the 
jaas-client.conf file.





Re: performInsert() postgres json

2020-01-02 Thread Karl Wright
The Basetable abstraction doesn't recognize specialty column types like
JSON; it's got a limited set of types it knows about, and that is by design
so multiple implementations can be written for different databases.

Karl
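
For anyone who ends up issuing the statement with plain JDBC instead of
BaseTable, the cast that the PostgreSQL hint asks for can be written into the
SQL text. The sketch below only builds that SQL string. The class
JsonInsertSql and its method are hypothetical, and the table and column names
must come from trusted code because they are concatenated into the statement.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of ManifoldCF's BaseTable API.
final class JsonInsertSql {
  /** Builds a parameterised INSERT in which the placeholders for the named
   *  JSON columns are wrapped in CAST(? AS json). */
  static String insertSql(String table, List<String> columns, Set<String> jsonColumns) {
    StringBuilder cols = new StringBuilder();
    StringBuilder vals = new StringBuilder();
    for (String c : columns) {
      if (cols.length() > 0) {
        cols.append(", ");
        vals.append(", ");
      }
      cols.append(c);
      // The JSON text is still bound as an ordinary string parameter;
      // the cast happens on the database side.
      vals.append(jsonColumns.contains(c) ? "CAST(? AS json)" : "?");
    }
    return "INSERT INTO " + table + " (" + cols + ") VALUES (" + vals + ")";
  }
}
```

The resulting statement can be prepared as usual, with setString used for
every parameter including the JSON one.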


On Thu, Jan 2, 2020 at 8:49 AM SREEJITH va  wrote:

> Hi Karl and Team,
>
> I have a situation  where I have to call *performInsert(parameterMap,
> null)*  on a postgres database table with json column. I am getting below
> error during the insert.
>
>
> *column "X" is of type json but expression is of type character
> varying  Hint: You will need to rewrite or cast the expression.*
>
> Is there any way I can achieve this using Basetable api?
>
>
> --
> Regards
> -Sreejith
>


Re: PostgreSQL Version

2019-12-30 Thread Karl Wright
I've personally upgraded to 9.6 and have found no issues with it.
Karl


On Mon, Dec 30, 2019 at 1:41 PM SREEJITH va  wrote:

> Hi Karl / Team ,
>
> As per the documentation, Below the PostgreSQL versions tested against the
> Manifold.
>
> *release-1.10*:  ManifoldCF has been tested against version 8.3.7, 8.4.5
> and 9.1 of PostgreSQL
> *release-2.15* : ManifoldCF has been tested against version 8.3.7, 8.4.5,
> 9.1, 9.2, and 9.3 of PostgreSQL
>
> But I use ManifoldCF 2.11, Is it ok to use PostgreSQL version 9.6.15  Or I
> should downgrade ?. Any concerns in performance if I use PostgreSQL 9.6.15
> or higher ?
>
> --
> Regards
> -Sreejith
>


Requests for help from Priya Arora

2019-12-30 Thread Karl Wright
Hi all,

Priya has been sending me a ton of requests for help to my personal email,
and I have requested that he/she stop doing that.  I've repeatedly
requested that he/she fix the out-of-memory condition he/she is seeing on
all the ManifoldCF processes on the container setup that is being used, but
he/she has not attempted to correct this.

I've concluded that this person is essentially unfamiliar with the most
basic Java ideas and thus the advice we give is unlikely to be of further
help.  I've sent mail to them requesting that they obtain basic Java
instruction from another source than me or the ManifoldCF lists.

Thanks,
Karl


Tackling JDK 11+

2019-12-26 Thread Karl Wright
Hi folks,

Now that 2.15 is out, it's time to think what to do about JDK 11 and after.

The transition from JDK 8 to JDK 11 will require significant work and
testing, because JDK 11 *removed* many JDK classes that used to
exist in JDK 8.  The classes can be reincluded as specific dependencies BUT
in order to know what inclusions are needed we are going to need to test
every connector on JDK 11.  In addition, since ManifoldCF distribution
includes a complete execution environment, we will be changing the jars we
include in the binary and lib distributions considerably.

One option is to simply include all components and classes that are no
longer part of JDK 11 but were part of JDK 8, if such a list exists.  This
would be the safest way to proceed, but I have no idea how long this list
of jars is, and what versions of all the component jars we'd need.  It
would be great to use somebody else's work here if it exists.  Does anyone
know a full list of jars and versions that would "convert" a JDK 11 to a
full JDK 8-compatible environment?
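
One way to find out which inclusions a given classpath still needs is to probe for a representative class from each module that JEP 320 removed in JDK 11. The sketch below is only illustrative: the class name RemovedModuleProbe and its probe list are assumptions, not part of ManifoldCF, and the list names one well-known class per removed module rather than every removed class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ManifoldCF.
class RemovedModuleProbe {
  // One representative class per module removed from the JDK by JEP 320.
  static final List<String> PROBES = Arrays.asList(
      "javax.xml.bind.JAXBContext",                        // java.xml.bind (JAXB)
      "javax.xml.ws.Service",                              // java.xml.ws (JAX-WS)
      "javax.activation.DataHandler",                      // java.activation (JAF)
      "javax.annotation.PostConstruct",                    // java.xml.ws.annotation
      "javax.transaction.TransactionRolledbackException",  // java.transaction
      "org.omg.CORBA.ORB");                                // java.corba

  // Returns the class names that cannot be loaded from the current classpath.
  static List<String> missing(List<String> classNames) {
    List<String> result = new ArrayList<>();
    for (String name : classNames) {
      try {
        // initialize=false: only check that the class is present, do not run static init
        Class.forName(name, false, RemovedModuleProbe.class.getClassLoader());
      } catch (ClassNotFoundException | LinkageError e) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("Java version: " + System.getProperty("java.version"));
    for (String name : missing(PROBES)) {
      System.out.println("Not on classpath: " + name);
    }
  }
}
```

Run on JDK 8, this should report nothing missing; run on JDK 11 with a connector's classpath, each reported name points at a jar that would have to be added as an explicit dependency.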

Thanks in advance,
Karl


[RESULT] [VOTE] Release Apache ManifoldCF 2.15, RC0

2019-12-25 Thread Karl Wright
Three +1's, >72 hrs.  Vote passes!

Karl

On Sun, Dec 22, 2019 at 7:13 PM Karl Wright  wrote:

> Please vote on whether to release Apache ManifoldCF 2.15, RC0.  The
> release artifact can be found at:
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.15
> .  There is also a release tag at:
> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.15-RC0 .
>
> This release of ManifoldCF ties up some loose ends in the csws connector,
> and also updates the Solr Connector to the 8.x version of SolrJ.  There are
> also some bug fixes included.
>
> Thanks,
> Karl
>
>


Re: [VOTE] Release Apache ManifoldCF 2.15, RC0

2019-12-24 Thread Karl Wright
Ran all tests.

+1 from me.

Karl


On Sun, Dec 22, 2019 at 7:13 PM Karl Wright  wrote:

> Please vote on whether to release Apache ManifoldCF 2.15, RC0.  The
> release artifact can be found at:
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.15
> .  There is also a release tag at:
> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.15-RC0 .
>
> This release of ManifoldCF ties up some loose ends in the csws connector,
> and also updates the Solr Connector to the 8.x version of SolrJ.  There are
> also some bug fixes included.
>
> Thanks,
> Karl
>
>


[VOTE] Release Apache ManifoldCF 2.15, RC0

2019-12-22 Thread Karl Wright
Please vote on whether to release Apache ManifoldCF 2.15, RC0.  The release
artifact can be found at:
https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.15 .
There is also a release tag at:
https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.15-RC0 .

This release of ManifoldCF ties up some loose ends in the csws connector,
and also updates the Solr Connector to the 8.x version of SolrJ.  There are
also some bug fixes included.

Thanks,
Karl


[jira] [Resolved] (CONNECTORS-1630) Livelink/Opentext connector support REST API

2019-12-22 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1630.
-
Fix Version/s: ManifoldCF 2.14
   Resolution: Fixed

The "csws" connector is the OpenText REST connector.  It shipped with 2.14.


> Livelink/Opentext connector support REST API
> 
>
> Key: CONNECTORS-1630
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1630
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: LiveLink connector
>Reporter: Jörn Franke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.14
>
>
> Currently, the Livelink connector is based on the Opentext proprietary APIs 
> lapi.jar/lssl.jar
> It seems that Opentext/Livelink focuses most of their efforts on the public 
> REST API and lapi.jar is becoming deprecated. Hence, a new connector should be 
> developed to leverage the REST API.
> This task needs to investigate the minimum REST API version needed to provide 
> the Manifold functionality (Repository/Authority connection) similar to the 
> proprietary APIs.
> One needs then also to identify the configuration options in the UI, such as
> authority connection
>  * API base Url
>  * username/password auth (it is not basic auth), NTLM, Kerberos
> repository:
>  * API base url
>  * API version to use (currently v1 or v2, just in case both versions would 
> provide the needed functionality)
>  * username/password auth (it is not basic auth), NTLM, Kerberos
>  * path to fetch (e.g. by object id of the folder)
>  * recursive fetch (yes/no)
>  * regex pattern for specific filenames
>  * regex pattern for specific (sub-)folders in case of recursive fetch
>  * mapping of username to Livelink username
>  * number of threads for API calls
> Then a plan needs to be developed on how to design the functionality. 
> Multi-threading should be used as much as possible, but should be limited to 
> a certain number of threads, e.g. by using an ExecutorService, as the REST 
> API requires many calls to get all information (e.g. to get document 
> categories one needs to "work recursively its way up").
>  
> References:
>  * OpenText REST APIs Content server: 
> [https://developer.opentext.com/webaccess/#url=%2Fawd%2Fresources%2Fapis%2Fcs-rest-api-for-cs-16-s=501]
>  * OpenText REST API Directory services (this MIGHT be needed for the 
> Authority plugin, but it MAY also be fine just with the content server APIs): 
> [https://developer.opentext.com/webaccess/#url=%2Fawd%2Fresources%2Fapis%2Fotds-16=501]
>  * Executor service fixed thread pool: 
> [https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)]
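
The thread-limiting idea in the issue description can be sketched with a fixed thread pool from java.util.concurrent. This is a minimal illustration only, not the csws connector's actual code: the class BoundedRestCalls, the runAll helper, and the "node-N" strings standing in for REST responses are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of ManifoldCF.
class BoundedRestCalls {
  // Runs every call on a pool of at most maxThreads threads and
  // returns the results in submission order.
  static <T> List<T> runAll(List<Callable<T>> calls, int maxThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
    try {
      List<T> results = new ArrayList<>();
      // invokeAll blocks until all tasks have completed
      for (Future<T> future : pool.invokeAll(calls)) {
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Callable<String>> calls = new ArrayList<>();
    for (int id = 1; id <= 5; id++) {
      final int nodeId = id;
      // Stand-in for a real REST request; no network access happens here.
      calls.add(() -> "node-" + nodeId);
    }
    System.out.println(runAll(calls, 2));
  }
}
```

The pool size would map to the "number of threads for API calls" option listed above, so that many small REST requests (for example, walking up the category hierarchy) never exceed the configured concurrency.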





[jira] [Assigned] (CONNECTORS-1630) Livelink/Opentext connector support REST API

2019-12-22 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1630:
---

Assignee: Karl Wright

> Livelink/Opentext connector support REST API
> 
>
> Key: CONNECTORS-1630
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1630
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: LiveLink connector
>Reporter: Jörn Franke
>    Assignee: Karl Wright
>Priority: Major
>
> Currently, the Livelink connector is based on the Opentext proprietary APIs 
> lapi.jar/lssl.jar
> It seems that Opentext/Livelink focuses most of their efforts on the public 
> REST API and lapi.jar is becoming deprecated. Hence, a new connector should be 
> developed to leverage the REST API.
> This task needs to investigate the minimum REST API version needed to provide 
> the Manifold functionality (Repository/Authority connection) similar to the 
> proprietary APIs.
> One needs then also to identify the configuration options in the UI, such as
> authority connection
>  * API base Url
>  * username/password auth (it is not basic auth), NTLM, Kerberos
> repository:
>  * API base url
>  * API version to use (currently v1 or v2, just in case both versions would 
> provide the needed functionality)
>  * username/password auth (it is not basic auth), NTLM, Kerberos
>  * path to fetch (e.g. by object id of the folder)
>  * recursive fetch (yes/no)
>  * regex pattern for specific filenames
>  * regex pattern for specific (sub-)folders in case of recursive fetch
>  * mapping of username to Livelink username
>  * number of threads for API calls
> Then a plan needs to be developed on how to design the functionality. 
> Multi-threading should be used as much as possible, but should be limited to 
> a certain number of threads, e.g. by using an ExecutorService, as the REST 
> API requires many calls to get all information (e.g. to get document 
> categories one needs to "work recursively its way up").
>  
> References:
>  * OpenText REST APIs Content server: 
> [https://developer.opentext.com/webaccess/#url=%2Fawd%2Fresources%2Fapis%2Fcs-rest-api-for-cs-16-s=501]
>  * OpenText REST API Directory services (this MIGHT be needed for the 
> Authority plugin, but it MAY also be fine just with the content server APIs): 
> [https://developer.opentext.com/webaccess/#url=%2Fawd%2Fresources%2Fapis%2Fotds-16=501]
>  * Executor service fixed thread pool: 
> [https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)]





[RESULT] [VOTE] Release Apache ManifoldCF Plugin for Solr 8.x, version 2.2 RC0

2019-12-21 Thread Karl Wright
Four +1's, > 72 hours.  Vote passes!

Karl

On Fri, Dec 20, 2019 at 5:17 AM Markus Schuch  wrote:

> +1 from me
>
> I installed the plugin in a fresh Solr 8.3.1 instance and successfully ran
> a small smoke test.
>
> Cheers,
> Markus
>
> On Dec 18, 2019 at 8:39 AM, Karl Wright wrote:
> > Please vote on whether to release the initial version of the Apache
> > ManifoldCF Plugin for Solr 8.x, version 2.2, RC0.
> >
> > The release artifact can be found here:
> >
> >
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-solr-8.x-plugin-2.2
> >
> > There is also a tag at:
> >
> https://svn.apache.org/repos/asf/manifoldcf/integration/solr-8.x/tags/release-2.2-RC0
> > .
> >
> > Note that the version "2.2" describes, in part, compatibility with other
> > Solr plugins, so is appropriate in this case for an initial version.
> >
> > Thanks,
> > Karl
> >
>


Re: sharepoint crawler documents limit

2019-12-20 Thread Karl Wright
Hi Priya,

This has nothing to do with anything in ManifoldCF.

Karl


On Fri, Dec 20, 2019 at 7:56 AM Priya Arora  wrote:

> Hi All,
>
> Is this issue related to the values/parameters below, which are set in
> properties.xml?
> [image: image.png]
>
>
> On Fri, Dec 20, 2019 at 5:21 PM Jorge Alonso Garcia 
> wrote:
>
>> And what other SharePoint parameter could I check?
>>
>> Jorge Alonso Garcia
>>
>>
>>
>> On Fri, Dec 20, 2019 at 12:47 PM, Karl Wright wrote:
>>
>>> The code seems correct and many people are using it without encountering
>>> this problem.  There may be another SharePoint configuration parameter you
>>> also need to look at somewhere.
>>>
>>> Karl
>>>
>>>
>>> On Fri, Dec 20, 2019 at 6:38 AM Jorge Alonso Garcia 
>>> wrote:
>>>
>>>>
>>>> Hi Karl,
>>>> On SharePoint the list view threshold is 150,000, but we only receive
>>>> 20,000 from MCF
>>>> [image: image.png]
>>>>
>>>>
>>>> Jorge Alonso Garcia
>>>>
>>>>
>>>>
>>>> On Thu, Dec 19, 2019 at 7:19 PM, Karl Wright wrote:
>>>>
>>>>> If the job finished without error it implies that the number of
>>>>> documents returned from this one library was 10000 when the service is
>>>>> called the first time (starting at doc 0), 10000 when it's called the
>>>>> second time (starting at doc 10000), and zero when it is called the third
>>>>> time (starting at doc 20000).
>>>>>
>>>>> The plugin code is unremarkable and actually gets results in chunks of
>>>>> 1000 under the covers:
>>>>>
>>>>> >>>>>>
>>>>> SPQuery listQuery = new SPQuery();
>>>>> listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
>>>>> listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
>>>>> listQuery.ViewAttributes = "Scope=\"Recursive\"";
>>>>> listQuery.ViewFields = "<FieldRef Name='FileRef' />";
>>>>> listQuery.RowLimit = 1000;
>>>>>
>>>>> XmlDocument doc = new XmlDocument();
>>>>> retVal = doc.CreateElement("GetListItems",
>>>>>     "http://schemas.microsoft.com/sharepoint/soap/directory/");
>>>>> XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");
>>>>>
>>>>> uint counter = 0;
>>>>> do
>>>>> {
>>>>>     if (counter >= startRowParam + rowLimitParam)
>>>>>         break;
>>>>>
>>>>>     SPListItemCollection collListItems = oList.GetItems(listQuery);
>>>>>
>>>>>     foreach (SPListItem oListItem in collListItems)
>>>>>     {
>>>>>         if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
>>>>>         {
>>>>>             XmlNode resultNode = doc.CreateElement("GetListItemsResult");
>>>>>             XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
>>>>>             idAttribute.Value = oListItem.Url;
>>>>>             resultNode.Attributes.Append(idAttribute);
>>>>>
>>>>>             XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
>>>>>             //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
>>>>>             urlAttribute.Value = string.Format("{0}?ID={1}",

Re: What do you think about moving to git?

2019-12-20 Thread Karl Wright
Hi Markus,

All of our release scripts and documentation scripts are written to work
against Subversion.  Changing these represents a non-trivial amount of work
- something I don't have time for at the moment. FWIW, our reliance on
Forrest means that our build process for the site pages performs an svn
checkout of the Forrest codebase (so it can be properly patched to support
CJK fonts), so there's no removing Subversion entirely from our
infrastructure at this time in any case.

I have no fundamental objection to going to git but this cannot be a
one-person task.

Karl


On Fri, Dec 20, 2019 at 5:44 AM Markus Schuch  wrote:

> Hi everyone,
>
> ManifoldCF is the only project I work on that is still hosted in
> Subversion.
>
> I would like to start a discussion about whether we could migrate to Git.
>
> Why? I see wasted potential for a more active community and more
> contributions by hiding in Subversion. Github has a lot to offer and
> developers may no longer know how to use Subversion these days.
>
> What do you think?
>
> Cheers,
> Markus
>


Re: sharepoint crawler documents limit

2019-12-20 Thread Karl Wright
The code seems correct and many people are using it without encountering
this problem.  There may be another SharePoint configuration parameter you
also need to look at somewhere.

Karl


On Fri, Dec 20, 2019 at 6:38 AM Jorge Alonso Garcia 
wrote:

>
> Hi Karl,
> On SharePoint the list view threshold is 150,000, but we only receive
> 20,000 from MCF
> [image: image.png]
>
>
> Jorge Alonso Garcia
>
>
>
> On Thu, Dec 19, 2019 at 7:19 PM, Karl Wright wrote:
>
>> If the job finished without error it implies that the number of documents
>> returned from this one library was 10000 when the service is called the
>> first time (starting at doc 0), 10000 when it's called the second time
>> (starting at doc 10000), and zero when it is called the third time
>> (starting at doc 20000).
>>
>> The plugin code is unremarkable and actually gets results in chunks of
>> 1000 under the covers:
>>
>> >>>>>>
>> SPQuery listQuery = new SPQuery();
>> listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
>> listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
>> listQuery.ViewAttributes = "Scope=\"Recursive\"";
>> listQuery.ViewFields = "<FieldRef Name='FileRef' />";
>> listQuery.RowLimit = 1000;
>>
>> XmlDocument doc = new XmlDocument();
>> retVal = doc.CreateElement("GetListItems",
>>     "http://schemas.microsoft.com/sharepoint/soap/directory/");
>> XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");
>>
>> uint counter = 0;
>> do
>> {
>>     if (counter >= startRowParam + rowLimitParam)
>>         break;
>>
>>     SPListItemCollection collListItems = oList.GetItems(listQuery);
>>
>>     foreach (SPListItem oListItem in collListItems)
>>     {
>>         if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
>>         {
>>             XmlNode resultNode = doc.CreateElement("GetListItemsResult");
>>             XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
>>             idAttribute.Value = oListItem.Url;
>>             resultNode.Attributes.Append(idAttribute);
>>
>>             XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
>>             //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
>>             urlAttribute.Value = string.Format("{0}?ID={1}",
>>                 oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
>>                 oListItem.ID);
>>             resultNode.Attributes.Append(urlAttribute);
>>
>>             getListItemsNode.AppendChild(resultNode);
>>         }
>>         counter++;
>>     }
>>
>>     listQuery.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;
>>
>> } while (listQuery.ListItemCollectionPosition != null);
>>
>> retVal.AppendChild(getListItemsNode);
>> <<<<<<
>>
>> The code is clearly working if you get 20000 results returned, so I
>> submit that perhaps there's a configured limit in your SharePoint instance
>> that prevents listing more than 20000.  That's the only way I can explain
>> this.
>>
>> Karl
>>
>>
>> On Thu, Dec 19, 2019 at 12:51 PM Jorge Alonso Garcia 
>> wrote:
>>
>>> Hi,
>>> The job finishes OK (several times), but always with these 20000
>>> documents; for some reason the loop only executes twice
>>>
>>> Jorge Alonso Garcia
>>>
>>>
>>>
>>> On Thu, Dec 19, 2019 at 6:14 PM, Karl Wright wrote:
>>>
>>>> If they are all in one library, then you'd be running this code:
>>>>
>>

Re: sharepoint crawler documents limit

2019-12-19 Thread Karl Wright
If they are all in one library, then you'd be running this code:

>>>>>>
int startingIndex = 0;
int amtToRequest = 10000;
while (true)
{
  // Ask the plugin for the next chunk of list items
  com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
    itemCall.getListItems(guid,Integer.toString(startingIndex),Integer.toString(amtToRequest));

  MessageElement[] itemsList = itemsResult.get_any();

  if (Logging.connectors.isDebugEnabled()){
    Logging.connectors.debug("SharePoint: getChildren xml response: " + itemsList[0].toString());
  }

  if (itemsList.length != 1)
    throw new ManifoldCFException("Bad response - expecting one outer 'GetListItems' node, saw "+Integer.toString(itemsList.length));

  MessageElement items = itemsList[0];
  if (!items.getElementName().getLocalName().equals("GetListItems"))
    throw new ManifoldCFException("Bad response - outer node should have been 'GetListItems' node");

  // Count the results in this chunk and record each file
  int resultCount = 0;
  Iterator iter = items.getChildElements();
  while (iter.hasNext())
  {
    MessageElement child = (MessageElement)iter.next();
    if (child.getElementName().getLocalName().equals("GetListItemsResponse"))
    {
      Iterator resultIter = child.getChildElements();
      while (resultIter.hasNext())
      {
        MessageElement result = (MessageElement)resultIter.next();
        if (result.getElementName().getLocalName().equals("GetListItemsResult"))
        {
          resultCount++;
          String relPath = result.getAttribute("FileRef");
          String displayURL = result.getAttribute("ListItemURL");
          fileStream.addFile( relPath, displayURL );
        }
      }
    }
  }

  // A short chunk means we have reached the end of the library
  if (resultCount < amtToRequest)
    break;

  startingIndex += resultCount;
}
<<<<<<

What this does is request library content URLs in chunks of 10000.  It
stops when it receives fewer than 10000 documents from any one request.

If the documents were all in one library, then one call to the web service
yielded 10000 documents, the second call yielded 10000 documents, and
there was no third call, for no reason I can figure out.  Since 10000
documents were returned each time, the loop ought to just continue, unless
there was some kind of error.  Does the job succeed, or does it abort?
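
To make the termination rule concrete, here is a small self-contained simulation of the same paging loop. It is a sketch under stated assumptions: PagingSketch, crawl, and cappedService are hypothetical names, the chunk size of 10000 and the cap of 20000 are illustrative values, and the "service" is a stub that silently refuses to list items past the cap.

```java
import java.util.function.BiFunction;

// Hypothetical simulation, not ManifoldCF code.
class PagingSketch {
  // Requests chunks until a short chunk comes back, as the loop above does.
  // fetch.apply(start, amount) returns how many items the service returned.
  // Unlike the original loop, the final short chunk is added before breaking
  // so that the first array element is the total number of items seen.
  static int[] crawl(BiFunction<Integer, Integer, Integer> fetch, int amtToRequest) {
    int startingIndex = 0;
    int calls = 0;
    while (true) {
      int resultCount = fetch.apply(startingIndex, amtToRequest);
      calls++;
      startingIndex += resultCount;
      if (resultCount < amtToRequest)
        break;
    }
    return new int[] {startingIndex, calls};
  }

  // Stub service holding `total` items but never listing past `cap`.
  static BiFunction<Integer, Integer, Integer> cappedService(int total, int cap) {
    int visible = Math.min(total, cap);
    return (start, amount) -> Math.max(0, Math.min(amount, visible - start));
  }

  public static void main(String[] args) {
    int[] r = crawl(cappedService(50000, 20000), 10000);
    System.out.println("documents=" + r[0] + " calls=" + r[1]);
    // prints "documents=20000 calls=3"
  }
}
```

With a server-side cap, the client sees two full chunks and then an empty third one, so the job ends cleanly with exactly the capped number of documents and no error.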

Karl


On Thu, Dec 19, 2019 at 12:05 PM Karl Wright  wrote:

> If you are using the MCF plugin, and selecting the appropriate version of
> SharePoint in the connection configuration, there is no hard limit I'm
> aware of for any SharePoint job.  We have lots of other people using
> SharePoint and nobody has reported this ever before.
>
> If your SharePoint connection says "SharePoint 2003" as the SharePoint
> version, then sure, that would be expected behavior.  So please check that
> first.
>
> The other question I have is your description of you first getting 10001
> documents and then later 20002.  That's not how ManifoldCF works.  At the
> start of the crawl, seeds are added; this would start out just being the
> root, and then other documents would be discovered as the crawl proceeded,
> after subsites and libraries are discovered.  So I am still trying to
> square that with your description of how this is working for you.
>
> Are all of your documents in one library?  Or two libraries?
>
> Karl
>
>
>
>
> On Thu, Dec 19, 2019 at 11:42 AM Jorge Alonso Garcia 
> wrote:
>
>> Hi,
>> The UI shows 20,002 documents (in a first phase it shows 10,001, and after
>> some processing time it rises to 20,002).
>> It looks like a hard limit; there are more files on SharePoint matching
>> the criteria used.
>>
>>
>> Jorge Alonso Garcia
>>
>>
>>
>> On Thu, Dec 19, 2019 at 4:05 PM, Karl Wright wrote:
>>
>>> Hi Jorge,
>>>
>>> When you run the job, do you see more than 20,000 documents as part of
>>> it?
>>>
>>> Do you see *exactly* 20,000 documents as part of it?
>>>
>>> Unless you are seeing a hard number like that in the UI for that job on
>>> the job status page, I doubt very much that the problem is a numerical
>>> limitation in the number of documents.  I would suspect that the inclusion
>>> criteria, e.g. the mime type or maximum length, is excluding documents.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia 
>>> wrote:
>>>
>>>> Hi Karl,
>>>> We h

Re: sharepoint crawler documents limit

2019-12-19 Thread Karl Wright
If you are using the MCF plugin, and selecting the appropriate version of
SharePoint in the connection configuration, there is no hard limit I'm
aware of for any SharePoint job.  We have lots of other people using
SharePoint and nobody has reported this ever before.

If your SharePoint connection says "SharePoint 2003" as the SharePoint
version, then sure, that would be expected behavior.  So please check that
first.

The other question I have is your description of you first getting 10001
documents and then later 20002.  That's not how ManifoldCF works.  At the
start of the crawl, seeds are added; this would start out just being the
root, and then other documents would be discovered as the crawl proceeded,
after subsites and libraries are discovered.  So I am still trying to
square that with your description of how this is working for you.

Are all of your documents in one library?  Or two libraries?

Karl




On Thu, Dec 19, 2019 at 11:42 AM Jorge Alonso Garcia 
wrote:

> Hi,
> The UI shows 20,002 documents (in a first phase it shows 10,001, and after
> some processing time it rises to 20,002).
> It looks like a hard limit; there are more files on SharePoint matching
> the criteria used.
>
>
> Jorge Alonso Garcia
>
>
>
> On Thu, Dec 19, 2019 at 4:05 PM, Karl Wright wrote:
>
>> Hi Jorge,
>>
>> When you run the job, do you see more than 20,000 documents as part of it?
>>
>> Do you see *exactly* 20,000 documents as part of it?
>>
>> Unless you are seeing a hard number like that in the UI for that job on
>> the job status page, I doubt very much that the problem is a numerical
>> limitation in the number of documents.  I would suspect that the inclusion
>> criteria, e.g. the mime type or maximum length, is excluding documents.
>>
>> Karl
>>
>>
>> On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia 
>> wrote:
>>
>>> Hi Karl,
>>> We have installed the SharePoint plugin, and can properly access
>>> http:/server/_vti_bin/MCPermissions.asmx
>>>
>>> [image: image.png]
>>>
>>> SharePoint has more than 20,000 documents, but when the job executes it
>>> only extracts these 20,000. How can I check where the issue is?
>>>
>>> Regards
>>>
>>>
>>> Jorge Alonso Garcia
>>>
>>>
>>>
>>> On Thu, Dec 19, 2019 at 12:52 PM, Karl Wright wrote:
>>>
>>>> By "stop at 20,000" do you mean that it finds more than 20,000 but
>>>> stops crawling at that time?  Or what exactly do you mean here?
>>>>
>>>> FWIW, the behavior you describe sounds like you may not have installed
>>>> the SharePoint plugin and may have selected a version of SharePoint that is
>>>> inappropriate.  All SharePoint versions after 2008 limit the number of
>>>> documents returned using the standard web services methods.  The plugin
>>>> allows us to bypass that hard limit.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> We have an issue with the SharePoint connector.
>>>>> There is a job that crawls a SharePoint 2016 instance, but it is not
>>>>> retrieving all files; it stops at 20,000 documents without any error.
>>>>> Is there any parameter that should be changed to avoid this limitation?
>>>>>
>>>>> Regards
>>>>> Jorge Alonso Garcia
>>>>>
>>>>>


Re: sharepoint crawler documents limit

2019-12-19 Thread Karl Wright
Hi Jorge,

When you run the job, do you see more than 20,000 documents as part of it?

Do you see *exactly* 20,000 documents as part of it?

Unless you are seeing a hard number like that in the UI for that job on the
job status page, I doubt very much that the problem is a numerical
limitation in the number of documents.  I would suspect that the inclusion
criteria, e.g. the mime type or maximum length, is excluding documents.

Karl


On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia 
wrote:

> Hi Karl,
> We have installed the SharePoint plugin, and can properly access
> http:/server/_vti_bin/MCPermissions.asmx
>
> [image: image.png]
>
> SharePoint has more than 20,000 documents, but when the job executes it
> only extracts these 20,000. How can I check where the issue is?
>
> Regards
>
>
> Jorge Alonso Garcia
>
>
>
> On Thu, Dec 19, 2019 at 12:52 PM, Karl Wright wrote:
>
>> By "stop at 20,000" do you mean that it finds more than 20,000 but stops
>> crawling at that time?  Or what exactly do you mean here?
>>
>> FWIW, the behavior you describe sounds like you may not have installed
>> the SharePoint plugin and may have selected a version of SharePoint that is
>> inappropriate.  All SharePoint versions after 2008 limit the number of
>> documents returned using the standard web services methods.  The plugin
>> allows us to bypass that hard limit.
>>
>> Karl
>>
>>
>> On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia 
>> wrote:
>>
>>> Hi,
>>> We have an issue with the SharePoint connector.
>>> There is a job that crawls a SharePoint 2016 instance, but it is not
>>> retrieving all files; it stops at 20,000 documents without any error.
>>> Is there any parameter that should be changed to avoid this limitation?
>>>
>>> Regards
>>> Jorge Alonso Garcia
>>>
>>>


Re: sharepoint crawler documents limit

2019-12-19 Thread Karl Wright
By "stop at 20,000" do you mean that it finds more than 20,000 but stops
crawling at that time?  Or what exactly do you mean here?

FWIW, the behavior you describe sounds like you may not have installed the
SharePoint plugin and may have selected a version of SharePoint that is
inappropriate.  All SharePoint versions after 2008 limit the number of
documents returned using the standard web services methods.  The plugin
allows us to bypass that hard limit.

Karl


On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia 
wrote:

> Hi,
> We have an issue with the SharePoint connector.
> There is a job that crawls a SharePoint 2016 instance, but it is not
> retrieving all files; it stops at 20,000 documents without any error.
> Is there any parameter that should be changed to avoid this limitation?
>
> Regards
> Jorge Alonso Garcia
>
>


Re: [VOTE] Release Apache ManifoldCF Plugin for Solr 8.x, version 2.2 RC0

2019-12-19 Thread Karl Wright
Tests all pass.

+1 from me.

Karl


On Wed, Dec 18, 2019 at 2:39 AM Karl Wright  wrote:

> Please vote on whether to release the initial version of the Apache
> ManifoldCF Plugin for Solr 8.x, version 2.2, RC0.
>
> The release artifact can be found here:
>
>
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-solr-8.x-plugin-2.2
>
> There is also a tag at:
> https://svn.apache.org/repos/asf/manifoldcf/integration/solr-8.x/tags/release-2.2-RC0
> .
>
> Note that the version "2.2" describes, in part, compatibility with other
> Solr plugins, so is appropriate in this case for an initial version.
>
> Thanks,
> Karl
>
>


[VOTE] Release Apache ManifoldCF Plugin for Solr 8.x, version 2.2 RC0

2019-12-17 Thread Karl Wright
Please vote on whether to release the initial version of the Apache
ManifoldCF Plugin for Solr 8.x, version 2.2, RC0.

The release artifact can be found here:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-solr-8.x-plugin-2.2

There is also a tag at:
https://svn.apache.org/repos/asf/manifoldcf/integration/solr-8.x/tags/release-2.2-RC0
.

Note that the version "2.2" describes, in part, compatibility with other
Solr plugins, so is appropriate in this case for an initial version.

Thanks,
Karl


[jira] [Commented] (CONNECTORS-1586) Create plugin for Solr 8.0.0 when available

2019-12-17 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998823#comment-16998823
 ] 

Karl Wright commented on CONNECTORS-1586:
-

There's already a plugin release created, and I'd like to get it released 
before end of year.  It also has to be compatible back to 8.0.0.  See:  
https://svn.apache.org/repos/asf/manifoldcf/integration/solr-8.x/trunk


> Create plugin for Solr 8.0.0 when available
> ---
>
> Key: CONNECTORS-1586
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1586
> Project: ManifoldCF
>  Issue Type: Task
>Reporter: Shinichiro Abe
>    Assignee: Karl Wright
>Priority: Minor
> Attachments: CONNECTORS-1568.patch
>
>
> The plugin for Solr 8.0 release.





[jira] [Assigned] (CONNECTORS-1586) Create plugin for Solr 8.0.0 when available

2019-12-17 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1586:
---

Assignee: Karl Wright

> Create plugin for Solr 8.0.0 when available
> ---
>
> Key: CONNECTORS-1586
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1586
> Project: ManifoldCF
>  Issue Type: Task
>Reporter: Shinichiro Abe
>    Assignee: Karl Wright
>Priority: Minor
> Attachments: CONNECTORS-1568.patch
>
>
> The plugin for Solr 8.0 release.





Re: Solr Output Connector: SolrCloud with Kerberos / Zookeeper with Kerberos

2019-12-17 Thread Karl Wright
Found the problem: needed to update a pom dependency.
Everything passes now.

Karl


On Tue, Dec 17, 2019 at 8:07 PM Karl Wright  wrote:

> I just created a plugin directory at
> https://svn.apache.org/repos/asf/manifoldcf/integration/solr-8.x/trunk .
> Code committed there builds but it doesn't test properly because of the
> following exception:
>
> >>>>>>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test)
> on project apache-manifoldcf-solr-8.x-plugin: Execution default-test of
> goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test failed:
> There was an error in the forked process
> [ERROR] org.apache.maven.surefire.util.SurefireReflectionException:
> java.lang.ClassNotFoundException:
> org.apache.maven.surefire.junit4.JUnit4Provider
> [ERROR] at
> org.apache.maven.surefire.util.ReflectionUtils.loadClass(ReflectionUtils.java:252)
> [ERROR] at
> org.apache.maven.surefire.util.ReflectionUtils.instantiateOneArg(ReflectionUtils.java:128)
> [ERROR] at
> org.apache.maven.surefire.booter.ForkedBooter.createProviderInCurrentClassloader(ForkedBooter.java:230)
> [ERROR] at
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:199)
> [ERROR] at
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
> [ERROR] at
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> [ERROR] Caused by: java.lang.ClassNotFoundException:
> org.apache.maven.surefire.junit4.JUnit4Provider
> [ERROR] at
> java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> [ERROR] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> [ERROR] at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> [ERROR] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> [ERROR] at
> org.apache.maven.surefire.util.ReflectionUtils.loadClass(ReflectionUtils.java:244)
> [ERROR] ... 5 more
> [ERROR]
> <<<<<<
>
> This is odd because the plugin depends on solr and solr has a transitive
> maven dependency on junit.  I'll see if a direct dependency works...
>
> Karl
>
>
> On Tue, Dec 17, 2019 at 3:30 PM Jörn Franke  wrote:
>
>> Here you find it: https://issues.apache.org/jira/browse/CONNECTORS-1629
>> I will try it out this year I hope.
>> I will try it though with Solr 8.3.1 and will take into account
>> https://issues.apache.org/jira/browse/CONNECTORS-1586
>>
>> On Tue, Dec 17, 2019 at 1:09 PM Karl Wright  wrote:
>>
>>> Please do!
>>> Karl
>>>
>>>
>>> On Tue, Dec 17, 2019 at 7:06 AM Jörn Franke 
>>> wrote:
>>>
>>>> Thanks a lot Karl for your feedback. Do you mind if I create a Jira
>>>> where I report on the progress?
>>>>
>>>> On 17.12.2019 at 12:22, Karl Wright wrote:
>>>>
>>>> 
>>>> Well, you can certainly attempt this simply enough then if you build
>>>> from source.  I'd prefer that you validate the approach before we make
>>>> permanent commits.
>>>>
>>>> Please let me know what works and what doesn't.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Dec 17, 2019 at 1:22 AM Jörn Franke 
>>>> wrote:
>>>>
>>>>> I agree.
>>>>> The delegation part is not relevant for me. I also do not believe it
>>>>> makes sense at the ETL level.
>>>>> I still think we need to add the one line of code that enables
>>>>> Kerberos (the second line in the example).
>>>>>
>>>>> On 17.12.2019 at 01:35, Karl Wright wrote:
>>>>>
>>>>> 
>>>>> Hi Jorn,
>>>>>
>>>>> The code referenced cannot be set up differently from connection to
>>>>> connection so there is no point in having this be anything other than
>>>>> global.  In that case you can point at the config file with
>>>>> -D=value and it will do the same thing as setting a system
>>>>> property.
>>>>>
>>>>> The token delegation with HttpClient I'll have to study to confirm
>>>>> that we're doing this right in the connector.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 16, 2019 at 6:15 PM Jörn Franke 
>>>>> wrote:
>>>>>
>>>>>> Thanks a 

Re: Solr Output Connector: SolrCloud with Kerberos / Zookeeper with Kerberos

2019-12-17 Thread Karl Wright
I just created a plugin directory at
https://svn.apache.org/repos/asf/manifoldcf/integration/solr-8.x/trunk .
Code committed there builds but it doesn't test properly because of the
following exception:

>>>>>>
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project apache-manifoldcf-solr-8.x-plugin: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test failed: There was an error in the forked process
[ERROR] org.apache.maven.surefire.util.SurefireReflectionException: java.lang.ClassNotFoundException: org.apache.maven.surefire.junit4.JUnit4Provider
[ERROR] at org.apache.maven.surefire.util.ReflectionUtils.loadClass(ReflectionUtils.java:252)
[ERROR] at org.apache.maven.surefire.util.ReflectionUtils.instantiateOneArg(ReflectionUtils.java:128)
[ERROR] at org.apache.maven.surefire.booter.ForkedBooter.createProviderInCurrentClassloader(ForkedBooter.java:230)
[ERROR] at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:199)
[ERROR] at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
[ERROR] at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
[ERROR] Caused by: java.lang.ClassNotFoundException: org.apache.maven.surefire.junit4.JUnit4Provider
[ERROR] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
[ERROR] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[ERROR] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
[ERROR] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[ERROR] at org.apache.maven.surefire.util.ReflectionUtils.loadClass(ReflectionUtils.java:244)
[ERROR] ... 5 more
[ERROR]
<<<<<<

This is odd because the plugin depends on solr and solr has a transitive
maven dependency on junit.  I'll see if a direct dependency works...

Karl


On Tue, Dec 17, 2019 at 3:30 PM Jörn Franke  wrote:

> Here you find it: https://issues.apache.org/jira/browse/CONNECTORS-1629
> I will try it out this year I hope.
> I will try it though with Solr 8.3.1 and will take into account
> https://issues.apache.org/jira/browse/CONNECTORS-1586
>
> On Tue, Dec 17, 2019 at 1:09 PM Karl Wright  wrote:
>
>> Please do!
>> Karl
>>
>>
>> On Tue, Dec 17, 2019 at 7:06 AM Jörn Franke  wrote:
>>
>>> Thanks a lot Karl for your feedback. Do you mind if I create a Jira
>>> where I report on the progress?
>>>
>>> On 17.12.2019 at 12:22, Karl Wright wrote:
>>>
>>> 
>>> Well, you can certainly attempt this simply enough then if you build
>>> from source.  I'd prefer that you validate the approach before we make
>>> permanent commits.
>>>
>>> Please let me know what works and what doesn't.
>>>
>>> Karl
>>>
>>>
>>> On Tue, Dec 17, 2019 at 1:22 AM Jörn Franke 
>>> wrote:
>>>
>>>> I agree.
>>>> The delegation part is not relevant for me. I also do not believe it
>>>> makes sense at the ETL level.
>>>> I still think we need to add the one line of code that enables
>>>> Kerberos (the second line in the example).
>>>>
>>>> On 17.12.2019 at 01:35, Karl Wright wrote:
>>>>
>>>> 
>>>> Hi Jorn,
>>>>
>>>> The code referenced cannot be set up differently from connection to
>>>> connection so there is no point in having this be anything other than
>>>> global.  In that case you can point at the config file with
>>>> -D=value and it will do the same thing as setting a system
>>>> property.
>>>>
>>>> The token delegation with HttpClient I'll have to study to confirm that
>>>> we're doing this right in the connector.
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Mon, Dec 16, 2019 at 6:15 PM Jörn Franke 
>>>> wrote:
>>>>
>>>>> Thanks a lot for the quick reply. Actually it is here:
>>>>> https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr
>>>>> It is also available in the previous versions of Solr.
>>>>> I wonder how easy it would be to add a configuration to the Manifold
>>>>> UI to point to a jaas-client.conf. However, it is also not strictly
>>>>> necessary that this one is configurable in the UI. It could be also a
>>>>> checkbox yes/no Kerberos authentication and the jaas-client.conf could be
>

Re: Solr Output Connector: SolrCloud with Kerberos / Zookeeper with Kerberos

2019-12-17 Thread Karl Wright
Please do!
Karl


On Tue, Dec 17, 2019 at 7:06 AM Jörn Franke  wrote:

> Thanks a lot Karl for your feedback. Do you mind if I create a Jira where
> I report on the progress?
>
> On 17.12.2019 at 12:22, Karl Wright wrote:
>
> 
> Well, you can certainly attempt this simply enough then if you build from
> source.  I'd prefer that you validate the approach before we make permanent
> commits.
>
> Please let me know what works and what doesn't.
>
> Karl
>
>
> On Tue, Dec 17, 2019 at 1:22 AM Jörn Franke  wrote:
>
>> I agree.
>> The delegation part is not relevant for me. I also do not believe it
>> makes sense at the ETL level.
>> I still think we need to add the one line of code that enables
>> Kerberos (the second line in the example).
>>
>> On 17.12.2019 at 01:35, Karl Wright wrote:
>>
>> 
>> Hi Jorn,
>>
>> The code referenced cannot be set up differently from connection to
>> connection so there is no point in having this be anything other than
>> global.  In that case you can point at the config file with
>> -D=value and it will do the same thing as setting a system
>> property.
>>
>> The token delegation with HttpClient I'll have to study to confirm that
>> we're doing this right in the connector.
>>
>> Karl
>>
>>
>>
>> On Mon, Dec 16, 2019 at 6:15 PM Jörn Franke  wrote:
>>
>>> Thanks a lot for the quick reply. Actually it is here:
>>> https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr
>>> It is also available in the previous versions of Solr.
>>> I wonder how easy it would be to add a configuration to the Manifold UI
>>> to point to a jaas-client.conf. However, it is also not strictly necessary
>>> that this one is configurable in the UI. It could be also a checkbox yes/no
>>> Kerberos authentication and the jaas-client.conf could be put in a certain
>>> folder.
>>>
>>> It would also be interesting to see whether Solr 8.x can be made to work in this setting.
>>>
>>> On Mon, Dec 16, 2019 at 11:47 PM Karl Wright  wrote:
>>>
>>>> The Solr Output Connector uses a patched HttpComponents/HttpClient for
>>>> communication with the various Solr Cloud replicas, along with custom
>>>> versions of some of the SolrJ classes which allow multipart posts to work.
>>>> Other than that it's standard SolrJ.  Whatever SolrJ needs to work with
>>>> Kerberos, therefore, should work with the ManifoldCF Solr Output
>>>> Connector.  So if you can point me at the SolrJ documentation for this
>>>> configuration I can perhaps review it and give you my opinion as to the
>>>> difficulty involved.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Mon, Dec 16, 2019 at 5:31 PM Jörn Franke 
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> does the Solr Output Connector support SolrCloud with Kerberos
>>>>> authentication and Zookeeper with Kerberos authentication?
>>>>>
>>>>> If so, how can this be configured?
>>>>>
>>>>> If it is not supported, is there an "easy" way to integrate this? From
>>>>> a development perspective the Kerberos Authentication with both is not
>>>>> difficult to achieve, but of course it still needs to be integrated in the
>>>>> whole solution.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Best regards
>>>>>
>>>>


Re: Solr Output Connector: SolrCloud with Kerberos / Zookeeper with Kerberos

2019-12-17 Thread Karl Wright
Well, you can certainly attempt this simply enough then if you build from
source.  I'd prefer that you validate the approach before we make permanent
commits.

Please let me know what works and what doesn't.

Karl


On Tue, Dec 17, 2019 at 1:22 AM Jörn Franke  wrote:

> I agree.
> The delegation part is not relevant for me. I also do not believe it makes
> sense at the ETL level.
> I still think we need to add the one line of code that enables
> Kerberos (the second line in the example).
>
> On 17.12.2019 at 01:35, Karl Wright wrote:
>
> 
> Hi Jorn,
>
> The code referenced cannot be set up differently from connection to
> connection so there is no point in having this be anything other than
> global.  In that case you can point at the config file with
> -D=value and it will do the same thing as setting a system
> property.
>
> The token delegation with HttpClient I'll have to study to confirm that
> we're doing this right in the connector.
>
> Karl
>
>
>
> On Mon, Dec 16, 2019 at 6:15 PM Jörn Franke  wrote:
>
>> Thanks a lot for the quick reply. Actually it is here:
>> https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr
>> It is also available in the previous versions of Solr.
>> I wonder how easy it would be to add a configuration to the Manifold UI
>> to point to a jaas-client.conf. However, it is also not strictly necessary
>> that this one is configurable in the UI. It could be also a checkbox yes/no
>> Kerberos authentication and the jaas-client.conf could be put in a certain
>> folder.
>>
>> It would also be interesting to see whether Solr 8.x can be made to work in this setting.
>>
>> On Mon, Dec 16, 2019 at 11:47 PM Karl Wright  wrote:
>>
>>> The Solr Output Connector uses a patched HttpComponents/HttpClient for
>>> communication with the various Solr Cloud replicas, along with custom
>>> versions of some of the SolrJ classes which allow multipart posts to work.
>>> Other than that it's standard SolrJ.  Whatever SolrJ needs to work with
>>> Kerberos, therefore, should work with the ManifoldCF Solr Output
>>> Connector.  So if you can point me at the SolrJ documentation for this
>>> configuration I can perhaps review it and give you my opinion as to the
>>> difficulty involved.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Dec 16, 2019 at 5:31 PM Jörn Franke 
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> does the Solr Output Connector support SolrCloud with Kerberos
>>>> authentication and Zookeeper with Kerberos authentication?
>>>>
>>>> If so, how can this be configured?
>>>>
>>>> If it is not supported, is there an "easy" way to integrate this? From
>>>> a development perspective the Kerberos Authentication with both is not
>>>> difficult to achieve, but of course it still needs to be integrated in the
>>>> whole solution.
>>>>
>>>> Thank you.
>>>>
>>>> Best regards
>>>>
>>>


Re: Solr Output Connector: SolrCloud with Kerberos / Zookeeper with Kerberos

2019-12-16 Thread Karl Wright
Hi Jorn,

The code referenced cannot be set up differently from connection to
connection so there is no point in having this be anything other than
global.  In that case you can point at the config file with
-D=value and it will do the same thing as setting a system
property.

The token delegation with HttpClient I'll have to study to confirm that
we're doing this right in the connector.

Karl



On Mon, Dec 16, 2019 at 6:15 PM Jörn Franke  wrote:

> Thanks a lot for the quick reply. Actually it is here:
> https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr
> It is also available in the previous versions of Solr.
> I wonder how easy it would be to add a configuration to the Manifold UI to
> point to a jaas-client.conf. However, it is also not strictly necessary
> that this one is configurable in the UI. It could be also a checkbox yes/no
> Kerberos authentication and the jaas-client.conf could be put in a certain
> folder.
>
> It would also be interesting to see whether Solr 8.x can be made to work in this setting.
>
> On Mon, Dec 16, 2019 at 11:47 PM Karl Wright  wrote:
>
>> The Solr Output Connector uses a patched HttpComponents/HttpClient for
>> communication with the various Solr Cloud replicas, along with custom
>> versions of some of the SolrJ classes which allow multipart posts to work.
>> Other than that it's standard SolrJ.  Whatever SolrJ needs to work with
>> Kerberos, therefore, should work with the ManifoldCF Solr Output
>> Connector.  So if you can point me at the SolrJ documentation for this
>> configuration I can perhaps review it and give you my opinion as to the
>> difficulty involved.
>>
>> Karl
>>
>>
>> On Mon, Dec 16, 2019 at 5:31 PM Jörn Franke  wrote:
>>
>>> Hello,
>>>
>>> does the Solr Output Connector support SolrCloud with Kerberos
>>> authentication and Zookeeper with Kerberos authentication?
>>>
>>> If so, how can this be configured?
>>>
>>> If it is not supported, is there an "easy" way to integrate this? From a
>>> development perspective the Kerberos Authentication with both is not
>>> difficult to achieve, but of course it still needs to be integrated in the
>>> whole solution.
>>>
>>> Thank you.
>>>
>>> Best regards
>>>
>>


Re: Solr Output Connector: SolrCloud with Kerberos / Zookeeper with Kerberos

2019-12-16 Thread Karl Wright
The Solr Output Connector uses a patched HttpComponents/HttpClient for
communication with the various Solr Cloud replicas, along with custom
versions of some of the SolrJ classes which allow multipart posts to work.
Other than that it's standard SolrJ.  Whatever SolrJ needs to work with
Kerberos, therefore, should work with the ManifoldCF Solr Output
Connector.  So if you can point me at the SolrJ documentation for this
configuration I can perhaps review it and give you my opinion as to the
difficulty involved.

Karl


On Mon, Dec 16, 2019 at 5:31 PM Jörn Franke  wrote:

> Hello,
>
> does the Solr Output Connector support SolrCloud with Kerberos
> authentication and Zookeeper with Kerberos authentication?
>
> If so, how can this be configured?
>
> If it is not supported, is there an "easy" way to integrate this? From a
> development perspective the Kerberos Authentication with both is not
> difficult to achieve, but of course it still needs to be integrated in the
> whole solution.
>
> Thank you.
>
> Best regards
>


Time for another release

2019-12-13 Thread Karl Wright
It's been a relatively quiet four months for a change, but the time has
come again to push out a release.  There are some important bug fixes for
the new Csws connector that really should be in the shipping artifact, for
one thing.

But I'd like to encourage everyone to pull together anything they're
working on and get it into the code base.  I am hoping to spin RC0 sometime
Christmas week.

Thanks in advance,
Karl


Re: About Manifold CF API

2019-11-28 Thread Karl Wright
Hi Kaya,
The best way to form proper JSON is to create a job with the UI and export
its JSON, and use that as a model.
Thanks,
Karl


On Thu, Nov 28, 2019 at 3:05 AM Kayak28  wrote:

> Hello, Community Members:
>
> I have a question about the form of JSON when I call a job-creation API.
> I would like to use the following API.
> Resource: jobs | Verb: POST | Description: Create a job | Input: {"job":**} | Output: {"job_id":**} *OR* {"error":**}
>
> The URL I should send with curl POST command is:
> http://localhost:8345/mcf-api-services/json/jobs.
>
> My question is what is the correct format for job_object?
> I tried the following JSON to create a job, but the request failed with
> "400: Bad JSON: null".
> The JSON is meant to create a job with a WEB repository and a Filesystem
> output.
> If anyone knows what JSON I should send to create a job in MCF, please let
> me know.
> I would really appreciate any of your help.
>
> {"job":
> {"_children_":[
> {"_type_":"id",
> "_value_":"1574926560999"},
> {"_type_":"description",
> "_value_":"test_job1-1"},
> {"_type_":"repository_connection",
> "_value_":"web-repo"},
> { "_type_":"document_specification",
> "_children_":[
> {"_type_":"seeds",
> "_value_":"https:\/\/dummy-web-host-mcf.herokuapp.com\/"},
> {"_type_":"includes",
> "_value_":".*"},
> {"_type_":"includesindex","_value_":".*"},
> {"_type_":"limittoseeds",
> "_value_":"","_attribute_value":"true"},
> {"_type_":"excludes","_value_":""},
> {"_type_":"excludesindex","_value_":""},
> {"_type_":"excludescontentindex","_value_":""}
> ]},
> {"_type_":"pipelinestage",
> "_children_":
> [
> {"_type_":"stage_id",
> "_value_":"0"},
> {"_type_":"stage_isoutput",
> "_value_":"true"},
> {"_type_":"stage_connectionname",
> "_value_":"sample_out"},
> {"_type_":"stage_specification",
> "rootpath":
> {"_attribute_ROOTPATH":"\/home\/vagrant\/out\/",
> "_value_":""}}
> ]},
> {"_type_":"start_mode",
> "_value_":"manual"},
> {"_type_":"run_mode",
> "_value_":"scan once"},
> {"_type_":"hopcount_mode",
> "_value_":"accurate"},
> {"_type_":"priority",
> "_value_":"5"},
> {"_type_":"recrawl_interval","_value_":"8640"},
> {"_type_":"max_recrawl_interval","_value_":"infinite"}
> {"_type_":"expiration_interval","_value_":"infinite"},
> {"_type_":"reseed_interval","_value_":"360"},
> {"_type_":"schedule",
> "_children_":[
> {"_type_":"requestminimum","_value_":"false"},
> {"_type_":"timezone","_value_":"Japan"},
> {"_type_":"dayofweek","value":"0"}
> ]}
> ]}}
>
> Sincerely,
> Kaya Ota
>
>
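
A hand-written payload like this can be checked for syntax errors before it is
POSTed. The sketch below is only an illustration and not part of the ManifoldCF
API: find_json_error is a hypothetical helper built on Python's standard json
module, and the sample text reproduces two adjacent entries from the job above,
where no comma separates the max_recrawl_interval and expiration_interval
entries.

```python
import json


def find_json_error(text):
    """Return None if text parses as JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return (err.lineno, err.colno, err.msg)
    return None


# Two adjacent entries copied from the job above, with no comma between them.
broken = '''[
{"_type_":"max_recrawl_interval","_value_":"infinite"}
{"_type_":"expiration_interval","_value_":"infinite"}
]'''

# The same fragment with the separating comma restored.
fixed = broken.replace('"infinite"}\n{', '"infinite"},\n{')

print("broken:", find_json_error(broken))
print("fixed: ", find_json_error(fixed))
```

Running the same helper over the whole payload reports the first place where
the parser gives up. It checks syntax only, so a job exported from the UI
remains the reference for which fields the API expects.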


Re: Continues Job

2019-11-26 Thread Karl Wright
No, just changing the job characteristics will NOT cause the incremental
behavior to be erased.

Karl


On Mon, Nov 25, 2019 at 10:20 PM Sreejith Variyath <
sreejith.variy...@tarams.com> wrote:

> Yes. I understood. Thanks Karl.
>
> I have another question. If I change the job type from TYPE_SPECIFIED to
> TYPE_CONTINUOUS, will the document versioning be reset, so that the job
> picks up all the documents again?
>
> On Tue, Nov 26, 2019, 05:12 Karl Wright  wrote:
>
>> One of the characteristics of continuous jobs is that they call
>> addSeedDocuments multiple times on a single job run.  The job run never
>> ends, so this is how the job picks up documents for the infinitely-running
>> job.  That's just the way it works.  Have you read the book?
>>
>> Karl
>>
>>
>> On Mon, Nov 25, 2019 at 5:37 PM SREEJITH va 
>> wrote:
>>
>>> Hi Every One,
>>>
>>> I am trying to set up a job that has a JDBC repository connector, one
>>> transformation connector, and a custom output connector.
>>>
>>> I want this job to run in two modes:
>>>
>>>- Sample Mode: This is a sample migration mode. The job will pick 10
>>>documents and migrate them to the output repository, then pause. I am
>>>planning to pause the job from a Quartz job, depending on the number of
>>>documents processed and the number of documents in the queue. This sample
>>>run can be done "n" times.
>>>- Actual Mode: This is the actual migration mode. In this mode, the
>>>same job needs to run continuously. The documents remaining after the
>>>sample migration, and any new documents, should be picked up and migrated.
>>>
>>> Could anyone please help me with the schedule settings I should use
>>> for this kind of job?  Currently I have tried the following settings:
>>>
>>> jobDescription.setStartMethod(IJobDescription.START_DISABLE);
>>> jobDescription.setType(IJobDescription.TYPE_CONTINUOUS);
>>> jobDescription.setInterval(30L);
>>> jobDescription.setReseedInterval(12L);
>>>
>>> Also, I have idNodeQuery set to pick 10 documents (in sample mode) in
>>> the JDBC repository connector. But during job startup in sample mode,
>>> the addSeedDocuments(...) API is invoked twice, which causes more than
>>> 10 documents to be processed. I do have a version query, and documents
>>> are not processed in subsequent runs unless there is a version change.
>>>
>>> I would really appreciate it if someone could help me with these two queries.
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>> -Sreejith
>>>
>>


Re: Continues Job

2019-11-25 Thread Karl Wright
One of the characteristics of continuous jobs is that they call
addSeedDocuments multiple times on a single job run.  The job run never
ends, so this is how the job picks up documents for the infinitely-running
job.  That's just the way it works.  Have you read the book?

Karl


On Mon, Nov 25, 2019 at 5:37 PM SREEJITH va  wrote:

> Hi Every One,
>
> I am trying to set up a job that has a JDBC repository connector, one
> transformation connector, and a custom output connector.
>
> I want this job to run in two modes:
>
>- Sample Mode: This is a sample migration mode. The job will pick 10
>documents and migrate them to the output repository, then pause. I am
>planning to pause the job from a Quartz job, depending on the number of
>documents processed and the number of documents in the queue. This sample
>run can be done "n" times.
>- Actual Mode: This is the actual migration mode. In this mode, the
>same job needs to run continuously. The documents remaining after the
>sample migration, and any new documents, should be picked up and migrated.
>
> Could anyone please help me with the schedule settings I should use for
> this kind of job?  Currently I have tried the following settings:
>
> jobDescription.setStartMethod(IJobDescription.START_DISABLE);
> jobDescription.setType(IJobDescription.TYPE_CONTINUOUS);
> jobDescription.setInterval(30L);
> jobDescription.setReseedInterval(12L);
>
> Also, I have idNodeQuery set to pick 10 documents (in sample mode) in the
> JDBC repository connector. But during job startup in sample mode, the
> addSeedDocuments(...) API is invoked twice, which causes more than 10
> documents to be processed. I do have a version query, and documents are
> not processed in subsequent runs unless there is a version change.
>
> I would really appreciate it if someone could help me with these two queries.
>
>
>
>
>
> --
> Regards
> -Sreejith
>


Re: Manifoldcf version conflict

2019-11-19 Thread Karl Wright
I was incorrect.  The value comes from one of the properties:

 

Karl


On Tue, Nov 19, 2019 at 6:16 AM Priya Arora  wrote:

> I am using docker commands to install manifoldcf inside docker container.
> So what I understand is that mcf downloads latest crawler-ui.war files in
> the web folder(that is what i checked in the local system). Do I need to
> check somewhere else.
> [image: image.png]
>
> On Tue, Nov 19, 2019 at 4:40 PM Karl Wright  wrote:
>
>> That version comes directly from the ant build version that was used to
>> compile the UI.  What version of crawler-ui.war do you have?
>>
>> Karl
>>
>>
>> On Tue, Nov 19, 2019 at 5:50 AM Priya Arora  wrote:
>>
>>> Hi All,
>>>
>>>  I have upgraded manifoldcf version on the server to version 2.14, I
>>> re-confirmed it via docker build command that it is downloading 2.14
>>> version only.
>>>
>>> [image: image.png]
>>>
>>> But when I start up ManifoldCF, the version shown is 2.10, as in the
>>> screenshot above. Is a static value being passed somewhere?
>>> Or do I have to do something manually? I would guess not,
>>> because on my local system it shows the correct version.
>>>
>>> Thanks
>>> Priya
>>>
>>


Re: Manifoldcf version conflict

2019-11-19 Thread Karl Wright
That version comes directly from the ant build version that was used to
compile the UI.  What version of crawler-ui.war do you have?

Karl


On Tue, Nov 19, 2019 at 5:50 AM Priya Arora  wrote:

> Hi All,
>
>  I have upgraded manifoldcf version on the server to version 2.14, I
> re-confirmed it via docker build command that it is downloading 2.14
> version only.
>
> [image: image.png]
>
> But when I start up ManifoldCF, the version shown is 2.10, as in the
> screenshot above. Is a static value being passed somewhere?
> Or do I have to do something manually? I would guess not, because on my
> local system it shows the correct version.
>
> Thanks
> Priya
>


[jira] [Commented] (CONNECTORS-1628) Confluence Connector hang on error

2019-11-18 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976686#comment-16976686
 ] 

Karl Wright commented on CONNECTORS-1628:
-

Hi [~julienFL], this looks fine, please go ahead and commit to trunk.


> Confluence Connector hang on error
> --
>
> Key: CONNECTORS-1628
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1628
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Confluence connector
>Affects Versions: ManifoldCF 2.14
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Critical
> Fix For: ManifoldCF 2.15
>
> Attachments: CONNECTORS-1628.diff
>
>
> During a crawling job, if the Confluence connector encounters errors on 
> requests, it hangs, and the only way to get it working again is to restart 
> the MCF agent.
> The reason is that the connector does not release the HTTP response if an 
> exception or an HTTP error is encountered during its processing.
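
The failure mode described here, a pooled resource that is not handed back on
the error path, can be shown with a toy model. This is a language-neutral
sketch in Python under stated assumptions, not the connector's Java code and
not the attached patch: ConnectionPool, fetch_leaky, fetch_safe and
failures_until_exhausted are all hypothetical names, and the pool raises an
error where a real HTTP connection pool would block.

```python
class ConnectionPool:
    """Toy pool that hands out a fixed number of connections."""

    def __init__(self, size):
        self.available = size

    def acquire(self):
        if self.available == 0:
            # A real pool would block here, which looks like a hang.
            raise RuntimeError("pool exhausted")
        self.available -= 1

    def release(self):
        self.available += 1


def fetch_leaky(pool, fail):
    pool.acquire()
    if fail:
        raise IOError("HTTP 500")  # connection is never released
    pool.release()


def fetch_safe(pool, fail):
    pool.acquire()
    try:
        if fail:
            raise IOError("HTTP 500")
    finally:
        pool.release()  # runs on success and on error


def failures_until_exhausted(fetch, pool_size=2, attempts=5):
    """Return True if repeated failing requests exhaust the pool."""
    pool = ConnectionPool(pool_size)
    for _ in range(attempts):
        try:
            fetch(pool, fail=True)
        except IOError:
            pass  # the request failed, the caller carries on
        except RuntimeError:
            return True
    return False


print("leaky:", failures_until_exhausted(fetch_leaky))
print("safe: ", failures_until_exhausted(fetch_safe))
```

With the release in a finally block, every failed request still returns its
connection, so later requests are not starved.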



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (CONNECTORS-1628) Confluence Connector hang on error

2019-11-18 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1628:
---

Assignee: Julien Massiera

> Confluence Connector hang on error
> --
>
> Key: CONNECTORS-1628
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1628
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Confluence connector
>Affects Versions: ManifoldCF 2.14
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Critical
> Fix For: ManifoldCF 2.15
>
> Attachments: CONNECTORS-1628.diff
>
>
> During a crawling job, if the Confluence connector encounters errors on 
> requests, it hangs, and the only way to get it working again is to restart 
> the MCF agent.
> The reason is that the connector does not release the HTTP response if an 
> exception or an HTTP error is encountered during its processing.





Re: Welcome Houston Putman as Lucene/Solr committer

2019-11-14 Thread Karl Wright
Welcome!
Karl

On Thu, Nov 14, 2019 at 8:17 AM Michael Sokolov  wrote:

> Hi Houston, welcome!
>
> On Thu, Nov 14, 2019 at 7:23 AM Erick Erickson 
> wrote:
> >
> > Welcome!
> >
> > > On Nov 14, 2019, at 5:19 AM, Jan Høydahl 
> wrote:
> > >
> > > Congrats and welcome Houston!
> > >
> > > --
> > > Jan Høydahl, search solution architect
> > > Cominvent AS - www.cominvent.com
> > >
> > >> 14. nov. 2019 kl. 09:57 skrev Anshum Gupta :
> > >>
> > >> Hi all,
> > >>
> > >> Please join me in welcoming Houston Putman as the latest Lucene/Solr
> committer!
> > >>
> > >> Houston has been involved with the community since 2013, when he
> first contributed the Analytics contrib module. Since then he has been
> involved with the community, participated in conferences and spoken about
> his work with Lucene/Solr. In the recent past, he has been involved with
> getting Solr to scale on Kubernetes.
> > >>
> > >> Looking forward to your commits to the Apache Lucene/Solr project :)
> > >>
> > >> Congratulations and welcome, Houston! It's a tradition to introduce
> yourself with a brief bio.
> > >>
> > >> --
> > >> Anshum Gupta
> > >
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
>


Re: Windows shares connector-Error

2019-11-10 Thread Karl Wright
Can you do the following:

>>>>>>
C:\wip\mcf\trunk>dir lib\less*
 Volume in drive C is Windows
 Volume Serial Number is F4D8-E4E0

 Directory of C:\wip\mcf\trunk\lib

09/06/2019  02:52 PM 1,304,630 less4j-1.17.2.jar
   1 File(s)  1,304,630 bytes
   0 Dir(s)  157,270,810,624 bytes free

C:\wip\mcf\trunk>
<<<<<<

If you do not see the above you have not unpacked the lib distribution in
the right place.
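
The same check can be scripted so that it works outside a Windows prompt. This
is a small sketch, assuming the source tree root is passed as the first
argument (or is the current directory); find_less4j is a hypothetical helper,
and only the lib directory and the less4j jar name come from the listing above.

```python
import sys
from pathlib import Path


def find_less4j(source_root):
    """Return the names of less4j jars found under <source_root>/lib."""
    lib_dir = Path(source_root) / "lib"
    # glob() yields nothing if lib/ is missing, so no existence check is needed.
    return sorted(p.name for p in lib_dir.glob("less4j-*.jar"))


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    jars = find_less4j(root)
    if jars:
        print("found:", ", ".join(jars))
    else:
        print("no less4j jar under", Path(root) / "lib")
```

An empty result means the same thing as an empty dir listing: the lib
distribution is not unpacked where the build expects it.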

Thanks,
Karl


On Mon, Nov 11, 2019 at 1:35 AM Priya Arora  wrote:

> Hi,
>
> I tried the suggestion above, using the following steps:
> 1) Downloaded the src, lib, and bin distributions of ManifoldCF 2.14.
>
> [image: image.png]
> 2) Extracted them, opened CMD, ran the "ant make-deps" command, and got the
> error below.
> [image: image.png]
>  Error:-
> compile-less-compiler:
> [mkdir] Created dir:
> D:\Official\Projects\ManifoldCF\apache-manifoldcf-2.14\framework\build\less-compiler\classes
> [javac] Compiling 1 source file to
> D:\Official\Projects\ManifoldCF\apache-manifoldcf-2.14\framework\build\less-compiler\classes
> [javac]
> D:\Official\Projects\ManifoldCF\apache-manifoldcf-2.14\framework\less-compiler\src\main\java\org\apache\manifoldcf\less\MCFLessCompiler.java:22:
> error: package com.github.sommeri.less4j does not exist
> [javac] import com.github.sommeri.less4j.Less4jException;
> [javac] ^
> [javac]
> D:\Official\Projects\ManifoldCF\apache-manifoldcf-2.14\framework\less-compiler\src\main\java\org\apache\manifoldcf\less\MCFLessCompiler.java:23:
> error: package com.github.sommeri.less4j does not exist
> [javac] import com.github.sommeri.less4j.LessCompiler;
> [javac] ^
>
> Can you please let me know how I can resolve this error?
>
> Thanks and Regards
> Priya
>
> On Fri, Nov 8, 2019 at 5:38 PM Karl Wright  wrote:
>
>> (1) Download source distribution and lib distribution
>> (2) Unpack and follow directions for placing lib folder in place
>> (3) Run 'ant make-deps' to download the correct version of jcifs
>> (4) Run "ant build" to make a distribution that includes proprietary
>> examples
>> (5) Use the proprietary example you need
>>
>> The reason this might be a good idea is because we no longer use the
>> older versions of jcifs, but a newer one with some fixes instead.
>>
>> Karl
>>
>>
>> On Fri, Nov 8, 2019 at 7:04 AM Priya Arora  wrote:
>>
>>> This didn't work either. Does this (ManifoldCF version 2.14) have something
>>> to do with the Java version as well? If so, my JAVA_HOME points to Java 8.
>>> Can you suggest something?
>>>
>>> On Fri, Nov 8, 2019 at 4:16 PM Sreejith Variyath <
>>> sreejith.variy...@tarams.com> wrote:
>>>
>>>> place the jcifs.jar into the *connector-lib-proprietary* directory
>>>>
>>>> On Fri, Nov 8, 2019 at 2:38 PM Priya Arora  wrote:
>>>>
>>>>> Hi All
>>>>>
>>>>> I installed the 2.14 version of manifoldcf , then uncommented the line
>>>>> in connectors.xml file ">>>> =" 
>>>>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector
>>>>> "/>" , but when I try to start with(java- jar start.jar) gives error:
>>>>>
>>>>> I also checked it mcf-jcifs-connector.jar is also present in
>>>>> connector-lib.
>>>>>
>>>>> Do i need to do something else also.Here is the error log.
>>>>>
>>>>> Successfully registered repository connector
>>>>> 'org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector'
>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> jcifs/smb/SmbException
>>>>> at java.base/java.lang.Class.forName0(Native Method)
>>>>> at java.base/java.lang.Class.forName(Unknown Source)
>>>>> at
>>>>> org.apache.manifoldcf.core.system.ManifoldCFResourceLoader.findClass(ManifoldCFResourceLoader.java:149)
>>>>> at
>>>>> org.apache.manifoldcf.core.system.ManifoldCF.findClass(ManifoldCF.java:1533)
>>>>> at
>>>>> org.apache.manifoldcf.core.interfaces.ConnectorFactory.getThisConnectorRaw(ConnectorFactory.java:144)
>>>>> at
>>>>> org.apache.manifoldcf.core.interfaces.ConnectorFactory.getThisConnectorNoCheck(ConnectorFactory.java:118)
>>>>> at
>>>>&

Re: Windows shares connector-Error

2019-11-08 Thread Karl Wright
(1) Download source distribution and lib distribution
(2) Unpack and follow directions for placing lib folder in place
(3) Run 'ant make-deps' to download the correct version of jcifs
(4) Run "ant build" to make a distribution that includes proprietary
examples
(5) Use the proprietary example you need

This is a good idea because we no longer use the older versions of jcifs;
we use a newer one with some fixes instead.
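
To confirm that the build actually put a jcifs jar where the connector can
load it, a quick check like the following may help. This is only a sketch in
Python, not part of ManifoldCF: the helper name jars_providing is made up for
illustration, and the directory name is taken from the earlier reply in this
thread. It simply looks inside each jar for the class named in the
NoClassDefFoundError.

```python
import zipfile
from pathlib import Path


def jars_providing(lib_dir, class_name):
    """Return the names of jars in lib_dir that contain class_name.

    Hypothetical helper for illustration; class_name is a dotted Java
    class name such as "jcifs.smb.SmbException".
    """
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    found.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid jar/zip archives.
            continue
    return found
```

Calling jars_providing("connector-lib-proprietary", "jcifs.smb.SmbException")
from the example directory should return a non-empty list if the jar is in
place; an empty list would be consistent with the error reported above.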

Karl


On Fri, Nov 8, 2019 at 7:04 AM Priya Arora  wrote:

> This didn't work even. Is that(manifoldcf version 2.14) something to do
> with java version also. If yes , I am using JAVA_HOME :_ java version 8.
> Can you suggest something
>
> On Fri, Nov 8, 2019 at 4:16 PM Sreejith Variyath <
> sreejith.variy...@tarams.com> wrote:
>
>> place the jcifs.jar into the *connector-lib-proprietary* directory
>>
>> On Fri, Nov 8, 2019 at 2:38 PM Priya Arora  wrote:
>>
>>> Hi All
>>>
>>> I installed the 2.14 version of manifoldcf , then uncommented the line
>>> in connectors.xml file ">> =" org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector
>>> "/>" , but when I try to start with(java- jar start.jar) gives error:
>>>
>>> I also checked it mcf-jcifs-connector.jar is also present in
>>> connector-lib.
>>>
>>> Do i need to do something else also.Here is the error log.
>>>
>>> Successfully registered repository connector
>>> 'org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector'
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> jcifs/smb/SmbException
>>> at java.base/java.lang.Class.forName0(Native Method)
>>> at java.base/java.lang.Class.forName(Unknown Source)
>>> at
>>> org.apache.manifoldcf.core.system.ManifoldCFResourceLoader.findClass(ManifoldCFResourceLoader.java:149)
>>> at
>>> org.apache.manifoldcf.core.system.ManifoldCF.findClass(ManifoldCF.java:1533)
>>> at
>>> org.apache.manifoldcf.core.interfaces.ConnectorFactory.getThisConnectorRaw(ConnectorFactory.java:144)
>>> at
>>> org.apache.manifoldcf.core.interfaces.ConnectorFactory.getThisConnectorNoCheck(ConnectorFactory.java:118)
>>> at
>>> org.apache.manifoldcf.core.interfaces.ConnectorFactory.installThis(ConnectorFactory.java:48)
>>> at
>>> org.apache.manifoldcf.crawler.interfaces.RepositoryConnectorFactory.install(RepositoryConnectorFactory.java:100)
>>> at
>>> org.apache.manifoldcf.crawler.connmgr.ConnectorManager.registerConnector(ConnectorManager.java:180)
>>> at
>>> org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:672)
>>> at
>>> org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160)
>>> at
>>> org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239)
>>> Caused by: java.lang.ClassNotFoundException: jcifs.smb.SmbException
>>> at java.base/java.net.URLClassLoader.findClass(Unknown Source)
>>> at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
>>> at java.base/java.net.FactoryURLClassLoader.loadClass(Unknown
>>> Source)
>>> at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
>>> ... 12 more
>>>
>>> Thanks and regards
>>> Priya
>>>
>>
>>
>> --
>> Best Regards,
>>
>>
>> *Sreejith Variyath*
>> Lead Software Engineer
>> Tarams Software Technologies Pvt. Ltd.
>> Venus Buildings, 2nd Floor 1/2,3rd Main,
>> Kalyanamantapa Road Jakasandra, 1st Block Kormangala
>> Bangalore - 560034
>> Tarams 
>>
>>
>> www.tarams.com
>>
>


Re: Illegal transaction ID/parent transaction ID

2019-11-07 Thread Karl Wright
Have you tried deploying the combined war on tomcat instead?

I honestly do not know what is wrong but if the combined war works you have
something to compare/contrast against.

Karl


On Thu, Nov 7, 2019 at 2:45 PM SREEJITH va  wrote:

> Thanks Karl, Here is quick summary on how I embedded Manifold in my
> application.
>
>
>- All the required manifold jar dependencies are in pom.
>- The properties.xml is served through
>org.apache.manifoldcf.configfile settings in catalina.properties
>- There is an application ready Lister where I do following things.
>
> IThreadContext tc = ThreadContextFactory.make();
> ManifoldCF.initializeEnvironment(tc);
> ManifoldCF.registerThisAgent(tc);
> ManifoldCF.reregisterAllConnectors(tc);
> AgentsDaemon.startAgents(threadContext)
>
> One thing which I observed is that the "threadContext" which is using for
> API  *AgentsDaemon.startAgents( threadContext )* is different than the
> other initialization APIs. Is this causing this issue?. But I can create
> jobs and its running for while (may be weeks or months) until I am start
> getting this exception. I mean I don't know the pattern which this is
> happening. And I am still trying to understand that overlapping of thread
> that you mentioned in previous mail.
>
>
>
>
> On Thu, Nov 7, 2019 at 10:57 PM Karl Wright  wrote:
>
>> How are you embedding ManifoldCF in your application?
>>
>> What looks like is happening is that thread contexts are being lost
>> somehow.  ManifoldCF uses thread contexts to keep track of worker
>> thread-local information, and it appears that you are calling into
>> ManifoldCF code assuming that (for example) Thread A can close Thread B's
>> transactions.  That doesn't work.
>>
>> Karl
>>
>>
>> On Thu, Nov 7, 2019 at 12:22 PM SREEJITH va 
>> wrote:
>>
>>> Hi All,
>>>
>>> I have an spring based application in which Manifold is embedded and
>>> running in tomcat.  At some point I am getting below exceptions. Any lead
>>> on why this happening would be greatly appreciated.
>>>
>>> One scenario in which I can see this in my logs is while shutting down
>>> the tomcat. And if it happens during the run time,  Any further call to
>>> manifold services will all fail with the same exception.
>>>
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Illegal
>>> parent transaction ID: 1573141219180
>>> at
>>> org.apache.manifoldcf.core.cachemanager.CacheManager.startTransaction(CacheManager.java:696)
>>> at
>>> org.apache.manifoldcf.core.database.Database.beginTransaction(Database.java:241)
>>> at
>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.beginTransaction(DBInterfacePostgreSQL.java:1188)
>>> at
>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.beginTransaction(DBInterfacePostgreSQL.java:1158)
>>> at
>>> org.apache.manifoldcf.crawler.jobs.JobManager.manualAbort(JobManager.java:6900)
>>>
>>> Caused by: org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>>> Illegal transaction ID: '1573140884596'
>>> at
>>> org.apache.manifoldcf.core.cachemanager.CacheManager.enterCache(CacheManager.java:288)
>>> at
>>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:100)
>>> at
>>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>> at
>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>> at
>>> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
>>> at
>>> org.apache.manifoldcf.agents.transformationconnmgr.TransformationConnectorManager.getConnectors(TransformationConnectorManager.java:253)
>>>
>>> --
>>> Regards
>>> -Sreejith
>>>
>>
>
> --
> Regards
> -Sreejith
>


Re: Illegal transaction ID/parent transaction ID

2019-11-07 Thread Karl Wright
How are you embedding ManifoldCF in your application?

What appears to be happening is that thread contexts are being lost
somehow.  ManifoldCF uses thread contexts to keep track of worker
thread-local information, and it appears that you are calling into
ManifoldCF code assuming that (for example) Thread A can close Thread B's
transactions.  That doesn't work.
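
The general idea can be shown with a toy model. The sketch below is Python,
not ManifoldCF code, and the class and function names are invented for
illustration. It keeps a per-thread stack of open transaction IDs, so a
thread that tries to end a transaction it did not begin finds nothing on its
own stack and fails, much like the "Illegal transaction ID" error in the
trace.

```python
import threading


class TransactionTracker:
    """Toy per-thread transaction stack; an analogy only."""

    def __init__(self):
        # Each thread sees its own independent attributes on this object.
        self._local = threading.local()

    def _stack(self):
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def begin(self, txn_id):
        self._stack().append(txn_id)

    def end(self, txn_id):
        stack = self._stack()
        if not stack or stack[-1] != txn_id:
            raise RuntimeError(f"Illegal transaction ID: {txn_id!r}")
        stack.pop()


def end_from_other_thread(tracker, txn_id):
    """Try to end txn_id on a fresh thread; return any error messages."""
    errors = []

    def worker():
        try:
            tracker.end(txn_id)
        except RuntimeError as exc:
            errors.append(str(exc))

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return errors
```

If the main thread calls tracker.begin("t1") and then
end_from_other_thread(tracker, "t1"), the second thread gets an error, while
tracker.end("t1") on the original thread succeeds.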

Karl


On Thu, Nov 7, 2019 at 12:22 PM SREEJITH va  wrote:

> Hi All,
>
> I have an spring based application in which Manifold is embedded and
> running in tomcat.  At some point I am getting below exceptions. Any lead
> on why this happening would be greatly appreciated.
>
> One scenario in which I can see this in my logs is while shutting down the
> tomcat. And if it happens during the run time,  Any further call to
> manifold services will all fail with the same exception.
>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Illegal parent
> transaction ID: 1573141219180
> at
> org.apache.manifoldcf.core.cachemanager.CacheManager.startTransaction(CacheManager.java:696)
> at
> org.apache.manifoldcf.core.database.Database.beginTransaction(Database.java:241)
> at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.beginTransaction(DBInterfacePostgreSQL.java:1188)
> at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.beginTransaction(DBInterfacePostgreSQL.java:1158)
> at
> org.apache.manifoldcf.crawler.jobs.JobManager.manualAbort(JobManager.java:6900)
>
> Caused by: org.apache.manifoldcf.core.interfaces.ManifoldCFException:
> Illegal transaction ID: '1573140884596'
> at
> org.apache.manifoldcf.core.cachemanager.CacheManager.enterCache(CacheManager.java:288)
> at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:100)
> at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
> at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
> at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
> at
> org.apache.manifoldcf.agents.transformationconnmgr.TransformationConnectorManager.getConnectors(TransformationConnectorManager.java:253)
>
> --
> Regards
> -Sreejith
>


Re: Manifoldcf - Job Deletion Process

2019-10-30 Thread Karl Wright
Ok, so pick ONE of these identifiers.

What I want to see is the entire lifecycle of the ONE identifier.  That
includes what the Web Connection logs as well as what the indexation logs.
Ideally I'd like to see:

- job start and end
- web connection events
- indexing events

I'd like to see these for both the job that indexes the document initially
as well as the job run that deletes the document.

My suspicion is that on the second run the document is simply no longer
reachable from the seeds.  In other words, the seed documents either cannot
be fetched on the second run or they contain different stuff and there's no
longer a chain of links between the seeds and the documents being deleted.
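
A small model of that suspicion, as a Python sketch rather than anything
taken from ManifoldCF: the function names and the shape of the links mapping
are assumptions made for illustration. A document survives a run only if it
can be fetched and is reachable from the seeds; anything indexed earlier but
not reached this time becomes a deletion candidate.

```python
from collections import deque


def reachable(seeds, links):
    """Return every fetchable document reachable from the seeds.

    links maps each document that could be fetched on this run to the
    documents it links to; a document missing from links is treated as
    unfetchable.
    """
    seen = set()
    queue = deque(seeds)
    while queue:
        doc = queue.popleft()
        if doc in seen or doc not in links:
            continue
        seen.add(doc)
        queue.extend(links[doc])
    return seen


def to_delete(previously_indexed, seeds, links):
    """Documents indexed on an earlier run but not reached on this one."""
    return set(previously_indexed) - reachable(seeds, links)
```

For example, if the seed page linked to "a" on the first run but returns no
links on the second, then "a" and everything reachable only through "a" end
up in to_delete, even though those URLs are still valid when fetched
directly.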

Thanks,
Karl


On Wed, Oct 30, 2019 at 1:50 AM Priya Arora  wrote:

> Indexation screenshot is as below.
>
> [image: image.png]
>
> On Tue, Oct 29, 2019 at 7:57 PM Karl Wright  wrote:
>
>> I need both ingestion and deletion.
>> Karl
>>
>>
>> On Tue, Oct 29, 2019 at 8:09 AM Priya Arora  wrote:
>>
>>> History is shown as below as it does not indicates any error.
>>> [image: 12.JPG]
>>>
>>> Thanks
>>> Priya
>>>
>>> On Tue, Oct 29, 2019 at 5:02 PM Karl Wright  wrote:
>>>
>>>> What does the history say about these documents?
>>>> Karl
>>>>
>>>> On Tue, Oct 29, 2019 at 6:53 AM Priya Arora 
>>>> wrote:
>>>>
>>>>>
>>>>>  it may be that (a) they weren't found, or (b) that the document
>>>>> specification in the job changed and they are no longer included in the 
>>>>> job.
>>>>>
>>>>> URL's that were deleted are valid URL's(as that does not result in
>>>>> 404 or page not found error), and it is not being mentioned in Exclusion
>>>>> tab of job configuration.
>>>>> And the URL's were getting indexed earlier and except for index name
>>>>> in Elasticsearch nothing is changed in Job specification and in other
>>>>> connectors.
>>>>>
>>>>> Thanks
>>>>> Priya
>>>>>
>>>>> On Tue, Oct 29, 2019 at 3:40 PM Karl Wright 
>>>>> wrote:
>>>>>
>>>>>> ManifoldCF is an incremental crawler, which means that on every
>>>>>> (non-continuous) job run it sees which documents it can find and removes
>>>>>> the ones it can't.  The history for the documents being deleted should 
>>>>>> tell
>>>>>> you why they are being deleted -- it may be that (a) they weren't found, 
>>>>>> or
>>>>>> (b) that the document specification in the job changed and they are no
>>>>>> longer included in the job.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 29, 2019 at 5:30 AM Priya Arora 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have a query regarding ManifoldCF Job process.I have a job to
>>>>>>> crawl intranet site
>>>>>>> Repository Type:- Web
>>>>>>> Output Connector Type:- Elastic search.
>>>>>>>
>>>>>>> Job have to crawl around4-5 lakhs of total records. I have discarded
>>>>>>> the previous index and created a new index(in Elasticsearch) with proper
>>>>>>> mappings and settings and started the job again after cleaning Database
>>>>>>> even(Database used a PostgreSQL).
>>>>>>> But while the job continues its ingests the records properly but
>>>>>>> just before finishing (some times in between also), it initiates the
>>>>>>> process of Deletions and also it does not index the deleted documents 
>>>>>>> again
>>>>>>> in index.
>>>>>>>
>>>>>>> Can you please something if I am doing anything wrong? or is this a
>>>>>>> process of manifoldcf if yes , why its not getting ingested again.
>>>>>>>
>>>>>>> Thanks and regards
>>>>>>> Priya
>>>>>>>
>>>>>>>


Re: Manifoldcf - Job Deletion Process

2019-10-29 Thread Karl Wright
I need both ingestion and deletion.
Karl


On Tue, Oct 29, 2019 at 8:09 AM Priya Arora  wrote:

> History is shown as below as it does not indicates any error.
> [image: 12.JPG]
>
> Thanks
> Priya
>
> On Tue, Oct 29, 2019 at 5:02 PM Karl Wright  wrote:
>
>> What does the history say about these documents?
>> Karl
>>
>> On Tue, Oct 29, 2019 at 6:53 AM Priya Arora  wrote:
>>
>>>
>>>  it may be that (a) they weren't found, or (b) that the document
>>> specification in the job changed and they are no longer included in the job.
>>>
>>> URL's that were deleted are valid URL's(as that does not result in 404
>>> or page not found error), and it is not being mentioned in Exclusion tab of
>>> job configuration.
>>> And the URL's were getting indexed earlier and except for index name in
>>> Elasticsearch nothing is changed in Job specification and in other
>>> connectors.
>>>
>>> Thanks
>>> Priya
>>>
>>> On Tue, Oct 29, 2019 at 3:40 PM Karl Wright  wrote:
>>>
>>>> ManifoldCF is an incremental crawler, which means that on every
>>>> (non-continuous) job run it sees which documents it can find and removes
>>>> the ones it can't.  The history for the documents being deleted should tell
>>>> you why they are being deleted -- it may be that (a) they weren't found, or
>>>> (b) that the document specification in the job changed and they are no
>>>> longer included in the job.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Oct 29, 2019 at 5:30 AM Priya Arora 
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have a query regarding ManifoldCF Job process.I have a job to crawl
>>>>> intranet site
>>>>> Repository Type:- Web
>>>>> Output Connector Type:- Elastic search.
>>>>>
>>>>> Job have to crawl around4-5 lakhs of total records. I have discarded
>>>>> the previous index and created a new index(in Elasticsearch) with proper
>>>>> mappings and settings and started the job again after cleaning Database
>>>>> even(Database used a PostgreSQL).
>>>>> But while the job continues its ingests the records properly but just
>>>>> before finishing (some times in between also), it initiates the process of
>>>>> Deletions and also it does not index the deleted documents again in index.
>>>>>
>>>>> Can you please something if I am doing anything wrong? or is this a
>>>>> process of manifoldcf if yes , why its not getting ingested again.
>>>>>
>>>>> Thanks and regards
>>>>> Priya
>>>>>
>>>>>


Re: Manifoldcf - Job Deletion Process

2019-10-29 Thread Karl Wright
What does the history say about these documents?
Karl

On Tue, Oct 29, 2019 at 6:53 AM Priya Arora  wrote:

>
>  it may be that (a) they weren't found, or (b) that the document
> specification in the job changed and they are no longer included in the job.
>
> URL's that were deleted are valid URL's(as that does not result in 404 or
> page not found error), and it is not being mentioned in Exclusion tab of
> job configuration.
> And the URL's were getting indexed earlier and except for index name in
> Elasticsearch nothing is changed in Job specification and in other
> connectors.
>
> Thanks
> Priya
>
> On Tue, Oct 29, 2019 at 3:40 PM Karl Wright  wrote:
>
>> ManifoldCF is an incremental crawler, which means that on every
>> (non-continuous) job run it sees which documents it can find and removes
>> the ones it can't.  The history for the documents being deleted should tell
>> you why they are being deleted -- it may be that (a) they weren't found, or
>> (b) that the document specification in the job changed and they are no
>> longer included in the job.
>>
>> Karl
>>
>>
>> On Tue, Oct 29, 2019 at 5:30 AM Priya Arora  wrote:
>>
>>> Hi All,
>>>
>>> I have a query regarding ManifoldCF Job process.I have a job to crawl
>>> intranet site
>>> Repository Type:- Web
>>> Output Connector Type:- Elastic search.
>>>
>>> Job have to crawl around4-5 lakhs of total records. I have discarded the
>>> previous index and created a new index(in Elasticsearch) with proper
>>> mappings and settings and started the job again after cleaning Database
>>> even(Database used a PostgreSQL).
>>> But while the job continues its ingests the records properly but just
>>> before finishing (some times in between also), it initiates the process of
>>> Deletions and also it does not index the deleted documents again in index.
>>>
>>> Can you please something if I am doing anything wrong? or is this a
>>> process of manifoldcf if yes , why its not getting ingested again.
>>>
>>> Thanks and regards
>>> Priya
>>>
>>>


[jira] [Resolved] (CONNECTORS-1627) CSWS Connector: Error tossed: null (ownerRights may be null)

2019-10-25 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1627.
-
Fix Version/s: ManifoldCF 2.15
Assignee: Karl Wright
Resolution: Fixed

r1868956

> CSWS Connector: Error tossed: null (ownerRights may be null)
> 
>
> Key: CONNECTORS-1627
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1627
> Project: ManifoldCF
> Issue Type: Bug
> Components: LiveLink connector
> Reporter: Markus Schuch
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.15
>
> Attachments: CONNECTORS-1627.patch, screenshot-1.png
>
>
> We encounter documents having object rights with {{ownerRights}} = {{null}} 
> leading to:
> {code}
> FATAL 2019-10-25T10:55:03,839 (Worker thread '15') - Error tossed: null
> java.lang.NullPointerException
>   at org.apache.manifoldcf.crawler.connectors.csws.CswsConnector.processDocuments(CswsConnector.java:1276) ~[?:?]
>   at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Build failed in Jenkins: ManifoldCF-ant #719

2019-10-23 Thread Karl Wright
Hmm, did the Nuxeo download go away?  What do we replace it with?
Karl

On Tue, Oct 22, 2019 at 9:39 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See <
> https://builds.apache.org/job/ManifoldCF-ant/719/display/redirect?page=changes
> >
>
> Changes:
>
> [kwright] Fix the way attributes are indexed to be compatible with LAPI
> connector
>
>
> --
> [...truncated 453.86 KB...]
> AU
> site/src/documentation/resources/images/zh_CN/rss-job-security.PNG
> AU
> site/src/documentation/resources/images/zh_CN/rss-job-time-values.PNG
> AUsite/src/documentation/resources/images/zh_CN/rss-job-urls.PNG
> AUsite/src/documentation/resources/images/zh_CN/rss-status.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepoint-configure-authoritytype.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepoint-configure-server.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepoint-job-metadata.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepoint-job-paths.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepoint-job-security.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepoint-status.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepointadauthority-configure-cache.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepointadauthority-configure-dc.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepointadauthority-status.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepointnativeauthority-configure-cache.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepointnativeauthority-configure-server.PNG
> AU
> site/src/documentation/resources/images/zh_CN/sharepointnativeauthority-status.PNG
> AU
> site/src/documentation/resources/images/zh_CN/simple-history-example.PNG
> AU
> site/src/documentation/resources/images/zh_CN/simple-history-select-activities.PNG
> AU
> site/src/documentation/resources/images/zh_CN/simple-history-select-connection.PNG
> AU
> site/src/documentation/resources/images/zh_CN/solr-configure-arguments.PNG
> AU
> site/src/documentation/resources/images/zh_CN/solr-configure-commits.PNG
> AU
> site/src/documentation/resources/images/zh_CN/solr-configure-documents.PNG
> AU
> site/src/documentation/resources/images/zh_CN/solr-configure-schema.PNG
> AU
> site/src/documentation/resources/images/zh_CN/solr-configure-server.PNG
> AU
> site/src/documentation/resources/images/zh_CN/solr-configure-solr-type.PNG
> AU
> site/src/documentation/resources/images/zh_CN/solr-configure-zookeeper.PNG
> AUsite/src/documentation/resources/images/zh_CN/solr-status.PNG
> AU
> site/src/documentation/resources/images/zh_CN/tika-job-exceptions.PNG
> AU
> site/src/documentation/resources/images/zh_CN/tika-job-field-mapping.PNG
> AU
> site/src/documentation/resources/images/zh_CN/transformation-throttling.PNG
> AU
> site/src/documentation/resources/images/zh_CN/view-authority-connection.PNG
> AUsite/src/documentation/resources/images/zh_CN/view-job.PNG
> AU
> site/src/documentation/resources/images/zh_CN/view-mapping-connection.PNG
> AU
> site/src/documentation/resources/images/zh_CN/view-output-connection.PNG
> AU
> site/src/documentation/resources/images/zh_CN/view-repository-connection.PNG
> AU
> site/src/documentation/resources/images/zh_CN/view-transformation-connection.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-configure-access-credentials-session-form.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-configure-access-credentials-session.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-configure-access-credentials.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-configure-bandwidth.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-configure-certificates.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-configure-email.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-configure-robots.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-job-canonicalization.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-job-exclusions.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-job-hop-filters.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-job-inclusions.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-job-metadata.PNG
> AU
> site/src/documentation/resources/images/zh_CN/web-job-security.PNG
> AUsite/src/documentation/resources/images/zh_CN/web-job-seeds.PNG
> AUsite/src/documentation/resources/images/zh_CN/web-status.PNG
> AUsite/src/documentation/resources/images/zh_CN/welcome-screen.PNG
> AU
> site/src/documentation/resources/images/zh_CN/wiki-configure-server.PNG
> AUsite/src/documentation/resources/images/ManifoldCF-logo.PNG
> AUsite/src/documentation/resources/images/lucene_outline_200.gif
> AU

[jira] [Assigned] (CONNECTORS-1626) CSWS Authority does no return all user permissions

2019-10-21 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1626:
---

Assignee: Markus Schuch

> CSWS Authority does no return all user permissions
> --
>
> Key: CONNECTORS-1626
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1626
> Project: ManifoldCF
> Issue Type: Bug
> Components: LiveLink connector
> Reporter: Markus Schuch
> Assignee: Markus Schuch
> Priority: Major
>
> Currently the CSWS Authority returns tokens for groups a user is
> directly a member of.
> The CSWS Authority does not return:
> - tokens for transitive group memberships
> - tokens for project group memberships





Re: Manifold with OpenJDK

2019-10-16 Thread Karl Wright
I use it this way all the time.
Karl


On Wed, Oct 16, 2019 at 11:32 AM Praveen Bejji 
wrote:

> Hi,
>
> We are planning on using ManifoldCF with OpenJDK 1.8 on a Linux server.
> Can you please let us know if there are any known issues or challenges with
> using ManifoldCF with OpenJDK?
>
>
> Thanks,
> Praveen
>


Re: Box connector

2019-10-12 Thread Karl Wright
If there is such a connector, I don't know about it.  Hopefully we'll find
out soon if somebody has developed one on the outside they're willing to
contribute or make available.
Karl

On Fri, Oct 11, 2019 at 2:17 PM SREEJITH va  wrote:

> Hi, I am working on a document migration project that requires migrating
> documents to the Box (https://www.box.com/) system. Is there an existing
> output connector for Box, or any development in progress?
>


Re: [jira] [Commented] (CONNECTORS-1625) When processing a specific PDF Manifold goes out of memory

2019-10-11 Thread Karl Wright
If you call Tika yourself, and you aren't using streams, then that would be
an obvious reason why your memory problems occur in that environment.
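
To make the distinction concrete, here is a generic sketch in Python. It does
not use Tika, and the function name and chunk size are assumptions chosen for
illustration. It processes data in fixed-size chunks, so memory use stays
bounded by the chunk size instead of growing with the size of the document.

```python
def stream_copy(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in fixed-size chunks; return the bytes copied.

    src and dst are binary file-like objects. At most chunk_size bytes
    are held in memory at any time, unlike src.read(), which would load
    the whole document at once.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

A custom connector that reads each file fully into a byte array before
handing it to the parser behaves like the src.read() case, which would fit
the out-of-memory reports for large documents.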
Karl


On Fri, Oct 11, 2019 at 9:26 AM Donald Van den Driessche (Jira) <
j...@apache.org> wrote:

>
> [
> https://issues.apache.org/jira/browse/CONNECTORS-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949443#comment-16949443
> ]
>
> Donald Van den Driessche commented on CONNECTORS-1625:
> --
>
> After running the same process (with the same config) locally, we had no
> issues.
> So, it might be something with the streams.
>
>
>
> We've written a custom connector to fetch the files. It might use the
> wrong way to provide the file to the Tika parser.
>
> > When processing a specific PDF Manifold goes out of memory
> > --
> >
> > Key: CONNECTORS-1625
> > URL: https://issues.apache.org/jira/browse/CONNECTORS-1625
> > Project: ManifoldCF
> > Issue Type: Bug
> > Components: Tika extractor
> > Affects Versions: ManifoldCF 2.12
> > Reporter: Donald Van den Driessche
> > Assignee: Karl Wright
> > Priority: Major
> > Attachments: abd-serotec-antibodies-uk.pdf
> >
> >
> > When processing the attached file with ManifoldCF 2.12, we keep getting an
> > out-of-memory error.
> > When just parsing it through Tika 1.18, no issues are found.
> > Can anyone look into it?
> > Thanks in advance!
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


Re: Multiprocess-ZK- Inside a docker container(linux)

2019-10-10 Thread Karl Wright
There is nothing special about running in Docker.  Something in your setup
is just incorrect.
Karl

On Thu, Oct 10, 2019 at 2:26 AM Priya Arora  wrote:

> I have tested the working of multiprocess_zk_example outside the docker,
> it's working fine. But has anybody implemented the way to run all process
> of multi process inside docker.
>
> On Thu, Oct 10, 2019 at 11:32 AM Karl Wright  wrote:
>
>> Well, I confirmed that the multiprocess_zk example scripts work fine
>> outside of docker.  You'll have to debug why it's not working for you I'm
>> afraid.
>>
>> Karl
>>
>>
>> On Thu, Oct 10, 2019 at 12:56 AM Priya Arora  wrote:
>>
>>> Yes , it is same, checked again
>>> [image: image.png]
>>>
>>> On Wed, Oct 9, 2019 at 8:06 PM Karl Wright  wrote:
>>>
>>>> JAVA_HOME should point to the jdk root, not to the bin/java directory.
>>>> Karl
>>>>
>>>> On Wed, Oct 9, 2019 at 10:32 AM Priya Arora 
>>>> wrote:
>>>>
>>>>> Yes JAVA_HOME , has been set as usr/local/openjdk-8/bin/java, as the
>>>>> docker image of Java is based on openjdk
>>>>>
>>>>>
>>>>> > On 09-Oct-2019, at 7:57 PM, Karl Wright  wrote:
>>>>> >
>>>>>
>>>>


Re: Multiprocess-ZK- Inside a docker container(linux)

2019-10-10 Thread Karl Wright
Well, I confirmed that the multiprocess_zk example scripts work fine
outside of docker.  You'll have to debug why it's not working for you I'm
afraid.

Karl


On Thu, Oct 10, 2019 at 12:56 AM Priya Arora  wrote:

> Yes , it is same, checked again
> [image: image.png]
>
> On Wed, Oct 9, 2019 at 8:06 PM Karl Wright  wrote:
>
>> JAVA_HOME should point to the jdk root, not to the bin/java directory.
>> Karl
>>
>> On Wed, Oct 9, 2019 at 10:32 AM Priya Arora  wrote:
>>
>>> Yes JAVA_HOME , has been set as usr/local/openjdk-8/bin/java, as the
>>> docker image of Java is based on openjdk
>>>
>>>
>>> > On 09-Oct-2019, at 7:57 PM, Karl Wright  wrote:
>>> >
>>>
>>


Re: Multiprocess-ZK- Inside a docker container(linux)

2019-10-09 Thread Karl Wright
JAVA_HOME should point to the jdk root, not to the bin/java directory.
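
A quick way to check this is sketched below in Python. The helper name
check_java_home is made up for illustration and is not part of the
ManifoldCF scripts. It treats a value as a plausible JDK root only if it is
a directory that contains bin/java (or bin/java.exe on Windows).

```python
from pathlib import Path


def check_java_home(java_home):
    """Return a problem description, or None if java_home looks like a JDK root."""
    if not java_home:
        return "JAVA_HOME is not set"
    home = Path(java_home)
    if not home.is_dir():
        return f"{java_home} is not a directory"
    has_java = (home / "bin" / "java").is_file() or (home / "bin" / "java.exe").is_file()
    if not has_java:
        return f"{java_home} has no bin/java"
    return None
```

Passing os.environ.get("JAVA_HOME") to it inside the container should return
None when the variable points at the JDK root; a value ending in bin/java
would be reported as not being a directory.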
Karl

On Wed, Oct 9, 2019 at 10:32 AM Priya Arora  wrote:

> Yes JAVA_HOME , has been set as usr/local/openjdk-8/bin/java, as the
> docker image of Java is based on openjdk
>
>
> > On 09-Oct-2019, at 7:57 PM, Karl Wright  wrote:
> >
>


Re: Multiprocess-ZK- Inside a docker container(linux)

2019-10-09 Thread Karl Wright
Do you have JAVA_HOME set?
Karl


On Wed, Oct 9, 2019 at 5:01 AM Priya Arora  wrote:

> Hi All,
>
> I am getting the error below when running the runzookeeper script to start
> the multi-process ZooKeeper example inside a Docker container:
>
> root@67d5fbd824c4:/usr/share/manifoldcf/multiprocess-zk-example# sh
> runzookeeper.sh
> runzookeeper.sh: 18: runzookeeper.sh: [[: not found
> executecommand.sh: 18: executecommand.sh: [[: not found
>
> Can somebody suggest why I am getting this error?
>
> Thanks
> Priya
>
> On Wed, Oct 9, 2019 at 1:58 PM Priya Arora  wrote:
>
> > Hi All,
> >
> > Any suggestions would be really helpful.
> >
> > Thanks and regards
> > Priya
> >
> > On Thu, Oct 3, 2019 at 10:33 AM Priya Arora  wrote:
> >
> >> Hi Cihad,
> >>
> >> *Do you run all scripts in only one docker container or multiple?* I
> >> have tried creating one sh file (which includes calls to all the other
> >> required sh files to start multiprocess).
> >>
> >> *I think these scripts should run in separate containers.* Does that
> >> mean having 6 different containers for all 6 sh files?
> >>
> >> *I recommend using mysql or postgresql instead of
> >> start-database[.sh|.bat]* I am using Postgres as the database and have
> >> defined the configuration settings in properties-global.xml. Will the
> >> database start from this file, and do I need to run start-database.sh?
> >>
> >> *Some Docker-related config settings are as below:*
> >>
> >> RUN wget http://apache.mirror.rafal.ca/manifoldcf/apache-manifoldcf-${MANIFOLDCF_VERSION}/apache-manifoldcf-${MANIFOLDCF_VERSION}-bin.tar.gz && \
> >>     wget https://maven.forgerock.org/repo/repo/eu/agno3/jcifs/jcifs-ng/${CIFS_VERSION}/jcifs-ng-${CIFS_VERSION}.jar && \
> >>     tar -xzvf apache-manifoldcf-${MANIFOLDCF_VERSION}-bin.tar.gz && \
> >>     cp -R apache-manifoldcf-${MANIFOLDCF_VERSION} /usr/share/manifoldcf && \
> >>     cp jcifs-ng-${CIFS_VERSION}.jar /usr/share/manifoldcf/connector-lib-proprietary
> >>
> >> EXPOSE 8345
> >>
> >> WORKDIR /usr/share/manifoldcf/multiprocess-zk-example
> >> COPY config/mf/multiprocess/sh.sh /usr/share/manifoldcf/multiprocess-zk-example
> >>
> >> CMD [config/mf/multiprocess/sh.sh start]
> >>
> >> *sh.sh is the file that calls all the other files:*
> >> #!/bin/bash
> >> sh /usr/share/manifoldcf/multiprocess-zk-example/runzookeeper.sh
> >> sh /usr/share/manifoldcf/multiprocess-zk-example/setglobalproperties.sh
> >> sh /usr/share/manifoldcf/multiprocess-zk-example/start-database.sh
> >> sh /usr/share/manifoldcf/multiprocess-zk-example/initialize.sh
> >> sh /usr/share/manifoldcf/multiprocess-zk-example/start-agents.sh
> >> sh /usr/share/manifoldcf/multiprocess-zk-example/start-agents-2.sh
> >> sh /usr/share/manifoldcf/multiprocess-zk-example/start-webapps.sh
> >>
> >>
> >> Any suggestion would be really helpful.
> >>
> >> Thanks and regards
> >> Priya
> >>
> >> On Tue, Oct 1, 2019 at 7:25 PM Cihad Guzel  wrote:
> >>
> >>> Hi Priya,
> >>>
> >>> Do you run all scripts in only one docker container or in multiple
> >>> containers? How do you use it? I think these scripts should run in
> >>> separate containers.
> >>>
> >>> There is no single command to run multiprocess-zk-example. Maybe you can
> >>> run ManifoldCF with the single-process example instead. I recommend using
> >>> mysql or postgresql instead of start-database[.sh|.bat] if you want to
> >>> use it in a production environment.
> >>>
> >>> Your container restart problem occurs because the script in the container
> >>> terminates. When the script terminates, the docker container stops and
> >>> can then start again. Please check your Dockerfile, your docker run
> >>> command, and the runnable script in the container.
> >>>
> >>> Please give more details about your scripts and Dockerfile if you would
> >>> like more comments.
> >>>
> >>> Kind Regards,
> >>> Cihad Guzel
> >>>
> >>> On Tue, Oct 1, 2019 at 15:10, Priya Arora wrote:
> >>>
> >>> > Hi All,
> >>> >
> >>> > I am trying to run the multi-process ZooKeeper example inside a docker
> >>> > container.
> >>> > Do we need to follow all of these steps to run multi-process?
> >>> >
> >>> >    1. *runzookeeper[.sh|.bat]*
> >>> >    2. *setglobalproperties[.sh|.bat]*
> >>> >    3. *start-database[.sh|.bat]*
> >>> >    4. *initialize[.sh|.bat]*
> >>> >    5. *start-agents[.sh|.bat]*, and optionally *start-agents-2[.sh|.bat]*
> >>> >    6. *start-webapps[.sh|.bat]*
> >>> >
> >>> > Is there a single command to run multi-process, since inside a Dockerfile
> >>> > we can configure one command to start up?
> >>> > I have tried creating a single sh/jar file that calls all the required
> >>> > sh files mentioned above, but it puts the container into restart mode
> >>> > (every 1-2 minutes).
> >>> > Has anybody tried configuring a multi-process environment inside a
> >>> > docker container?
> >>> >
> >>> > Also i manually 
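
A note on the "[[: not found" error quoted earlier in this thread: "[[" is
a bash keyword (also available in ksh and zsh), not a POSIX sh feature.
Invoking a script as "sh script.sh" ignores its shebang line, and on
Debian-based images sh is typically dash, which reports exactly this
message. The sketch below is one way to pick the interpreter from the
shebang instead; pick_shell is a hypothetical helper, and it assumes the
scripts declare bash on their first line, which you should verify in your
own copy.

```shell
#!/bin/sh
# Sketch only: pick_shell is a hypothetical helper, not part of ManifoldCF.
# It prints "bash" when the script's first line is a bash shebang, else "sh".
pick_shell() {
  first_line=$(head -n 1 "$1")
  case "$first_line" in
    '#!'*bash*) echo bash ;;
    *)          echo sh ;;
  esac
}

# Demonstration on a throwaway script that uses the bash-only [[ ]] test.
demo=$(mktemp)
printf '%s\n' '#!/bin/bash' 'if [[ -n "$HOME" ]]; then echo ok; fi' > "$demo"
interp=$(pick_shell "$demo")
echo "$interp"
rm -f "$demo"
```

In practice the simpler route is usually to run the script directly
("./runzookeeper.sh" after chmod +x) or with "bash runzookeeper.sh", so
that the bash-only syntax is read by bash.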

Re: Error in creating authority connection with Active Directory

2019-10-09 Thread Karl Wright
Hi,
Please first of all subscribe to this list if you are going to post here.
Otherwise I have to moderate your posts into it.
Second, the users list might be a better choice.  But first, you'll need to
provide more information, such as:

(1) Providing the information from the connection view page, so we can see
your configuration and the connection status;
(2) Describing anything at all unique about the Active Directory setup.

Thanks,
Karl


On Wed, Oct 9, 2019 at 7:15 AM muthukumar r 
wrote:

> We are trying to use the authority service in ManifoldCF to connect to
> Active Directory to get the user tokens, but we are getting a dead
> authority error. Kindly help me to resolve the issue.
>

