Build failed in Jenkins: ManifoldCF » ManifoldCF-mvn #10

2020-12-11 Thread Apache Jenkins Server
See 


Changes:

[Karl Wright] Apply patch for solr ingestor connector

[Karl Wright] Improvements for CONNECTORS-1660


--
[...truncated 1.93 MB...]
[ERROR]   protected static void writeField(ModifiableSolrParams out, String 
fieldName, List fieldValues)
[ERROR] ^
[ERROR] 
:899:
 warning: no @param for out
[ERROR]   protected static void writeField(ModifiableSolrParams out, String 
fieldName, String fieldValue)
[ERROR] ^
[ERROR] 
:899:
 warning: no @param for fieldName
[ERROR]   protected static void writeField(ModifiableSolrParams out, String 
fieldName, String fieldValue)
[ERROR] ^
[ERROR] 
:899:
 warning: no @param for fieldValue
[ERROR]   protected static void writeField(ModifiableSolrParams out, String 
fieldName, String fieldValue)
[ERROR] ^
[ERROR] 
:905:
 warning: no @param for out
[ERROR]   protected void writeACLs(ModifiableSolrParams out, String aclType, 
String[] acl, String[] denyAcl)
[ERROR]  ^
[ERROR] 
:905:
 warning: no @param for aclType
[ERROR]   protected void writeACLs(ModifiableSolrParams out, String aclType, 
String[] acl, String[] denyAcl)
[ERROR]  ^
[ERROR] 
:905:
 warning: no @param for acl
[ERROR]   protected void writeACLs(ModifiableSolrParams out, String aclType, 
String[] acl, String[] denyAcl)
[ERROR]  ^
[ERROR] 
:905:
 warning: no @param for denyAcl
[ERROR]   protected void writeACLs(ModifiableSolrParams out, String aclType, 
String[] acl, String[] denyAcl)
[ERROR]  ^
[ERROR] 
:922:
 warning: no @param for inputDoc
[ERROR]   protected void writeACLsInSolrDoc( SolrInputDocument inputDoc, String 
aclType, String[] acl, String[] denyAcl )
[ERROR]  ^
[ERROR] 
:922:
 warning: no @param for aclType
[ERROR]   protected void writeACLsInSolrDoc( SolrInputDocument inputDoc, String 
aclType, String[] acl, String[] denyAcl )
[ERROR]  ^
[ERROR] 
:922:
 warning: no @param for acl
[ERROR]   protected void writeACLsInSolrDoc( SolrInputDocument inputDoc, String 
aclType, String[] acl, String[] denyAcl )
[ERROR]  ^
[ERROR] 
:922:
 warning: no @param for denyAcl
[ERROR]   protected void writeACLsInSolrDoc( SolrInputDocument inputDoc, String 
aclType, String[] acl, String[] denyAcl )
[ERROR]  ^
[ERROR] 
:1751:
 warning: no @param for inputField
[ERROR]   protected static String makeSafeLuceneField(String inputField)
[ERROR]   ^
[ERROR] 
:1751:
 warning: no @return
[ERROR]   protected static String makeSafeLuceneField(String inputField)
[ERROR]   ^
[ERROR] 
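
The warnings above all have the same shape: a method has a Javadoc comment that lacks a `@param` tag for one of its parameters, or lacks `@return`. Whether these warnings are what failed the build cannot be told from the truncated log. As a minimal sketch of the tag layout that would silence them, the class below uses a plain `Map` as a stand-in for `ModifiableSolrParams`, and the body of `makeSafeLuceneField` is a placeholder assumption rather than the connector's actual logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the connector class named in the log; only the
// Javadoc tag layout is the point here.
class JavadocTagSketch {

  /**
   * Appends one value to a multi-valued request parameter.
   *
   * @param out the parameter map to write into (stand-in for ModifiableSolrParams)
   * @param fieldName the name of the field to add
   * @param fieldValue the value to append under that name
   */
  protected static void writeField(Map<String, List<String>> out, String fieldName, String fieldValue) {
    out.computeIfAbsent(fieldName, k -> new ArrayList<>()).add(fieldValue);
  }

  /**
   * Replaces every character that is not a letter, digit or underscore with an underscore.
   * This body is a placeholder assumption, not the connector's real behaviour.
   *
   * @param inputField the raw field name
   * @return the field name with all other characters replaced by '_'
   */
  protected static String makeSafeLuceneField(String inputField) {
    StringBuilder sb = new StringBuilder();
    for (char c : inputField.toCharArray()) {
      sb.append(Character.isLetterOrDigit(c) || c == '_' ? c : '_');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, List<String>> params = new LinkedHashMap<>();
    writeField(params, "title", "hello");
    writeField(params, "title", "world");
    System.out.println(params);
    System.out.println(makeSafeLuceneField("dc:title"));
  }
}
```

Each parameter gets exactly one `@param` line, in declaration order, and any non-void method gets a `@return` line.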

Build failed in Jenkins: ManifoldCF » ManifoldCF-Artifacts-Ant-JDK11 #11

2020-12-11 Thread Apache Jenkins Server
See 


Changes:

[Karl Wright] Apply patch for solr ingestor connector

[Karl Wright] Improvements for CONNECTORS-1660


--
[...truncated 1.21 MB...]
[javac]   symbol:   class Authentication
[javac]   location: class CswsSession
[javac] 
:98:
 error: cannot find symbol
[javac]   private final DocumentManagement documentManagementHandle;
[javac] ^
[javac]   symbol:   class DocumentManagement
[javac]   location: class CswsSession
[javac] 
:99:
 error: cannot find symbol
[javac]   private final ContentService contentServiceHandle;
[javac] ^
[javac]   symbol:   class ContentService
[javac]   location: class CswsSession
[javac] 
:100:
 error: cannot find symbol
[javac]   private final MemberService memberServiceHandle;
[javac] ^
[javac]   symbol:   class MemberService
[javac]   location: class CswsSession
[javac] 
:101:
 error: cannot find symbol
[javac]   private final SearchService searchServiceHandle;
[javac] ^
[javac]   symbol:   class SearchService
[javac]   location: class CswsSession
[javac] 
:109:
 error: cannot find symbol
[javac]   private Map workspaceTypeNodes = new HashMap<>();
[javac]   ^
[javac]   symbol:   class Node
[javac]   location: class CswsSession
[javac] 
:198:
 error: cannot find symbol
[javac]   public DocumentManagement getDocumentManagementHandle() {
[javac]  ^
[javac]   symbol:   class DocumentManagement
[javac]   location: class CswsSession
[javac] 
:206:
 error: cannot find symbol
[javac]   public ContentService getContentServiceHandle() {
[javac]  ^
[javac]   symbol:   class ContentService
[javac]   location: class CswsSession
[javac] 
:214:
 error: cannot find symbol
[javac]   public MemberService getMemberServiceHandle() {
[javac]  ^
[javac]   symbol:   class MemberService
[javac]   location: class CswsSession
[javac] 
:222:
 error: cannot find symbol
[javac]   public SearchService getSearchServiceHandle() {
[javac]  ^
[javac]   symbol:   class SearchService
[javac]   location: class CswsSession
[javac] 
:250:
 error: cannot find symbol
[javac]   public Node getRootNode(final String nodeType)
[javac]  ^
[javac]   symbol:   class Node
[javac]   location: class CswsSession
[javac] 
:266:
 error: cannot find symbol
[javac]   public List listNodes(final long nodeId)
[javac] ^
[javac]   symbol:   class Node
[javac]   location: class CswsSession
[javac] 
:282:
 error: cannot find symbol
[javac]   public List getChildren(final long nodeId)
[javac] ^
[javac]   

[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248078#comment-17248078
 ] 

Karl Wright commented on CONNECTORS-1653:
-

Patches applied, thanks.


> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_solr_ingester_connector_02_12_2020.txt, 
> patch_solr_ingester_connector_03_12_2020.txt, 
> patch_solr_ingester_connector_11_12_2020.txt, 
> solr_ingester_connector_patch.txt
>
>
> Hi,
> We developed a new repository connector for crawling data from Solr, and we 
> would like to contribute it to MCF by releasing the code under the Apache v2 
> license.
> The goal of this connector is to crawl Solr instances and manage the crawl in 
> MCF rather than using DIH, for instance.
> To do so, we send requests to Solr and page through the large number of 
> results using cursorMark. The Solr fields must be stored in order to be 
> retrieved.
> We do not use any additional libraries; all the dependencies are already in 
> MCF. So far we have tested it with Solr 7 and 8.
> The documentation is here : 
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/673742849/Solr+ingester+crawler+connector
> The code is attached.
> Best regards,
> Olivier Tavard
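
The cursorMark paging mentioned in the description follows a simple loop: start with the cursor `*`, send each response's next cursor with the following request, and stop when the returned cursor equals the one that was sent. The sketch below illustrates only that loop. It does not call Solr: `fetchPage` is a hypothetical in-memory stand-in, and the cursor values it produces are offsets written as text, whereas real Solr cursor marks are opaque strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CursorPagingSketch {

  // One page of results plus the cursor value to send with the next request.
  static final class Page {
    final List<String> docs;
    final String nextCursorMark;

    Page(List<String> docs, String nextCursorMark) {
      this.docs = docs;
      this.nextCursorMark = nextCursorMark;
    }
  }

  // In-memory stand-in for a Solr query (assumption for illustration only).
  // When nothing is left, it returns the same cursor it was given.
  static Page fetchPage(List<String> index, String cursorMark, int rows) {
    int start = "*".equals(cursorMark) ? 0 : Integer.parseInt(cursorMark);
    int end = Math.min(start + rows, index.size());
    String next = (end == start) ? cursorMark : String.valueOf(end);
    return new Page(new ArrayList<>(index.subList(start, end)), next);
  }

  // Pages through the whole index, stopping when the cursor no longer advances.
  static List<String> crawlAll(List<String> index, int rows) {
    List<String> seen = new ArrayList<>();
    String cursorMark = "*";
    while (true) {
      Page page = fetchPage(index, cursorMark, rows);
      seen.addAll(page.docs);
      if (page.nextCursorMark.equals(cursorMark)) {
        break; // same cursor returned: no more results
      }
      cursorMark = page.nextCursorMark;
    }
    return seen;
  }

  public static void main(String[] args) {
    List<String> index = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
    System.out.println(crawlAll(index, 2));
  }
}
```

Against a real Solr instance the query also needs a sort that includes the collection's unique key field, since cursor paging depends on a stable total order.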



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1660) Patch for MCF HTML extractor connector

2020-12-11 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248054#comment-17248054
 ] 

Olivier Tavard commented on CONNECTORS-1660:


Here is the global patch, which includes the previous one without the log 
statement: [^patch_html_extractor_connector_11_12_2020.txt] 


> Patch for MCF HTML extractor connector
> --
>
> Key: CONNECTORS-1660
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1660
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: HTML extractor
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_html_extractor_connector_02_12_2020.txt, 
> patch_html_extractor_connector_11_12_2020.txt
>
>
> Hello,
> Here is a patch for the HTML extractor connector regarding text 
> extraction with or without HTML stripping: 
> [^patch_html_extractor_connector_02_12_2020.txt]
>  * Extraction of HTML code: I added a whitelist through the Jsoup cleaner to 
> define which HTML elements are allowed, in order to enforce security. In the 
> code I set it to “relaxed”:
> This whitelist allows a full range of text and structural body HTML: a, b, 
> blockquote, br, caption, cite, code, col, colgroup, dd, div, dl, dt, em, h1, 
> h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, span, strike, strong, 
> sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul
> (more details here: 
> [https://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html#relaxed()])
> A future improvement would be to add a new parameter to the 
> interface to select which whitelist to use.
>  
>  * Extraction of text with HTML stripping activated: we keep only text nodes; 
> all HTML is stripped (same as before). The change is that the Jsoup 
> pretty-print option is now set to false to keep line breaks.
>  
> Best regards
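
The whitelist idea in the description comes down to a membership test: an element is kept only if its name is in a fixed set. The sketch below illustrates that idea with the element names copied from the list quoted above. It is not Jsoup's cleaner and does not parse HTML; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class RelaxedAllowListSketch {

  // Element names copied from the list quoted in the issue description.
  static final Set<String> ALLOWED = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
      "a", "b", "blockquote", "br", "caption", "cite", "code", "col", "colgroup",
      "dd", "div", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6", "i", "img",
      "li", "ol", "p", "pre", "q", "small", "span", "strike", "strong", "sub", "sup",
      "table", "tbody", "td", "tfoot", "th", "thead", "tr", "u", "ul")));

  // Returns true when the element name is on the allow-list (case-insensitive).
  static boolean isAllowed(String tagName) {
    return ALLOWED.contains(tagName.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    for (String tag : Arrays.asList("table", "IMG", "script", "iframe")) {
      System.out.println(tag + " allowed: " + isAllowed(tag));
    }
  }
}
```

A real cleaner also has to decide which attributes and URL protocols are acceptable on each allowed element, which this sketch leaves out.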





[jira] [Updated] (CONNECTORS-1660) Patch for MCF HTML extractor connector

2020-12-11 Thread Olivier Tavard (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Tavard updated CONNECTORS-1660:
---
Attachment: patch_html_extractor_connector_11_12_2020.txt

> Patch for MCF HTML extractor connector
> --
>
> Key: CONNECTORS-1660
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1660
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: HTML extractor
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_html_extractor_connector_02_12_2020.txt, 
> patch_html_extractor_connector_11_12_2020.txt
>
>
> Hello,
> Here is a patch for the HTML extractor connector regarding text 
> extraction with or without HTML stripping: 
> [^patch_html_extractor_connector_02_12_2020.txt]
>  * Extraction of HTML code: I added a whitelist through the Jsoup cleaner to 
> define which HTML elements are allowed, in order to enforce security. In the 
> code I set it to “relaxed”:
> This whitelist allows a full range of text and structural body HTML: a, b, 
> blockquote, br, caption, cite, code, col, colgroup, dd, div, dl, dt, em, h1, 
> h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, span, strike, strong, 
> sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul
> (more details here: 
> [https://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html#relaxed()])
> A future improvement would be to add a new parameter to the 
> interface to select which whitelist to use.
>  
>  * Extraction of text with HTML stripping activated: we keep only text nodes; 
> all HTML is stripped (same as before). The change is that the Jsoup 
> pretty-print option is now set to false to keep line breaks.
>  
> Best regards





[jira] [Updated] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Olivier Tavard (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Tavard updated CONNECTORS-1653:
---
Attachment: patch_solr_ingester_connector_11_12_2020.txt

> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_solr_ingester_connector_02_12_2020.txt, 
> patch_solr_ingester_connector_03_12_2020.txt, 
> patch_solr_ingester_connector_11_12_2020.txt, 
> solr_ingester_connector_patch.txt
>
>
> Hi,
> We developed a new repository connector for crawling data from Solr, and we 
> would like to contribute it to MCF by releasing the code under the Apache v2 
> license.
> The goal of this connector is to crawl Solr instances and manage the crawl in 
> MCF rather than using DIH, for instance.
> To do so, we send requests to Solr and page through the large number of 
> results using cursorMark. The Solr fields must be stored in order to be 
> retrieved.
> We do not use any additional libraries; all the dependencies are already in 
> MCF. So far we have tested it with Solr 7 and 8.
> The documentation is here : 
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/673742849/Solr+ingester+crawler+connector
> The code is attached.
> Best regards,
> Olivier Tavard





[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248041#comment-17248041
 ] 

Olivier Tavard commented on CONNECTORS-1653:


Here is a new patch that includes the previous ones: 
[^patch_solr_ingester_connector_11_12_2020.txt]
I wrote the declaration: 
{code:java}
private final static String defaultAuthorityDenyToken = "DEAD_AUTHORITY";
{code}
because it also appears in the code of many connectors: Generic, Null, JDBC, 
Dropbox, RSS, Web, SharePoint, etc.
Anyway, I changed the code as you asked to use the constant from the 
superclass, which is:
{code:java}
public static final String GLOBAL_DENY_TOKEN = "DEAD_AUTHORITY";
{code}
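
The change described here replaces a private copy of the literal with a reference to the constant inherited from the superclass. A minimal sketch of that pattern follows; the two class names and the `defaultDenyAcl` helper are hypothetical, and only the constant's name and value are taken from the comment above.

```java
// Hypothetical connector subclass: it keeps no private copy of the literal and
// refers to the inherited constant instead.
class SolrIngesterSketch extends BaseConnectorSketch {

  // Builds a deny ACL that contains only the shared deny token.
  static String[] defaultDenyAcl() {
    return new String[] { GLOBAL_DENY_TOKEN };
  }

  public static void main(String[] args) {
    System.out.println(defaultDenyAcl()[0]);
  }
}

// Hypothetical stand-in for the connector superclass; the constant's name and
// value are the ones quoted in the thread.
abstract class BaseConnectorSketch {
  public static final String GLOBAL_DENY_TOKEN = "DEAD_AUTHORITY";
}
```

With a single definition, a later change to the token value only has to be made in one place, and every connector that references the constant picks it up.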

> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_solr_ingester_connector_02_12_2020.txt, 
> patch_solr_ingester_connector_03_12_2020.txt, 
> patch_solr_ingester_connector_11_12_2020.txt, 
> solr_ingester_connector_patch.txt
>
>
> Hi,
> We developed a new repository connector for crawling data from Solr, and we 
> would like to contribute it to MCF by releasing the code under the Apache v2 
> license.
> The goal of this connector is to crawl Solr instances and manage the crawl in 
> MCF rather than using DIH, for instance.
> To do so, we send requests to Solr and page through the large number of 
> results using cursorMark. The Solr fields must be stored in order to be 
> retrieved.
> We do not use any additional libraries; all the dependencies are already in 
> MCF. So far we have tested it with Solr 7 and 8.
> The documentation is here : 
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/673742849/Solr+ingester+crawler+connector
> The code is attached.
> Best regards,
> Olivier Tavard





Re: It's release time again

2020-12-11 Thread Karl Wright
Hi Olivier,

The reason these patches were not submitted right away is that there
are problems with both of them, both minor, but something I have not had
time to address myself of late.  If you could update them accordingly I
would appreciate it.

Thanks,
Karl


On Fri, Dec 11, 2020 at 10:19 AM Olivier Tavard <
olivier.tav...@francelabs.com> wrote:

> Hi Karl,
>
> Following your suggestion to remind you of outstanding items, could you
> please take a look at the 2 patches I sent:
> - one patch for Solr ingester connector :
>
> https://issues.apache.org/jira/browse/CONNECTORS-1653?focusedCommentId=17242801=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17242801
>
> About this patch, I also asked whether I could propose a patch to
> integrate the documentation for this connector:
> https://issues.apache.org/jira/browse/CONNECTORS-1653?focusedCommentId=17236664=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17236664
>
> - one patch for MCF HTML connector :
> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1660
>
> Thanks,
>
> Olivier
>
> > On 8 Dec 2020, at 21:49, Karl Wright  wrote:
> >
> > We have a new connector in the family this release, and a number of bug
> > fixes - both major and minor - have been done.  I'm planning on spinning a
> > release candidate in about 2 weeks.
> >
> > I've been extremely busy with my day job this quarter, so if anyone is
> > aware of any issue or proposal or patch that might have been overlooked,
> > please remind me to look at it before then.  Thanks in advance!
> >
> > Karl
>
>


[jira] [Commented] (CONNECTORS-1660) Patch for MCF HTML extractor connector

2020-12-11 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248000#comment-17248000
 ] 

Karl Wright commented on CONNECTORS-1660:
-

Please remove the log statement, since it will dump the entire document and 
will overwhelm the logs.


> Patch for MCF HTML extractor connector
> --
>
> Key: CONNECTORS-1660
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1660
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: HTML extractor
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_html_extractor_connector_02_12_2020.txt
>
>
> Hello,
> Here is a patch for the HTML extractor connector regarding text 
> extraction with or without HTML stripping: 
> [^patch_html_extractor_connector_02_12_2020.txt]
>  * Extraction of HTML code: I added a whitelist through the Jsoup cleaner to 
> define which HTML elements are allowed, in order to enforce security. In the 
> code I set it to “relaxed”:
> This whitelist allows a full range of text and structural body HTML: a, b, 
> blockquote, br, caption, cite, code, col, colgroup, dd, div, dl, dt, em, h1, 
> h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, span, strike, strong, 
> sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul
> (more details here: 
> [https://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html#relaxed()])
> A future improvement would be to add a new parameter to the 
> interface to select which whitelist to use.
>  
>  * Extraction of text with HTML stripping activated: we keep only text nodes; 
> all HTML is stripped (same as before). The change is that the Jsoup 
> pretty-print option is now set to false to keep line breaks.
>  
> Best regards





[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247997#comment-17247997
 ] 

Karl Wright commented on CONNECTORS-1653:
-

This patch has bugs in it.  Specifically:

{code}
-  private final static String defaultAuthorityDenyToken = "__nosecurity__";
+  private final static String defaultAuthorityDenyToken = "DEAD_AUTHORITY";
{code}

There is no token called "DEAD_AUTHORITY".  There is a value available to 
represent this in the superclass.  You should use that.


> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_solr_ingester_connector_02_12_2020.txt, 
> patch_solr_ingester_connector_03_12_2020.txt, 
> solr_ingester_connector_patch.txt
>
>
> Hi,
> We developed a new repository connector for crawling data from Solr, and we 
> would like to contribute it to MCF by releasing the code under the Apache v2 
> license.
> The goal of this connector is to crawl Solr instances and manage the crawl in 
> MCF rather than using DIH, for instance.
> To do so, we send requests to Solr and page through the large number of 
> results using cursorMark. The Solr fields must be stored in order to be 
> retrieved.
> We do not use any additional libraries; all the dependencies are already in 
> MCF. So far we have tested it with Solr 7 and 8.
> The documentation is here : 
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/673742849/Solr+ingester+crawler+connector
> The code is attached.
> Best regards,
> Olivier Tavard





Re: It's release time again

2020-12-11 Thread Olivier Tavard
Hi Karl,

Following your suggestion to remind you of outstanding items, could you 
please take a look at the 2 patches I sent:
- one patch for Solr ingester connector :
https://issues.apache.org/jira/browse/CONNECTORS-1653?focusedCommentId=17242801=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17242801

About this patch, I also asked whether I could propose a patch to 
integrate the documentation for this connector: 
https://issues.apache.org/jira/browse/CONNECTORS-1653?focusedCommentId=17236664=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17236664

- one patch for MCF HTML connector :
https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1660

Thanks,

Olivier

> On 8 Dec 2020, at 21:49, Karl Wright  wrote:
> 
> We have a new connector in the family this release, and a number of bug
> fixes - both major and minor - have been done.  I'm planning on spinning a
> release candidate in about 2 weeks.
> 
> I've been extremely busy with my day job this quarter, so if anyone is
> aware of any issue or proposal or patch that might have been overlooked,
> please remind me to look at it before then.  Thanks in advance!
> 
> Karl



Re: JDBC authority - Make optional the ID query

2020-12-11 Thread Karl Wright
I think if there is an option for not needing to do the lookup then by all
means we should allow a pass-through.  But I believe there may already be
that option in other existing authority connectors.  It may be best in any
case to have a simple "pass through" authority connector available that can
be used everywhere, rather than make this an option of the JDBC connector.

Karl
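
One way to picture the "pass through" authority suggested above: it performs no lookup and returns the user name itself as the only access token. The sketch below assumes that reading. The `Authority` interface is a hypothetical minimal contract written for this example and is not ManifoldCF's authority connector API.

```java
import java.util.Collections;
import java.util.List;

class PassThroughAuthoritySketch {

  // Hypothetical minimal contract, not ManifoldCF's authority connector API.
  interface Authority {
    List<String> accessTokensFor(String userName);
  }

  // Pass-through: no ID query, the user name itself becomes the only token.
  static final class PassThroughAuthority implements Authority {
    @Override
    public List<String> accessTokensFor(String userName) {
      return Collections.singletonList(userName);
    }
  }

  public static void main(String[] args) {
    Authority authority = new PassThroughAuthority();
    System.out.println(authority.accessTokensFor("jdoe"));
  }
}
```

Keeping this as its own small connector, as the reply proposes, would let any repository connector pair with it instead of adding an optional mode to the JDBC authority alone.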



On Tue, Dec 8, 2020 at 7:56 AM  wrote:

> Hi Karl,
>
> Currently, the query that retrieves the User ID from the USERNAME in the
> JDBC authority connector configuration is mandatory: an error is triggered
> if it is not filled in or if the query does not work. However, we may have a
> token query that only needs the USERNAME to work, which makes the User ID
> query useless and a waste of resources.
>
> We could update the code to make the User ID query optional. What do you
> think?
>
> Regards,
> Julien Massiera