[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248078#comment-17248078
 ] 

Karl Wright commented on CONNECTORS-1653:
-

Patches applied, thanks.


> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
> Issue Type: New Feature
> Reporter: Olivier Tavard
> Assignee: Karl Wright
> Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_solr_ingester_connector_02_12_2020.txt, 
> patch_solr_ingester_connector_03_12_2020.txt, 
> patch_solr_ingester_connector_11_12_2020.txt, 
> solr_ingester_connector_patch.txt
>
>
> Hi,
> We developed a new repository connector for crawling data from Solr, and we 
> would like to contribute it to MCF by releasing the code under the Apache v2 license.
> The goal of this connector is to crawl Solr instances and manage the crawl in MCF 
> rather than using DIH, for instance.
> To do this, we send requests to Solr and handle the large number of 
> results with cursorMark. The Solr fields must be stored in order to 
> be retrieved.
> We do not use any additional libraries; all the dependencies are 
> already in MCF. So far we have tested it with Solr 7 and 8.
> The documentation is here: 
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/673742849/Solr+ingester+crawler+connector
> The code is attached.
> Best regards,
> Olivier Tavard
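
The cursorMark paging mentioned in the description can be sketched roughly as follows. This is a minimal simulation under stated assumptions, not the connector's code: `CursorPagingSketch`, `Page`, `fetchPage`, and `crawlAll` are hypothetical names, and an in-memory sorted list stands in for Solr. What it borrows from Solr is only the convention that the first request sends `*` as the cursor and that paging is finished when the returned cursor equals the one that was sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for a Solr collection; this is not the SolrJ API.
class CursorPagingSketch {

  /** One page of results plus the cursor to send with the next request. */
  static final class Page {
    final List<String> ids;
    final String nextCursorMark;

    Page(List<String> ids, String nextCursorMark) {
      this.ids = ids;
      this.nextCursorMark = nextCursorMark;
    }
  }

  // Start value for the first request.
  static final String CURSOR_START = "*";

  /**
   * Simulates a query sorted on the unique key: returns up to `rows` ids that
   * sort after the cursor. Here the cursor is simply the last id seen; a real
   * Solr cursor mark is an opaque string.
   */
  static Page fetchPage(List<String> sortedIds, String cursorMark, int rows) {
    List<String> ids = new ArrayList<>();
    for (String id : sortedIds) {
      boolean afterCursor = CURSOR_START.equals(cursorMark) || id.compareTo(cursorMark) > 0;
      if (afterCursor && ids.size() < rows) {
        ids.add(id);
      }
    }
    // When nothing is left, the cursor comes back unchanged.
    String next = ids.isEmpty() ? cursorMark : ids.get(ids.size() - 1);
    return new Page(ids, next);
  }

  /** Walks every page until the returned cursor stops changing. */
  static List<String> crawlAll(List<String> sortedIds, int rows) {
    List<String> seen = new ArrayList<>();
    String cursorMark = CURSOR_START;
    while (true) {
      Page page = fetchPage(sortedIds, cursorMark, rows);
      seen.addAll(page.ids);
      if (page.nextCursorMark.equals(cursorMark)) {
        break; // same cursor returned: no more results
      }
      cursorMark = page.nextCursorMark;
    }
    return seen;
  }

  public static void main(String[] args) {
    List<String> ids = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
    System.out.println(crawlAll(ids, 2));
  }
}
```

The loop keeps no page offset, only the last cursor, which is why this style of paging copes with very large result sets.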



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248041#comment-17248041
 ] 

Olivier Tavard commented on CONNECTORS-1653:


There is a new patch that includes the previous ones: 
[^patch_solr_ingester_connector_11_12_2020.txt]
I had written this declaration: 
{code:java}
private final static String defaultAuthorityDenyToken = "DEAD_AUTHORITY";
{code}
I did so because the same declaration appears in the code of many connectors: 
Generic, Null, JDBC, Dropbox, RSS, Web, SharePoint, etc.
Anyway, I changed the code as you asked so that it uses the constant from the 
superclass, which is:
{code:java}
public static final String GLOBAL_DENY_TOKEN = "DEAD_AUTHORITY";
{code}
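
As an illustration of that change, here is a minimal sketch assuming a connector that extends a base class exposing the constant. `BaseConnectorSketch`, `SolrIngesterSketch`, and `defaultDenyAcl` are hypothetical stand-ins; only the name and value of `GLOBAL_DENY_TOKEN` come from this thread.

```java
// Hypothetical stand-in for the ManifoldCF superclass; only the constant's
// name and value are taken from the thread.
class BaseConnectorSketch {
  public static final String GLOBAL_DENY_TOKEN = "DEAD_AUTHORITY";
}

// Hypothetical connector class, for illustration only.
class SolrIngesterSketch extends BaseConnectorSketch {

  // No private copy of the literal: the inherited constant is used directly,
  // so the connector cannot drift from the value the framework expects.
  static String[] defaultDenyAcl() {
    return new String[] { GLOBAL_DENY_TOKEN };
  }

  public static void main(String[] args) {
    System.out.println(defaultDenyAcl()[0]);
  }
}
```

Referencing the inherited constant keeps the value defined in a single place, which is the point of the review comment.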



[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247997#comment-17247997
 ] 

Karl Wright commented on CONNECTORS-1653:
-

This patch has bugs in it.  Specifically:

{code}
-  private final static String defaultAuthorityDenyToken = "__nosecurity__";
+  private final static String defaultAuthorityDenyToken = "DEAD_AUTHORITY";
{code}

There is no token called "DEAD_AUTHORITY".  There is a value available to 
represent this in the superclass.  You should use that.




[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-02 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242801#comment-17242801
 ] 

Olivier Tavard commented on CONNECTORS-1653:


Hello,

 

This patch includes a fix for the document deny token, which was not set to 
the right value.

It also includes the patch that I sent yesterday, so you can integrate this 
one directly: [^patch_solr_ingester_connector_03_12_2020.txt]

 

Thanks



[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-02 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242393#comment-17242393
 ] 

Olivier Tavard commented on CONNECTORS-1653:


Hi [~kwri...@metacarta.com] 

Here is a patch that improves the handling of incremental indexing: 
[^patch_solr_ingester_connector_02_12_2020.txt].

Could you please integrate it?

 

Thanks

 



[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-11-21 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236664#comment-17236664
 ] 

Olivier Tavard commented on CONNECTORS-1653:


I tested the trunk branch directly because the code had already been 
incorporated into it, and the CONNECTORS-1653 branch still contains the bug 
that you fixed in the commit named "Add missing Jetty JSP jar so crawler UI 
works in the examples".

Anyway, the Solr repository connector works as expected: I indexed some 
example data from a Solr Docker container with the gettingstarted collection, 
and the indexing went fine, so I think the integration is OK.

Regarding the documentation, tell me if there is something I can do. 
If I remember correctly, the MCF website code is versioned somewhere, so tell 
me whether I can propose a patch that adds documentation for the Solr 
ingester connector, because it will be difficult for MCF users to use it 
without any documentation.

Thanks 



[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-11-21 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236655#comment-17236655
 ] 

Olivier Tavard commented on CONNECTORS-1653:


Hi [~kwri...@metacarta.com]

Sure thing. I will build the branch today or tomorrow and let you know.



[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-11-21 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236615#comment-17236615
 ] 

Karl Wright commented on CONNECTORS-1653:
-

I've created branches/CONNECTORS-1653 with this work.

I've integrated it with the "solr" connector family, which now has two 
connectors in it: a "Solr" output connector, and a "Solr" repository connector. 
This cuts down on dependencies and maintenance in the future.

[~olivierfl], if you wish to check out and build this branch, and verify that 
the connector works as expected, I'd appreciate it.  I will be doing the same 
thing as time permits over the next few days.




[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-10-15 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214976#comment-17214976
 ] 

Olivier Tavard commented on CONNECTORS-1653:


No, it is not relevant, sorry about that. It only needs the solr-solrj*.jar 
mentioned earlier in the file.



[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-10-15 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214684#comment-17214684
 ] 

Karl Wright commented on CONNECTORS-1653:
-

Looked briefly at the code; looked good so far from what I see.

However, one question.  The connector build.xml has this in it:

{code}
+
+
+
+
+  
+
+
+
+  
+
+
{code}

These are the ManifoldCF solr security plugins.  Do they apply here?




[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-10-14 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214029#comment-17214029
 ] 

Karl Wright commented on CONNECTORS-1653:
-

Sadly, little time for anything.  Not sure when the crunch will end either.




[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-10-14 Thread Olivier Tavard (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213828#comment-17213828
 ] 

Olivier Tavard commented on CONNECTORS-1653:


I am not sure whether you have had time to look at my contribution yet, but I 
am available if you need any explanation of the code or the documentation.

