Re: Where and how is ManifoldCF used in production?

2019-03-22 Thread Steph van Schalkwyk
I've been using MCF in PROD for a couple of years now, mostly with Elasticsearch (5.x onwards) and SOLR. From a couple of 100k docs to millions, sourced from JDBC or PDF/HTML etc. Very solid. Make sure you apply the DB tuning parameters. Steph *Steph van Schalkwyk* Principal, Remcam Search Engines +1.314.

[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-12-03 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707411#comment-16707411 ] Steph van Schalkwyk commented on CONNECTORS-1546: - That's in the codebase I sent

[jira] [Comment Edited] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672487#comment-16672487 ] Steph van Schalkwyk edited comment on CONNECTORS-1529 at 11/2/18 1:56 AM

[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672487#comment-16672487 ] Steph van Schalkwyk commented on CONNECTORS-1529: - I added it as an addField

[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672431#comment-16672431 ] Steph van Schalkwyk commented on CONNECTORS-1546: - Removed. > Optim

[jira] [Updated] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1552: Attachment: screenshot-1.png > Apache ManifoldCF Elastic Connec

[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672430#comment-16672430 ] Steph van Schalkwyk commented on CONNECTORS-1552: - !screenshot-1.png! > Apa

[jira] [Issue Comment Deleted] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1552: Comment: was deleted (was: !image-2018-11-01-20-00-35-913.png

[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672428#comment-16672428 ] Steph van Schalkwyk commented on CONNECTORS-1552: - !image-2018-11-01-20-00-35-913

[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672427#comment-16672427 ] Steph van Schalkwyk commented on CONNECTORS-1552: - I have added username
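The comment above is cut off, but Basic Authorisation against Elasticsearch generally means sending an HTTP "Authorization: Basic ..." header built from the username and password (RFC 7617). A minimal Java sketch of building that header value; the class BasicAuthHeader is hypothetical and is not the connector's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not ManifoldCF's actual Elasticsearch connector code.
final class BasicAuthHeader {
    private BasicAuthHeader() {}

    // Returns the value for an HTTP "Authorization" header using the Basic scheme (RFC 7617).
    static String build(String username, String password) {
        if (username == null || username.isEmpty()) {
            throw new IllegalArgumentException("username must not be empty");
        }
        // The user-id may not contain a colon, since ':' separates it from the password.
        if (username.indexOf(':') >= 0) {
            throw new IllegalArgumentException("username must not contain ':'");
        }
        String credentials = username + ":" + (password == null ? "" : password);
        // Encode "username:password" as UTF-8 bytes, then as Base64.
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

For example, BasicAuthHeader.build("user", "pass") yields "Basic dXNlcjpwYXNz", which would be sent as the Authorization header on each request to the cluster. Basic credentials are only Base64-encoded, not encrypted, so they should travel over HTTPS.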

[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-11-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672423#comment-16672423 ] Steph van Schalkwyk commented on CONNECTORS-1529: - I have added the "docum

Re: [jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-10-29 Thread Steph van Schalkwyk
I included all the fixes to ES, and I have to debug before unleashing it on an unsuspecting audience.

Re: [jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-10-29 Thread Steph van Schalkwyk
I'm working on that one as well. I'm in a bit of a fix with a client right now. Will issue patch.

[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-10-16 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651942#comment-16651942 ] Steph van Schalkwyk commented on CONNECTORS-1546: - Hans is correct. I would remove

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-11 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611426#comment-16611426 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Karl, getting this when

[jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-09-11 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611421#comment-16611421 ] Steph van Schalkwyk commented on CONNECTORS-1523: - Olivier, Karl: I'm now getting

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-11 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611234#comment-16611234 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Yes. It was due to the patch

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-10 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609786#comment-16609786 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Fixed above on client side

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-10 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609726#comment-16609726 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Tested and allowing lowercasing
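CONNECTORS-1528 asks for a canonicalization mode that lowercases URLs, so that links differing only in letter case are treated as one document. A minimal Java sketch of the idea; the class and its dedupe helper are hypothetical, and lowercasing the whole URL is only safe for servers that treat paths case-insensitively, since RFC 3986 makes only the scheme and host case-insensitive.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a "lowercase" canonicalization mode; not the Web connector's code.
final class LowercaseCanonicalizer {
    private LowercaseCanonicalizer() {}

    // Lowercases the entire URL with a locale-neutral mapping
    // (avoids locale-specific rules such as the Turkish dotless i).
    static String canonicalize(String url) {
        return url.toLowerCase(Locale.ROOT);
    }

    // Collapses URLs that differ only in case, keeping first-seen order.
    static Set<String> dedupe(List<String> urls) {
        Set<String> seen = new LinkedHashSet<>();
        for (String url : urls) {
            seen.add(canonicalize(url));
        }
        return seen;
    }
}
```

With this mode on, "http://Example.com/Page.aspx" and "http://example.com/page.ASPX" collapse to the single key "http://example.com/page.aspx", so the crawler would fetch and index the page once.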

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-10 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609676#comment-16609676 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Karl, done that. Works

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-10 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609462#comment-16609462 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Will do. Thanks Karl

Re: [jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-10 Thread Steph van Schalkwyk
For some reason the patch only applied partially (git apply --whitespace=warn ../webconnector.patch). Trying again.

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-10 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609372#comment-16609372 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Tested over weekend. "

[jira] [Commented] (CONNECTORS-1530) When using Postgres: 'org.apache.manifoldcf.agents.transformation.restservice.RestExtractor' was found.

2018-09-07 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607362#comment-16607362 ] Steph van Schalkwyk commented on CONNECTORS-1530: - Ah. Thanks. I was beginning

[jira] [Created] (CONNECTORS-1530) When using Postgres: 'org.apache.manifoldcf.agents.transformation.restservice.RestExtractor' was found.

2018-09-07 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1530: --- Summary: When using Postgres: 'org.apache.manifoldcf.agents.transformation.restservice.RestExtractor' was found. Key: CONNECTORS-1530 URL: https

[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606600#comment-16606600 ] Steph van Schalkwyk commented on CONNECTORS-1529: - I agree. I can do that in the Web

[jira] [Updated] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1529: Attachment: elasticsearch.patch > Add "url" output

[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606241#comment-16606241 ] Steph van Schalkwyk commented on CONNECTORS-1529: - Patch attached. > Add "

[jira] [Updated] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1529: Summary: Add "url" output element to ES Output Connector

[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605930#comment-16605930 ] Steph van Schalkwyk commented on CONNECTORS-1529: - Will submit patch today

[jira] [Updated] (CONNECTORS-1529) Add "url" output element to ES Output Connector

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1529: Attachment: (was: image-2018-09-06-10-27-57-879.png) > Add "

[jira] [Created] (CONNECTORS-1529) Add "url" output element to ES Output Connector

2018-09-06 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1529: --- Summary: Add "url" output element to ES Output Connector Key: CONNECTORS-1529 URL: https://issues.apache.org/jira/browse/CONNECTORS-1529

[jira] [Comment Edited] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605897#comment-16605897 ] Steph van Schalkwyk edited comment on CONNECTORS-1523 at 9/6/18 3:09 PM

[jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605897#comment-16605897 ] Steph van Schalkwyk commented on CONNECTORS-1523: - Hi Olivier, I managed to get

[jira] [Commented] (CONNECTORS-1527) After a week of running, MCF UI reverts to file index listing instead of UI display

2018-09-06 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605881#comment-16605881 ] Steph van Schalkwyk commented on CONNECTORS-1527: - Looking into it. Will report

Re: [jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-05 Thread Steph van Schalkwyk
Thanks Karl!

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-05 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604616#comment-16604616 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Removed from (my) ElasticSearch

[jira] [Commented] (CONNECTORS-1528) Web connector needs canonicalization mode that lowercases URLs

2018-09-05 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604594#comment-16604594 ] Steph van Schalkwyk commented on CONNECTORS-1528: - Karl, for the time being I've

[jira] [Created] (CONNECTORS-1527) After a week of running, MCF UI reverts to file index listing instead of UI display

2018-09-04 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1527: --- Summary: After a week of running, MCF UI reverts to file index listing instead of UI display Key: CONNECTORS-1527 URL: https://issues.apache.org/jira/browse

[jira] [Commented] (CONNECTORS-104) Make it easier to limit a web crawl to a single site

2018-08-23 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590882#comment-16590882 ] Steph van Schalkwyk commented on CONNECTORS-104: I'm running into a seeding issue

[jira] [Resolved] (CONNECTORS-1525) ElasticSearch Connector date issue

2018-08-20 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk resolved CONNECTORS-1525. - Resolution: Fixed Fix Version/s: ManifoldCF 2.10 Add multi

[jira] [Created] (CONNECTORS-1525) ElasticSearch Connector date issue

2018-08-15 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1525: --- Summary: ElasticSearch Connector date issue Key: CONNECTORS-1525 URL: https://issues.apache.org/jira/browse/CONNECTORS-1525 Project: ManifoldCF

[jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-08-10 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575837#comment-16575837 ] Steph van Schalkwyk commented on CONNECTORS-1523: - Olivier, thank you

[jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-08-09 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575726#comment-16575726 ] Steph van Schalkwyk commented on CONNECTORS-1523: - Hi Karl, just cloned and rebuilt

Re: [jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-08-09 Thread Steph van Schalkwyk
Hi Karl, I just cloned and built. Works now. Last build was Aug 8. Thanks for the help! Steph

[jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-08-09 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575639#comment-16575639 ] Steph van Schalkwyk commented on CONNECTORS-1523: - Hi Karl, I built this from trunk

Re: [jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-08-09 Thread Steph van Schalkwyk
I'm adding head here for the time being: if (includeFilters.isEmpty()) { includeFilters.add(HtmlExtractorConfig.WHITELIST_DEFAULT); }
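The workaround above falls back to a default include filter when none is configured, which avoids the "No englobing tag specified" error. A self-contained Java sketch of that fallback; the real value of HtmlExtractorConfig.WHITELIST_DEFAULT is not shown in the message, so the "body" constant below is a hypothetical placeholder, and the class name is likewise made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fallback; not the HTML Extractor connector's actual code.
final class IncludeFilterDefaults {
    private IncludeFilterDefaults() {}

    // Placeholder standing in for HtmlExtractorConfig.WHITELIST_DEFAULT (real value not shown).
    static final String WHITELIST_DEFAULT = "body";

    // Returns a copy of the configured filters, adding the default if the list is null or empty.
    static List<String> effectiveIncludeFilters(List<String> configured) {
        List<String> includeFilters = new ArrayList<>();
        if (configured != null) {
            includeFilters.addAll(configured);
        }
        if (includeFilters.isEmpty()) {
            includeFilters.add(WHITELIST_DEFAULT);
        }
        return includeFilters;
    }
}
```

Copying into a fresh list keeps the caller's configuration untouched, so the default only applies to the current extraction pass and never leaks back into the saved job settings.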

[jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-08-09 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575466#comment-16575466 ] Steph van Schalkwyk commented on CONNECTORS-1523: - Olivier I need the as well

[jira] [Commented] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-08-09 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575089#comment-16575089 ] Steph van Schalkwyk commented on CONNECTORS-1523: - Thank you Olivier. Do you know

[jira] [Created] (CONNECTORS-1523) HTML Extractor transformation connector - "No englobing tag specified"

2018-08-09 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1523: --- Summary: HTML Extractor transformation connector - "No englobing tag specified" Key: CONNECTORS-1523 URL: https://issues.apache.org/jira/browse/

[jira] [Created] (CONNECTORS-1522) Add SSL trust certificates list to ElasticSearch output connector

2018-08-08 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1522: --- Summary: Add SSL trust certificates list to ElasticSearch output connector Key: CONNECTORS-1522 URL: https://issues.apache.org/jira/browse/CONNECTORS-1522

[jira] [Commented] (CONNECTORS-1519) CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y

2018-07-27 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560435#comment-16560435 ] Steph van Schalkwyk commented on CONNECTORS-1519: - Will do, Karl. Now that I'm back

[jira] [Updated] (CONNECTORS-1519) CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y

2018-07-27 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1519: Description: Investigating CLIENTPROTOCOLEXCEPTION when using 2.10

[jira] [Created] (CONNECTORS-1519) CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y

2018-07-27 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1519: --- Summary: CLIENTPROTOCOLEXCEPTION is thrown with 2.10 -> ES 6.x.y Key: CONNECTORS-1519 URL: https://issues.apache.org/jira/browse/CONNECTORS-1

[jira] [Commented] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559137#comment-16559137 ] Steph van Schalkwyk commented on CONNECTORS-1518: - Hi Karl, thank you

Re: [jira] [Resolved] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Steph van Schalkwyk
type not html
WARN 2018-07-26T20:38:51,832 (Worker thread '0') - no processing, mime type not html
WARN 2018-07-26T20:41:00,635 (Worker thread '1') - no processing, mime type not html
WARN 2018-07-26T20:41:00,646 (Worker thread '0') - no processing, mime type not html

[jira] [Comment Edited] (CONNECTORS-1191) ManifoldCFException: Unexpected job status encountered

2018-07-26 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558819#comment-16558819 ] Steph van Schalkwyk edited comment on CONNECTORS-1191 at 7/26/18 7:52 PM

[jira] [Commented] (CONNECTORS-1191) ManifoldCFException: Unexpected job status encountered

2018-07-26 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558819#comment-16558819 ] Steph van Schalkwyk commented on CONNECTORS-1191: - I'm seeing this with 2.10

[jira] [Updated] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1518: Description: ```Jul 26, 2018 1:21:51 PM

[jira] [Updated] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1518: Description: ```Jul 26, 2018 1:21:51 PM

[jira] [Created] (CONNECTORS-1518) MCF shutting down when Tika is used

2018-07-26 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1518: --- Summary: MCF shutting down when Tika is used Key: CONNECTORS-1518 URL: https://issues.apache.org/jira/browse/CONNECTORS-1518 Project: ManifoldCF

Re: [ANNOUNCE] Please welcome Steph van Schalkwyk as a ManifoldCF committer

2017-10-03 Thread Steph van Schalkwyk
Much appreciated everyone! Special thanks to Karl.

[jira] [Commented] (CONNECTORS-1461) Scheduled for Processing shows "01-01-1970 00:00:00.000"

2017-09-27 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183524#comment-16183524 ] Steph van Schalkwyk commented on CONNECTORS-1461: - Thanks Karl. Will close

[jira] [Closed] (CONNECTORS-1461) Scheduled for Processing shows "01-01-1970 00:00:00.000"

2017-09-27 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk closed CONNECTORS-1461. --- User-friendly aspect to be added later... As designed. 0 epoch means "
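The "01-01-1970 00:00:00.000" value is simply epoch time 0 rendered as a date. A minimal Java sketch showing why, and one way a UI could substitute a label instead; the class name, the UTC zone, and the label are assumptions (the snippet is truncated before it says what a 0 value means).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical display helper; not ManifoldCF's UI code.
final class ScheduleTimeDisplay {
    private ScheduleTimeDisplay() {}

    // Month-day-year pattern matching the value in the issue title, rendered in UTC (an assumption).
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM-dd-yyyy HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    // Formats epoch milliseconds as-is; 0 becomes 01-01-1970 00:00:00.000.
    static String raw(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Shows a caller-supplied label instead of the 1970 date when the value is 0.
    static String friendly(long epochMillis, String zeroLabel) {
        return epochMillis == 0L ? zeroLabel : raw(epochMillis);
    }
}
```

Keeping raw() separate means the stored value stays 0 (as designed) and only the presentation changes.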

[jira] [Created] (CONNECTORS-1461) Scheduled for Processing shows "01-01-1970 00:00:00.000"

2017-09-27 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1461: --- Summary: Scheduled for Processing shows "01-01-1970 00:00:00.000" Key: CONNECTORS-1461 URL: https://issues.apache.org/jira/browse/CONNE

[jira] [Commented] (CONNECTORS-1440) "Created date field name" is not honored for pdf filesystem to ElasticSearch

2017-07-05 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075004#comment-16075004 ] Steph van Schalkwyk commented on CONNECTORS-1440: - Hero! > "Created date fi

[jira] [Commented] (CONNECTORS-1440) "Created date field name" is not honored for pdf filesystem to ElasticSearch

2017-07-05 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074961#comment-16074961 ] Steph van Schalkwyk commented on CONNECTORS-1440: - Hi Karl, been playing with NIO

[jira] [Commented] (CONNECTORS-1440) "Created date field name" is not honored for pdf filesystem to ElasticSearch

2017-07-03 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072635#comment-16072635 ] Steph van Schalkwyk commented on CONNECTORS-1440: - File System Connector. > "

[jira] [Commented] (CONNECTORS-1439) No CreatedOn, LastModified etc. datetimes on ElasticSearch output with File System input.

2017-07-01 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071374#comment-16071374 ] Steph van Schalkwyk commented on CONNECTORS-1439: - Thanks. POSIX FS does not have
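On the creation-date question: Java NIO exposes both timestamps through BasicFileAttributes, but on file systems that do not record a creation time, creationTime() returns an implementation-specific default (typically the last-modified time or the epoch). A sketch that maps the two timestamps onto configurable output field names, in line with the CONNECTORS-1439 resolution that the field names have to be filled in; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the File System connector's code.
final class FileDateFields {
    private FileDateFields() {}

    // Reads created/modified timestamps and returns them (ISO 8601) under the configured
    // field names. A null or empty field name means that value is not emitted.
    static Map<String, String> read(Path path, String createdField, String modifiedField)
            throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        Map<String, String> fields = new LinkedHashMap<>();
        if (createdField != null && !createdField.isEmpty()) {
            // May be the last-modified time or the epoch where creation time is not tracked.
            fields.put(createdField, attrs.creationTime().toString());
        }
        if (modifiedField != null && !modifiedField.isEmpty()) {
            fields.put(modifiedField, attrs.lastModifiedTime().toString());
        }
        return fields;
    }
}
```

Leaving a field name blank yields no value at all, which matches the symptom in the issue: no dates appear in Elasticsearch until the names are configured.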

[jira] [Created] (CONNECTORS-1440) "Created date field name" is not honored for pdf filesystem to ElasticSearch

2017-07-01 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1440: --- Summary: "Created date field name" is not honored for pdf filesystem to ElasticSearch Key: CONNECTORS-1440 URL: https://issues.apache.org/j

[jira] [Resolved] (CONNECTORS-1439) No CreatedOn, LastModified etc. datetimes on ElasticSearch output with File System input.

2017-06-30 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk resolved CONNECTORS-1439. - Resolution: Not A Problem One has to fill in the field names

[jira] [Created] (CONNECTORS-1439) No CreatedOn, LastModified etc. datetimes on ElasticSearch output with File System input.

2017-06-30 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1439: --- Summary: No CreatedOn, LastModified etc. datetimes on ElasticSearch output with File System input. Key: CONNECTORS-1439 URL: https://issues.apache.org/jira

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060124#comment-16060124 ] Steph van Schalkwyk commented on CONNECTORS-1433: - NOT using attachment mapper
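The distinction CONNECTORS-1433 is after: the Elasticsearch mapper attachments plugin expects the document body Base64-encoded, whereas without that plugin the extracted text can be sent as plain text. A small Java sketch of the two output modes; the enum and method names are hypothetical and do not correspond to an existing pipeline option.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of a per-pipeline output option; not ManifoldCF's code.
final class ContentEncoding {
    private ContentEncoding() {}

    enum Mode { TEXT, BASE64 }

    // TEXT passes extracted text through unchanged; BASE64 encodes its UTF-8 bytes,
    // which is the form attachment-style Elasticsearch plugins expect.
    static String encode(String extractedText, Mode mode) {
        if (mode == Mode.TEXT) {
            return extractedText;
        }
        return Base64.getEncoder()
                .encodeToString(extractedText.getBytes(StandardCharsets.UTF_8));
    }
}
```

Besides being directly searchable, the TEXT form is smaller: Base64 emits 4 characters for every 3 input bytes, so an encoded body is roughly a third larger.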

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060059#comment-16060059 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Also works for pdf

[jira] [Updated] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1433: Attachment: image.png OK, just started to work. I should have looked

[jira] [Updated] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steph van Schalkwyk updated CONNECTORS-1433: Attachment: image.png This part of the connector code doesn't seem

[jira] [Commented] (CONNECTORS-1435) Changing ElasticSearch connector parameters results in changes not being propagated to MCF

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059988#comment-16059988 ] Steph van Schalkwyk commented on CONNECTORS-1435: - I think you may be right: I had

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059986#comment-16059986 ] Steph van Schalkwyk commented on CONNECTORS-1433: - The mapper attachments plugin has

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059970#comment-16059970 ] Steph van Schalkwyk commented on CONNECTORS-1433: - In this scenario below, what would

[jira] [Commented] (CONNECTORS-1435) Changing ElasticSearch connector parameters results in changes not being propagated to MCF

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059942#comment-16059942 ] Steph van Schalkwyk commented on CONNECTORS-1435: - I dropped the HSQLDB database

[jira] [Created] (CONNECTORS-1435) Changing ElasticSearch connector parameters results in changes not being propagated to MCF

2017-06-22 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1435: --- Summary: Changing ElasticSearch connector parameters results in changes not being propagated to MCF Key: CONNECTORS-1435 URL: https://issues.apache.org/jira

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059799#comment-16059799 ] Steph van Schalkwyk commented on CONNECTORS-1433: - The ES part works in my mock-up

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059715#comment-16059715 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Thanks Karl. I realize

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059630#comment-16059630 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Are you specifying a field name

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-22 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059588#comment-16059588 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Testing it now. Having some issues

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-21 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058555#comment-16058555 ] Steph van Schalkwyk commented on CONNECTORS-1433: - This is on Github? (apache

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-21 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058550#comment-16058550 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Much appreciated Karl! I can't

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-21 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058149#comment-16058149 ] Steph van Schalkwyk commented on CONNECTORS-1433: - I think just a blank field where

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-21 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057866#comment-16057866 ] Steph van Schalkwyk commented on CONNECTORS-1433: - To clarify, Cassandra can decode

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-21 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057863#comment-16057863 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Hi Karl Thanks for the information

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-21 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057709#comment-16057709 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Edit: It is ElasticSearch doing

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-21 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057707#comment-16057707 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Karl I see it with FileSystem

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-20 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056844#comment-16056844 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Hello Karl First, thank you

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-20 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056393#comment-16056393 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Karl I can't seem to find a way

[jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-17 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053060#comment-16053060 ] Steph van Schalkwyk commented on CONNECTORS-1433: - Hi Karl That sounds great. Let me

[jira] [Created] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64

2017-06-15 Thread Steph van Schalkwyk (JIRA)
Steph van Schalkwyk created CONNECTORS-1433: --- Summary: Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64 Key: CONNECTORS-1433 URL: https://issues.apache.org/jira/browse

[jira] [Commented] (CONNECTORS-1432) Job with Tika will not save - no error popup

2017-06-15 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051019#comment-16051019 ] Steph van Schalkwyk commented on CONNECTORS-1432: - Hi Karl Thanks! I'm using Opera

[jira] [Commented] (CONNECTORS-1432) Job with Tika will not save - no error popup

2017-06-15 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050988#comment-16050988 ] Steph van Schalkwyk commented on CONNECTORS-1432: - Using the simple single example

[jira] [Commented] (CONNECTORS-1432) Job with Tika will not save - no error popup

2017-06-15 Thread Steph van Schalkwyk (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050987#comment-16050987 ] Steph van Schalkwyk commented on CONNECTORS-1432: - It seems to stop saving if I
