[jira] [Work stopped] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-27 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3015 stopped by Lewis John McGibbney. --- > Add more CI steps to GitHub master-build.

[jira] [Closed] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-27 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3015. --- > Add more CI steps to GitHub master-build.

[jira] [Resolved] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-27 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-3015. - Resolution: Fixed > Add more CI steps to GitHub master-build.

[jira] [Work started] (NUTCH-2887) Migrate to JUnit 5 Jupiter

2023-10-24 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2887 started by Lewis John McGibbney. --- > Migrate to JUnit 5 Jupi

[jira] [Created] (NUTCH-3016) Upgrade Apache Ivy to 2.5.2

2023-10-24 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3016: --- Summary: Upgrade Apache Ivy to 2.5.2 Key: NUTCH-3016 URL: https://issues.apache.org/jira/browse/NUTCH-3016 Project: Nutch Issue Type: Task

[jira] [Assigned] (NUTCH-2887) Migrate to JUnit 5 Jupiter

2023-10-23 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2887: --- Assignee: Lewis John McGibbney > Migrate to JUnit 5 Jupi

[jira] [Work started] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-23 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3015 started by Lewis John McGibbney. --- > Add more CI steps to GitHub master-build.

[jira] [Work started] (NUTCH-3014) Standardize Job names

2023-10-23 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3014 started by Lewis John McGibbney. --- > Standardize Job na

Nutch codebase formatting

2023-10-23 Thread lewis john mcgibbney
Hi dev@, For the longest time the Nutch codebase has shipped with an eclipse-codeformat.xml [0] file. Whilst this has been largely successful in keeping the codebase uniform, it cannot/has not been integrated into continuous integration (CI) and consequently has not really been enforced! Whilst I’m a big

[jira] [Created] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-22 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3015: --- Summary: Add more CI steps to GitHub master-build.yml Key: NUTCH-3015 URL: https://issues.apache.org/jira/browse/NUTCH-3015 Project: Nutch

[jira] [Updated] (NUTCH-3014) Standardize Job names

2023-10-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3014: Description: There is a large degree of variability when we set the job name

[jira] [Updated] (NUTCH-3014) Standardize Job names

2023-10-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3014: Description: There is a large degree of variability when we set the job name

[jira] [Updated] (NUTCH-3014) Standardize Job names

2023-10-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-3014: Summary: Standardize Job names (was: Standardize NutchJob job names

[jira] [Resolved] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic

2023-10-21 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-3013. - Resolution: Fixed Thanks for the review [~snagel]  > Employ commons-lang

[jira] [Closed] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic

2023-10-21 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3013. --- > Employ commons-lang3's StopWatch to simplify timing lo

[jira] [Created] (NUTCH-3014) Standardize NutchJob job names

2023-10-21 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3014: --- Summary: Standardize NutchJob job names Key: NUTCH-3014 URL: https://issues.apache.org/jira/browse/NUTCH-3014 Project: Nutch Issue Type

[jira] [Work started] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic

2023-10-20 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3013 started by Lewis John McGibbney. --- > Employ commons-lang3's StopWatch to simplify timing lo

[jira] [Created] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic

2023-10-20 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3013: --- Summary: Employ commons-lang3's StopWatch to simplify timing logic Key: NUTCH-3013 URL: https://issues.apache.org/jira/browse/NUTCH-3013 Project: Nutch

Establishing a Nutch development roadmap

2023-09-26 Thread lewis john mcgibbney
Hi dev@, I've been at arm's length for a while as $dayjob changed and then changed again over the last number of years. With that being said, I wanted to start a thread on $title with the goal of establishing some "big items" we could put on the roadmap and maybe even publish... Here are some of

[jira] [Assigned] (NUTCH-2856) Implement a protocol-smb plugin based on hierynomus/smbj

2023-02-28 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2856: --- Assignee: (was: Lewis John McGibbney) > Implement a protocol-smb plu

[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694741#comment-17694741 ] Lewis John McGibbney commented on NUTCH-2988: - Actually, digging deeper it looks like

[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694736#comment-17694736 ] Lewis John McGibbney commented on NUTCH-2988: - It looks like the [elasticsearch-java client

[jira] [Commented] (NUTCH-2940) Develop Gradle Core Build for Apache Nutch

2022-06-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554866#comment-17554866 ] Lewis John McGibbney commented on NUTCH-2940: - WIP PR available at https://github.com/apache

[jira] [Assigned] (NUTCH-2940) Develop Gradle Core Build for Apache Nutch

2022-06-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2940: --- Assignee: Lewis John McGibbney > Develop Gradle Core Build for Apache Nu

[jira] [Created] (NUTCH-2944) Create Gradle Javadoc task

2022-04-22 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2944: --- Summary: Create Gradle Javadoc task Key: NUTCH-2944 URL: https://issues.apache.org/jira/browse/NUTCH-2944 Project: Nutch Issue Type: Sub-task

[jira] [Work started] (NUTCH-2944) Create Gradle Javadoc task

2022-04-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2944 started by Lewis John McGibbney. --- > Create Gradle Javadoc t

[jira] [Resolved] (NUTCH-2943) Implement core dependencies in build.gradle.kts

2022-04-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2943. - Resolution: Fixed > Implement core dependencies in build.gradle.

[jira] [Commented] (NUTCH-2943) Implement core dependencies in build.gradle.kts

2022-04-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526695#comment-17526695 ] Lewis John McGibbney commented on NUTCH-2943: - Implemented in https://github.com/csci401

[jira] [Updated] (NUTCH-2943) Implement core dependencies in build.gradle.kts

2022-04-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2943: Component/s: build > Implement core dependencies in build.gradle.

[jira] [Updated] (NUTCH-2943) Implement core dependencies in build.gradle.kts

2022-04-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2943: Summary: Implement core dependencies in build.gradle.kts (was: Management

[jira] [Assigned] (NUTCH-2943) Implement core dependencies in build.gradle.kts

2022-04-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2943: --- Assignee: Lewis John McGibbney > Implement core dependenc

[jira] [Work started] (NUTCH-2943) Implement core dependencies in build.gradle.kts

2022-04-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2943 started by Lewis John McGibbney. --- > Implement core dependencies in build.gradle.

[jira] [Commented] (NUTCH-2939) Create Initial Jenkinsfile for Nutch Gradle Build

2022-04-22 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526527#comment-17526527 ] Lewis John McGibbney commented on NUTCH-2939: - Hi [~Lirongxuan01] did you ever complete

CVE-2022-25312: An XML external entity (XXE) injection vulnerability exists in the Apache Any23 RDFa XSLTStylesheet extractor

2022-03-04 Thread lewis john mcgibbney
Description: An XML external entity (XXE) injection vulnerability was discovered in the Any23 RDFa XSLTStylesheet extractor and is known to affect Any23 versions < 2.7. XML external entity injection (also known as XXE) is a web security vulnerability that allows an attacker to interfere with an

[ANNOUNCE] Apache Any23 2.7

2022-03-04 Thread lewis john mcgibbney
The Apache Any23 Project Management Committee is pleased to announce the release of Apache Any23 2.7. Apache Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents. Any23 2.7 requires JDK11 to

[jira] [Assigned] (NUTCH-2939) Create Jenkinsfile for Nutch Gradle Build

2022-02-17 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2939: --- Assignee: Ryan Li > Create Jenkinsfile for Nutch Gradle Bu

[jira] [Created] (NUTCH-2939) Create Jenkinsfile for Nutch Gradle Build

2022-02-17 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2939: --- Summary: Create Jenkinsfile for Nutch Gradle Build Key: NUTCH-2939 URL: https://issues.apache.org/jira/browse/NUTCH-2939 Project: Nutch Issue

[jira] [Updated] (NUTCH-2934) Replace Apache Ant build system with Gradle

2022-02-17 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2934: Issue Type: Task (was: Improvement) > Replace Apache Ant build system with Gra

[jira] [Commented] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro

2022-01-24 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481528#comment-17481528 ] Lewis John McGibbney commented on NUTCH-2925: - Non-functioning branch available at https

[jira] [Work started] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro

2022-01-19 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2925 started by Lewis John McGibbney. --- > Secure the Nutch REST API using Apache Sh

[jira] [Work started] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode

2022-01-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2936 started by Lewis John McGibbney. --- > Early registration of URL stream handlers provided by plug

[jira] [Assigned] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode

2022-01-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2936: --- Assignee: Lewis John McGibbney > Early registration of URL stream handl

[jira] [Commented] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode

2022-01-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476730#comment-17476730 ] Lewis John McGibbney commented on NUTCH-2936: - I can reproduce this. Although I was planning

[jira] [Commented] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode

2022-01-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476728#comment-17476728 ] Lewis John McGibbney commented on NUTCH-2936: - [~snagel] which JDK are you using? > Ea

[jira] [Created] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository

2022-01-15 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2938: --- Summary: Use Any23's RepositoryWriter to write structured data to Rdf4j repository Key: NUTCH-2938 URL: https://issues.apache.org/jira/browse/NUTCH-2938

[jira] [Commented] (NUTCH-2936) Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode

2022-01-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476719#comment-17476719 ] Lewis John McGibbney commented on NUTCH-2936: - I'll try to reproduce. Thanks > Ea

[jira] [Updated] (NUTCH-2919) NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6

2022-01-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2919: Summary: NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 (was: NUTCH-2919 Upgrade

[jira] [Resolved] (NUTCH-2919) NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6

2022-01-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2919. - Resolution: Fixed > NUTCH-2919 Upgrade to Tika 2.2.1 and Any23

NUTCH-2934 Replace Apache Ant build system with Gradle

2022-01-13 Thread lewis john mcgibbney
Hi dev@, I'm about to start a new project with USC's Senior Capstone program, which will replace our legacy Ant build with Gradle. I opened https://issues.apache.org/jira/browse/NUTCH-2934 to track the work. I wasn't very sure about how well Fireant would serve us moving forward so although it was

[jira] [Commented] (NUTCH-2934) Replace Apache Ant build system with Gradle

2022-01-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475625#comment-17475625 ] Lewis John McGibbney commented on NUTCH-2934: - I did some house cleaning by closing off all

[jira] [Resolved] (NUTCH-2293) Make the unit tests which requires "plugin.folders" as integration tests

2022-01-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2293. - Fix Version/s: 1.19 Resolution: Abandoned > Make the unit tests wh

[jira] [Resolved] (NUTCH-2901) migrate to maven or gradle

2022-01-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2901. - Resolution: Abandoned > migrate to maven or gra

[jira] [Resolved] (NUTCH-2244) Publish Protocol-Interactiveselenium to central maven repo

2022-01-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2244. - Fix Version/s: 1.19 Resolution: Abandoned > Publish Proto

[jira] [Resolved] (NUTCH-2638) Publish plugins in Maven

2022-01-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2638. - Fix Version/s: 1.19 Resolution: Abandoned > Publish plugins in Ma

[jira] [Resolved] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2022-01-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2292. - Resolution: Abandoned > Mavenize the build for nutch-core and nutch-plug

[jira] [Created] (NUTCH-2934) Replace Apache Ant build system with Gradle

2022-01-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2934: --- Summary: Replace Apache Ant build system with Gradle Key: NUTCH-2934 URL: https://issues.apache.org/jira/browse/NUTCH-2934 Project: Nutch

[jira] [Updated] (NUTCH-2926) Implement persistent storage for Nutch Webserver resources

2022-01-11 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2926: Parent: NUTCH-2931 Issue Type: Sub-task (was: Improvement) > Implem

[jira] [Created] (NUTCH-2933) GET /seed doesn't return previously generated seed lists

2022-01-11 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2933: --- Summary: GET /seed doesn't return previously generated seed lists Key: NUTCH-2933 URL: https://issues.apache.org/jira/browse/NUTCH-2933 Project: Nutch

[jira] [Created] (NUTCH-2932) Create OpenAPI specification for Nutch 1.x REST API

2022-01-11 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2932: --- Summary: Create OpenAPI specification for Nutch 1.x REST API Key: NUTCH-2932 URL: https://issues.apache.org/jira/browse/NUTCH-2932 Project: Nutch

[jira] [Updated] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro

2022-01-11 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2925: Parent: NUTCH-2931 Issue Type: Sub-task (was: Improvement) > Sec

[jira] [Created] (NUTCH-2931) Improvements to 1.x REST API

2022-01-11 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2931: --- Summary: Improvements to 1.x REST API Key: NUTCH-2931 URL: https://issues.apache.org/jira/browse/NUTCH-2931 Project: Nutch Issue Type

[jira] [Updated] (NUTCH-2919) NUTCH-2919 Upgrade to Tika 2.2.0 and Any23 2.6

2022-01-10 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2919: Summary: NUTCH-2919 Upgrade to Tika 2.2.0 and Any23 2.6 (was: Upgrade to Tika

[ANNOUNCE] Apache Any23 2.6 Release

2022-01-08 Thread lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23 2.6. Apache Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents. Any23 2.6 requires JDK11 to build and run. Release

[jira] [Work stopped] (NUTCH-2839) Implement Tez counters in Injector job

2022-01-08 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2839 stopped by Lewis John McGibbney. --- > Implement Tez counters in Injector

[jira] [Commented] (NUTCH-2839) Implement Tez counters in Injector job

2022-01-08 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471239#comment-17471239 ] Lewis John McGibbney commented on NUTCH-2839: - Really interesting [~abstractdog]. Your short

[jira] [Commented] (NUTCH-2839) Implement Tez counters in Injector job

2022-01-07 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471011#comment-17471011 ] Lewis John McGibbney commented on NUTCH-2839: - Hi [~abstractdog] I documented everything I

[jira] [Commented] (NUTCH-2856) Implement a protocol-smb plugin based on hierynomus/smbj

2022-01-07 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470992#comment-17470992 ] Lewis John McGibbney commented on NUTCH-2856: - I'm focusing on this now. > Implem

[jira] [Assigned] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2022-01-07 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2429: --- Assignee: Lewis John McGibbney > Fix Plugin System to allow proto

[jira] [Resolved] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2022-01-07 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2429. - Resolution: Fixed Finally merged into master branch [~hiranchaudhuri] thank you

[jira] [Updated] (NUTCH-2926) Implement persistent storage for Nutch Webserver resources

2022-01-04 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2926: Description: The Nutch webserver caches resources (seed lists, configuration, jobs

[jira] [Created] (NUTCH-2926) Implement persistent storage for Nutch Webserver resources

2022-01-04 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2926: --- Summary: Implement persistent storage for Nutch Webserver resources Key: NUTCH-2926 URL: https://issues.apache.org/jira/browse/NUTCH-2926 Project: Nutch

[jira] [Commented] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro

2022-01-04 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468743#comment-17468743 ] Lewis John McGibbney commented on NUTCH-2925: - [~markus17] didn't really like the idea

[jira] [Created] (NUTCH-2925) Secure the Nutch REST API using Apache Shiro

2022-01-04 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2925: --- Summary: Secure the Nutch REST API using Apache Shiro Key: NUTCH-2925 URL: https://issues.apache.org/jira/browse/NUTCH-2925 Project: Nutch

[jira] [Commented] (NUTCH-2923) Add Job Id in Job Failure messages

2022-01-03 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468361#comment-17468361 ] Lewis John McGibbney commented on NUTCH-2923: - Yes it absolutely would. I didn't see

[jira] [Comment Edited] (NUTCH-2923) Add Job Id in Job Failure messages

2022-01-03 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468361#comment-17468361 ] Lewis John McGibbney edited comment on NUTCH-2923 at 1/4/22, 5:11 AM

[jira] [Commented] (NUTCH-2923) Add Job Id in Job Failure messages

2022-01-02 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467745#comment-17467745 ] Lewis John McGibbney commented on NUTCH-2923: - We can easily obtain it via {{job.getStatus

Addressing Nutch use of CMS WAS: [IMPORTANT] - ci.apache.org and CMS Shutdown end of January 2022

2022-01-02 Thread lewis john mcgibbney
Hi Gavin, Thanks for the email below. It was my understanding that the Nutch project no longer relied on the legacy CMS framework. I wrote a new website and published it at https://github.com/apache/nutch-site with the static content being served on the asf-site branch. The old CMS website

[jira] [Commented] (NUTCH-2278) Handle alpha-2 language codes consistently

2022-01-02 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467688#comment-17467688 ] Lewis John McGibbney commented on NUTCH-2278: - No problems Fengtan… a test case would

[jira] [Commented] (NUTCH-2923) Add Job Id in Job Failure messages

2022-01-02 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467687#comment-17467687 ] Lewis John McGibbney commented on NUTCH-2923: - Hi Prakhar, I agree with you. Are you able

[jira] [Updated] (NUTCH-2856) Implement a protocol-smb plugin based on hierynomus/smbj

2021-12-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2856: Summary: Implement a protocol-smb plugin based on hierynomus/smbj (was: Implement

[jira] [Commented] (NUTCH-2856) Implement an appropriately licensed protocol-smb plugin

2021-12-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467111#comment-17467111 ] Lewis John McGibbney commented on NUTCH-2856: - Adding some notes from my research

[jira] [Updated] (NUTCH-2856) Implement a protocol-smb plugin based on

2021-12-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2856: Summary: Implement a protocol-smb plugin based on (was: Implement

[jira] [Work started] (NUTCH-2856) Implement an appropriately licensed protocol-smb plugin

2021-12-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2856 started by Lewis John McGibbney. --- > Implement an appropriately licensed protocol-smb plu

[jira] [Updated] (NUTCH-2856) Implement an appropriately licensed protocol-smb plugin

2021-12-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2856: Issue Type: New Feature (was: Bug) > Implement an appropriately licensed proto

[jira] [Updated] (NUTCH-2856) Implement an appropriately licensed protocol-smb plugin

2021-12-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2856: Summary: Implement an appropriately licensed protocol-smb plugin (was: protocol

[jira] [Commented] (NUTCH-2856) protocol-smb plugin is outdated

2021-12-29 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466704#comment-17466704 ] Lewis John McGibbney commented on NUTCH-2856: - I'll take this one on. I intend to use https

[jira] [Assigned] (NUTCH-2856) protocol-smb plugin is outdated

2021-12-29 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2856: --- Assignee: Lewis John McGibbney > protocol-smb plugin is outda

[jira] [Commented] (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implm

2021-12-29 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466703#comment-17466703 ] Lewis John McGibbney commented on NUTCH-427: An old thread but I found an alternative SMB

Nutch metrics documentation request for review/feedback

2021-12-29 Thread lewis john mcgibbney
Hi dev@, *What?* I've been chipping away at some documentation which would provide a one-stop-shop for understanding Nutch metrics. My first pass is available at https://cwiki.apache.org/confluence/display/NUTCH/Metrics This relates to the recent JIRA issue I filed about establishing a Nutch

!! Join the #nutch Slack channel !!

2021-12-29 Thread lewis john mcgibbney
Hi user@, dev@, I took the liberty of setting up a #nutch channel for our community to communicate in a lower latency manner. First join the-asf.slack.com Slack workspace https://infra.apache.org/slack.html Then simply join the #nutch channel. See you there :) Thanks lewismc --

Re: Break out individual functions from IndexerJob -deleteGone flag?

2021-12-29 Thread Lewis John McGibbney
I should also note that the -deleteGone setting cannot be overridden via nutch-site.xml, whereas similar settings do have equivalent configuration properties in nutch-default.xml https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1361-L1373 On 2021/12/29 17:08:20 lewis john

Break out individual functions from IndexerJob -deleteGone flag?

2021-12-29 Thread lewis john mcgibbney
Hi dev@, Reading the code for the IndexerJob -deleteGone flag [0] you can clearly see that we bundle deletion requests for 404s, redirects and duplicates into one option. This of course has pros and cons. Does anyone wish to share their opinion on how this is implemented? My opinion is that 1. The

[jira] [Created] (NUTCH-2920) Implement a indexer-opensearch plugin

2021-12-17 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2920: --- Summary: Implement a indexer-opensearch plugin Key: NUTCH-2920 URL: https://issues.apache.org/jira/browse/NUTCH-2920 Project: Nutch Issue Type

[jira] [Resolved] (NUTCH-2449) Usage of Tika LanguageIdentifier in language-identifier plugin

2021-12-17 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2449. - Resolution: Fixed > Usage of Tika LanguageIdentifier in language-identif

[jira] [Commented] (NUTCH-2278) Handle alpha-2 language codes consistently

2021-12-17 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461637#comment-17461637 ] Lewis John McGibbney commented on NUTCH-2278: - Out of curiosity [~Fengtan] are you still

[jira] [Comment Edited] (NUTCH-2278) Handle alpha-2 language codes consistently

2021-12-17 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461635#comment-17461635 ] Lewis John McGibbney edited comment on NUTCH-2278 at 12/17/21, 7:48 PM

[jira] [Commented] (NUTCH-2278) Handle alpha-2 language codes consistently

2021-12-17 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461635#comment-17461635 ] Lewis John McGibbney commented on NUTCH-2278: - [~snagel] wdyt about this? > Handle alph

[jira] [Commented] (NUTCH-2919) Upgrade to Tika 2.2.0

2021-12-17 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461620#comment-17461620 ] Lewis John McGibbney commented on NUTCH-2919: - The artifacts have not yet made maven central

[jira] [Created] (NUTCH-2919) Upgrade to Tika 2.2.0

2021-12-16 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2919: --- Summary: Upgrade to Tika 2.2.0 Key: NUTCH-2919 URL: https://issues.apache.org/jira/browse/NUTCH-2919 Project: Nutch Issue Type: Improvement

[jira] [Work started] (NUTCH-2919) Upgrade to Tika 2.2.0

2021-12-16 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2919 started by Lewis John McGibbney. --- > Upgrade to Tika 2.
