[ANNOUNCE] Apache Nutch 1.20 Release

2024-04-30 Thread lewis john mcgibbney
The Apache Nutch Project Management Committee is pleased to announce
the release of Apache Nutch v1.20. We strongly encourage users to
upgrade to this release.

Nutch is a well-matured, production-ready Web crawler. Nutch 1.x
enables fine-grained configuration, relying on Apache Hadoop™ data
structures. Source and binary distributions are available for download
from the Apache Nutch download site:
https://nutch.apache.org/download/

Please verify signatures using the KEYS file
https://raw.githubusercontent.com/apache/nutch/master/KEYS when
downloading the release.

This release includes more than 60 bug fixes and improvements. The
full list of changes can be seen in the Jira release report:
https://s.apache.org/ovjf3

Thanks to everyone who contributed to this release!

lewismc


[jira] [Closed] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions

2024-04-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-3054.
---

> Address deprecation of Node16 for all GitHub Actions
> 
>
> Key: NUTCH-3054
> URL: https://issues.apache.org/jira/browse/NUTCH-3054
> Project: Nutch
> Issue Type: Task
> Components: ci/cd
> Affects Versions: 1.20
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.21
>
>
> See 
> [https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/]
> We need to upgrade the setup-java action in  
> [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml]
>  
> Patch coming up



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions

2024-04-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-3054.
-
Resolution: Fixed

> Address deprecation of Node16 for all GitHub Actions
> 
>
> Key: NUTCH-3054
> URL: https://issues.apache.org/jira/browse/NUTCH-3054
> Project: Nutch
> Issue Type: Task
> Components: ci/cd
> Affects Versions: 1.20
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.21
>
>
> See 
> [https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/]
> We need to upgrade the setup-java action in  
> [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml]
>  
> Patch coming up





[jira] [Updated] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions

2024-04-29 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3054:

Affects Version/s: 1.20

> Address deprecation of Node16 for all GitHub Actions
> 
>
> Key: NUTCH-3054
> URL: https://issues.apache.org/jira/browse/NUTCH-3054
> Project: Nutch
> Issue Type: Task
> Components: ci/cd
> Affects Versions: 1.20
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.21
>
>
> See 
> [https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/]
> We need to upgrade the setup-java action in  
> [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml]
>  
> Patch coming up





[jira] [Created] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions

2024-04-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3054:
---

 Summary: Address deprecation of Node16 for all GitHub Actions
 Key: NUTCH-3054
 URL: https://issues.apache.org/jira/browse/NUTCH-3054
 Project: Nutch
  Issue Type: Task
  Components: ci/cd
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.21


See 
[https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/]

We need to upgrade the setup-java action in  
[https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml]
 

Patch coming up





[jira] [Work started] (NUTCH-3054) Address deprecation of Node16 for all GitHub Actions

2024-04-29 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3054 started by Lewis John McGibbney.
---
> Address deprecation of Node16 for all GitHub Actions
> 
>
> Key: NUTCH-3054
> URL: https://issues.apache.org/jira/browse/NUTCH-3054
> Project: Nutch
> Issue Type: Task
> Components: ci/cd
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.21
>
>
> See 
> [https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/]
> We need to upgrade the setup-java action in  
> [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml]
>  
> Patch coming up





[jira] [Commented] (NUTCH-3049) Investigate using Records

2024-04-29 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842208#comment-17842208
 ] 

Lewis John McGibbney commented on NUTCH-3049:
-

I think that each of the Writable classes mentioned in NutchWritable may be
fair game:

    org.apache.nutch.crawl.CrawlDatum.class,
    org.apache.nutch.crawl.Inlink.class,
    org.apache.nutch.crawl.Inlinks.class,
    org.apache.nutch.indexer.NutchIndexAction.class,
    org.apache.nutch.metadata.Metadata.class,
    org.apache.nutch.parse.Outlink.class,
    org.apache.nutch.parse.ParseText.class,
    org.apache.nutch.parse.ParseData.class,
    org.apache.nutch.parse.ParseImpl.class,
    org.apache.nutch.parse.ParseStatus.class,
    org.apache.nutch.protocol.Content.class,
    org.apache.nutch.protocol.ProtocolStatus.class,
    org.apache.nutch.scoring.webgraph.LinkDatum.class,
    org.apache.nutch.hostdb.HostDatum.class

> Investigate using Records
> -
>
> Key: NUTCH-3049
> URL: https://issues.apache.org/jira/browse/NUTCH-3049
> Project: Nutch
> Issue Type: Sub-task
> Reporter: Lewis John McGibbney
> Priority: Major
>
> Guidance at [https://www.baeldung.com/java-migrate-8-to-17#records]
> I think there are multiple areas where we could use Records. This ticket will
> document the opportunities and structure that work.





Re: [DISCUSS] Consolidating Nutch Continuous Integration

2024-04-29 Thread Lewis John McGibbney
Hi Sebastian,
Understood. If it ain’t broke don’t fix it.
Thanks for the input.

On 2024/04/28 12:08:27 Sebastian Nagel wrote:
> 
>  From my side: no. It may not harm to have both.
> 
> Best,
> Sebastian


[jira] [Created] (NUTCH-3053) Upgrade build and CI to JDK17

2024-04-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3053:
---

 Summary: Upgrade build and CI to JDK17
 Key: NUTCH-3053
 URL: https://issues.apache.org/jira/browse/NUTCH-3053
 Project: Nutch
  Issue Type: Sub-task
  Components: build, ci/cd
Reporter: Lewis John McGibbney


This will involve changes to
 * [https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml]
 * [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/]
 * [https://github.com/apache/nutch/blob/master/default.properties#L46]
 * [https://github.com/apache/nutch/blob/master/default.properties#L57]
 * We should also investigate any deprecation notices in the build output
 * [https://github.com/apache/nutch/blob/master/ivy/mvn.template#L128-L129]





[jira] [Created] (NUTCH-3052) Investigate using sealed classes

2024-04-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3052:
---

 Summary: Investigate using sealed classes
 Key: NUTCH-3052
 URL: https://issues.apache.org/jira/browse/NUTCH-3052
 Project: Nutch
  Issue Type: Sub-task
Reporter: Lewis John McGibbney


Guidance available at 
[https://www.baeldung.com/java-migrate-8-to-17#sealed-classes]

First document if and where sealed classes would add value.
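
As a starting point for that documentation, here is a minimal sketch of what a sealed hierarchy looks like on JDK 17. The FetchOutcome, Fetched, Redirected and Failed names are hypothetical and chosen only for illustration; they are not existing Nutch types.

```java
class SealedSketch {

  // Hypothetical hierarchy for illustration only; these are not Nutch types.
  // Only the three permitted records may implement the interface.
  sealed interface FetchOutcome permits Fetched, Redirected, Failed {}

  record Fetched(int httpStatus) implements FetchOutcome {}

  record Redirected(String target) implements FetchOutcome {}

  record Failed(String reason) implements FetchOutcome {}

  // Because the hierarchy is closed, the three branches cover every case.
  static String describe(FetchOutcome outcome) {
    if (outcome instanceof Fetched f) {
      return "fetched with HTTP " + f.httpStatus();
    }
    if (outcome instanceof Redirected r) {
      return "redirected to " + r.target();
    }
    if (outcome instanceof Failed x) {
      return "failed: " + x.reason();
    }
    throw new IllegalStateException("unreachable for a sealed hierarchy");
  }

  public static void main(String[] args) {
    System.out.println(describe(new Fetched(200)));
    System.out.println(describe(new Failed("timeout")));
  }
}
```

Whether any Nutch type hierarchy is closed enough to benefit from this is exactly what the ticket would need to establish; plugin extension points, for example, are open by design.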





[jira] [Created] (NUTCH-3051) Investigate using new pattern matching syntax in switch expressions

2024-04-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3051:
---

 Summary: Investigate using new pattern matching syntax in switch 
expressions
 Key: NUTCH-3051
 URL: https://issues.apache.org/jira/browse/NUTCH-3051
 Project: Nutch
  Issue Type: Sub-task
Reporter: Lewis John McGibbney


Guidance available at 
[https://www.baeldung.com/java-migrate-8-to-17#2-switch-expressions]

Apparently we use switch in 35 files

[https://github.com/search?q=repo%3Aapache%2Fnutch+switch+language%3AJava=code=Java]
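
To make the target syntax concrete, here is a small sketch of a switch expression with arrow labels. The Status enum and its labels are hypothetical and not taken from the Nutch code base. As far as I know, switch expressions have been a standard feature since JDK 14, while type patterns inside switch are still a preview feature in JDK 17, so the sketch sticks to the former.

```java
class SwitchSketch {

  // Hypothetical enum for illustration only; not a Nutch type.
  enum Status { UNFETCHED, FETCHED, GONE, REDIR_TEMP, REDIR_PERM }

  // A switch expression yields a value, has no fall-through, and the
  // compiler checks that every enum constant is covered.
  static String label(Status status) {
    return switch (status) {
      case UNFETCHED -> "pending";
      case FETCHED -> "done";
      case REDIR_TEMP, REDIR_PERM -> "redirect";
      case GONE -> "gone";
    };
  }

  public static void main(String[] args) {
    for (Status status : Status.values()) {
      System.out.println(status + " -> " + label(status));
    }
  }
}
```

If a new constant is later added to the enum, the switch expression stops compiling until the new case is handled, which is the main practical gain over a classic switch statement with a default branch.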





[jira] [Created] (NUTCH-3050) Investigate use of the enhanced instanceof operator

2024-04-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3050:
---

 Summary: Investigate use of the enhanced instanceof operator
 Key: NUTCH-3050
 URL: https://issues.apache.org/jira/browse/NUTCH-3050
 Project: Nutch
  Issue Type: Sub-task
Reporter: Lewis John McGibbney


Guidance at 
[https://www.baeldung.com/java-migrate-8-to-17#1-enhanced-instanceof-operator]

Apparently we use instanceof operator in 50 files

[https://github.com/search?q=repo%3Aapache%2Fnutch%20instanceof=code]
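
For reference, a minimal before/after sketch of the change this ticket is about. The method names are hypothetical; the point is only that the pattern form binds the cast variable in the same expression as the type test.

```java
class InstanceofSketch {

  // Before: type test followed by an explicit cast.
  static int lengthOld(Object value) {
    if (value instanceof CharSequence) {
      CharSequence cs = (CharSequence) value;
      return cs.length();
    }
    return -1;
  }

  // After (JDK 16+): the pattern variable cs is bound by the test itself.
  static int lengthNew(Object value) {
    if (value instanceof CharSequence cs) {
      return cs.length();
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(lengthOld("nutch") == lengthNew("nutch"));
  }
}
```

The two methods behave identically, so this kind of change should be a mechanical refactoring in most of the files found by the search above.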





[jira] [Created] (NUTCH-3049) Investigate using Records

2024-04-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3049:
---

 Summary: Investigate using Records
 Key: NUTCH-3049
 URL: https://issues.apache.org/jira/browse/NUTCH-3049
 Project: Nutch
  Issue Type: Sub-task
Reporter: Lewis John McGibbney


Guidance at [https://www.baeldung.com/java-migrate-8-to-17#records]

I think there are multiple areas where we could use Records. This ticket will
document the opportunities and structure that work.
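
As a hedged illustration of what such a record could look like, the sketch below uses a hypothetical LinkRecord type (not an existing Nutch class) and plain java.io streams rather than Hadoop's Writable. One point that may be worth documenting in this ticket: record components are final, so a record cannot populate itself in a readFields(DataInput)-style method, and deserialization has to go through a static factory instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RecordSketch {

  // Writes a record to a byte array and reads it back.
  static LinkRecord roundTrip(LinkRecord link) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      link.write(new DataOutputStream(bytes));
      return LinkRecord.read(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    LinkRecord link = new LinkRecord("https://nutch.apache.org/", "Nutch");
    // equals(), hashCode() and toString() are generated by the compiler.
    System.out.println(link.equals(roundTrip(link)));
  }
}

// Hypothetical value type for illustration only; not an existing Nutch class.
record LinkRecord(String url, String anchor) {

  // Compact constructor: validate and normalize the components once.
  LinkRecord {
    if (url == null || url.isBlank()) {
      throw new IllegalArgumentException("url must not be blank");
    }
    anchor = (anchor == null) ? "" : anchor;
  }

  // Serializes the components in declaration order.
  void write(DataOutput out) throws IOException {
    out.writeUTF(url);
    out.writeUTF(anchor);
  }

  // Static factory instead of a mutating readFields(): components are final.
  static LinkRecord read(DataInput in) throws IOException {
    return new LinkRecord(in.readUTF(), in.readUTF());
  }
}
```

This suggests that classes which Hadoop instantiates reflectively and then fills via readFields() may not be direct candidates, whereas small internal value holders might be.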





[jira] [Created] (NUTCH-3048) Investigate where/if new string utility methods could be used

2024-04-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3048:
---

 Summary: Investigate where/if new string utility methods could be 
used
 Key: NUTCH-3048
 URL: https://issues.apache.org/jira/browse/NUTCH-3048
 Project: Nutch
  Issue Type: Sub-task
  Components: util
Reporter: Lewis John McGibbney


Guidance at [https://www.baeldung.com/java-migrate-8-to-17#3-new-string-methods]

We may also be able to revisit our usage of commons-* libraries with the goal of
using native methods from the JDK.
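
A short sketch of the kind of replacement I have in mind, using only String methods added in JDK 11 (lines, strip, isBlank, repeat). The cleanLines and rule helpers are hypothetical names for illustration; where we currently call something like commons-lang's StringUtils.isBlank, the JDK method may be a drop-in alternative, although null handling differs and would need checking case by case.

```java
import java.util.List;
import java.util.stream.Collectors;

class StringMethodsSketch {

  // Splits text into stripped, non-blank lines using JDK 11+ methods only.
  static List<String> cleanLines(String text) {
    return text.lines()                    // splits on \n, \r and \r\n
        .map(String::strip)                // Unicode-aware trim
        .filter(line -> !line.isBlank())   // drops empty and whitespace-only lines
        .collect(Collectors.toList());
  }

  // Builds a horizontal rule without a loop or a helper library.
  static String rule(int width) {
    return "-".repeat(width);
  }

  public static void main(String[] args) {
    System.out.println(cleanLines("  https://nutch.apache.org/ \n\n\t\n https://example.org/"));
    System.out.println(rule(20));
  }
}
```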





[jira] [Created] (NUTCH-3047) Use multi-line text blocks

2024-04-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3047:
---

 Summary: Use multi-line text blocks
 Key: NUTCH-3047
 URL: https://issues.apache.org/jira/browse/NUTCH-3047
 Project: Nutch
  Issue Type: Sub-task
  Components: CLI
Reporter: Lewis John McGibbney


Guidance available at 
[https://www.baeldung.com/java-migrate-8-to-17#2-text-block]

This will help to clean up our CLI *usage()* messages at a bare minimum.
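
For illustration, a usage message written as a text block (JDK 15+). The tool name and options below are made up for the example and do not correspond to an actual Nutch command.

```java
class TextBlockSketch {

  // Hypothetical usage text; the tool name and options are illustrative only.
  // Indentation shared by all lines and the closing delimiter is stripped.
  static final String USAGE = """
      Usage: ExampleTool <input_dir> [-force] [-threads <n>]
        <input_dir>    directory to read from
        -force         overwrite existing output
        -threads <n>   number of worker threads
      """;

  public static void main(String[] args) {
    System.err.print(USAGE);
  }
}
```

Compared with concatenating quoted fragments and explicit "\n" sequences, the message reads in the source the way it prints on the console.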





[jira] [Updated] (NUTCH-3046) Use compact strings

2024-04-29 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3046:

Description: 
Follow the guidance at 
[https://www.baeldung.com/java-migrate-8-to-17#1-compact-string]

It looks like there are 9 instances where we use _*char[]*_:

[https://github.com/search?q=repo%3Aapache%2Fnutch%20char%5B%5D=code]

  was:
Follow the guidance at 
[https://www.baeldung.com/java-migrate-8-to-17#1-compact-string]

It looks like there are [9 instances where we use 
char[]|[https://github.com/search?q=repo%3Aapache%2Fnutch%20char%5B%5D=code]].


> Use compact strings
> ---
>
> Key: NUTCH-3046
> URL: https://issues.apache.org/jira/browse/NUTCH-3046
> Project: Nutch
> Issue Type: Sub-task
> Reporter: Lewis John McGibbney
> Priority: Major
>
> Follow the guidance at 
> [https://www.baeldung.com/java-migrate-8-to-17#1-compact-string]
> It looks like there are 9 instances where we use _*char[]*_:
> [https://github.com/search?q=repo%3Aapache%2Fnutch%20char%5B%5D=code]





[jira] [Created] (NUTCH-3046) Use compact strings

2024-04-28 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3046:
---

 Summary: Use compact strings
 Key: NUTCH-3046
 URL: https://issues.apache.org/jira/browse/NUTCH-3046
 Project: Nutch
  Issue Type: Sub-task
Reporter: Lewis John McGibbney


Follow the guidance at 
[https://www.baeldung.com/java-migrate-8-to-17#1-compact-string]

It looks like there are [9 instances where we use 
char[]|[https://github.com/search?q=repo%3Aapache%2Fnutch%20char%5B%5D=code]].





[jira] [Created] (NUTCH-3045) Upgrade from Java 11 to 17

2024-04-28 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3045:
---

 Summary: Upgrade from Java 11 to 17
 Key: NUTCH-3045
 URL: https://issues.apache.org/jira/browse/NUTCH-3045
 Project: Nutch
  Issue Type: Task
  Components: build, ci/cd
Reporter: Lewis John McGibbney
 Fix For: 1.21


This parent issue will track and organize work pertaining to upgrading Nutch to 
JDK 17.

Premier support for Oracle JDK 11 ended 7 months ago (30 Sep 2023).





[ANNOUNCE] Apache Nutch 1.20 Release

2024-04-28 Thread lewis john mcgibbney
The Apache Nutch Project https://nutch.apache.org/download/

Please verify signatures using the KEYS file
https://raw.githubusercontent.com/apache/nutch/master/KEYS when downloading
the release.

This release includes more than 60 bug fixes and improvements, the full
list of changes can be seen in the Jira release report
https://s.apache.org/ovjf3

Thanks to everyone who contributed to this release!

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [DISCUSS] Consolidating Nutch Continuous Integration

2024-04-25 Thread Lewis John McGibbney
A better reference for the GitHub Actions can be found at 
https://github.com/apache/nutch/actions

lewismc

On 2024/04/25 14:40:35 lewis john mcgibbney wrote:
> Hi dev@,
> 
> We currently maintain a combination of Jenkins [0] and GitHub Actions [1]
> for CI.
> 
> For the longest time, we relied solely on Jenkins. This was really useful
> particularly when committers were pulling build artifacts from Jenkins
> nightly and relied on SVN trunk being stable. The Jenkins job used to be
> run nightly but no longer is. It is not clear exactly when nightly SNAPSHOT
> builds were turned off.
> 
> In 2020 we accepted a pull request [2] which established GitHub Actions and
> since then have gradually added small but important updates to the GitHub
> Actions workflow [3].
> 
> I can elaborate on the details of what each CI workflow does (it is not
> overly complex) but before I do that, is there any preference on choosing
> one (Jenkins Vs GitHub Actions) over the other?
> 
> Thanks
> 
> lewismc
> 
> [0] https://ci-builds.apache.org/job/Nutch/
> [1]
> https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml
> [2]
> https://github.com/apache/nutch/commit/e33aaa14739c7c02f4121ac1d8d0e7860f329e06
> [3]
> https://github.com/apache/nutch/commits/master/.github/workflows/master-build.yml
> 
> -- 
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
> 


[DISCUSS] Consolidating Nutch Continuous Integration

2024-04-25 Thread lewis john mcgibbney
Hi dev@,

We currently maintain a combination of Jenkins [0] and GitHub Actions [1]
for CI.

For the longest time, we relied solely on Jenkins. This was really useful
particularly when committers were pulling build artifacts from Jenkins
nightly and relied on SVN trunk being stable. The Jenkins job used to be
run nightly but no longer is. It is not clear exactly when nightly SNAPSHOT
builds were turned off.

In 2020 we accepted a pull request [2] which established GitHub Actions and
since then have gradually added small but important updates to the GitHub
Actions workflow [3].

I can elaborate on the details of what each CI workflow does (it is not
overly complex) but before I do that, is there any preference on choosing
one (Jenkins Vs GitHub Actions) over the other?

Thanks

lewismc

[0] https://ci-builds.apache.org/job/Nutch/
[1]
https://github.com/apache/nutch/blob/master/.github/workflows/master-build.yml
[2]
https://github.com/apache/nutch/commit/e33aaa14739c7c02f4121ac1d8d0e7860f329e06
[3]
https://github.com/apache/nutch/commits/master/.github/workflows/master-build.yml

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[RESULT] WAS Re: [VOTE] Apache Nutch 1.20 Release

2024-04-24 Thread lewis john mcgibbney
Hi user@ & dev@,
I’m glad to conclude the Nutch 1.20 release candidate VOTE thread with the
following RESULT.

[5] +1 Release this package as Apache Nutch 1.20
snagel*
balakuntala*
blackice*
Joe Gilvary
lewismc*

[ ] -1 Do not release this package because…

*Nutch Project Management Committee-binding

The Nutch 1.20 release candidate has passed the community VOTE. I will
therefore promote this release candidate.

Thanks for VOTE’ing and to everyone who contributed to the Apache Nutch
1.20 release.

lewismc

On Tue, Apr 9, 2024 at 2:28 PM lewis john mcgibbney 
wrote:

> Hi Folks,
>
> A first candidate for the Nutch 1.20 release is available at [0] where
> accompanying SHA512 and ASC signatures can also be found.
> Information on verifying releases can be found at [1].
>
> The release candidate comprises a .zip and tar.gz archive of the sources
> at [2] and complementary binary distributions. In addition, a staged maven
> repository is available at [3].
>
> The Nutch 1.20 release report is available at [4].
>
> Please vote on releasing this package as Apache Nutch 1.20. The vote is
> open for at least the next 72 hours and passes if a majority of at least
> three +1 Nutch PMC votes are cast.
>
> [ ] +1 Release this package as Apache Nutch 1.20.
>
> [ ] -1 Do not release this package because…
>
> Cheers,
> lewismc
> P.S. Here is my +1.
>
> [0] https://dist.apache.org/repos/dist/dev/nutch/1.20
> [1] http://nutch.apache.org/downloads.html#verify
> [2] https://github.com/apache/nutch/tree/release-1.20
> [3]
> https://repository.apache.org/content/repositories/orgapachenutch-1021/
> [4] https://s.apache.org/ovjf3
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: Help posting question

2024-04-24 Thread Lewis John McGibbney
Hi Sheham,

On 2024/04/20 08:47:41 Sheham Izat wrote:

> The Fetcher job was aborted, does that still mean that it went through the
> entire list of seed urls?

Yes, it processed the entire generated segment, but the fetcher…

* hung on https://disneyland.disney.go.com/, https://api.onlyoffice.com/,
  https://www.adu.com/ and https://www.lowes.com/
* was denied by robots.txt for https://sourceforge.net/,
  https://onsclothing.com/, https://kinto-usa.com/, https://twitter.com/,
  https://www.linkedin.com/, etc.
* encountered problems processing some robots.txt files for
  https://twitter.com/, https://www.trustradius.com/

There may be some other issues encountered by the fetcher.

This is not at all uncommon. The fetcher completed successfully after 7 
seconds. You could progress with your crawl.

> 
> I will go through the mailing list questions.

If you need more assistance please let us know. You will find plenty of 
pointers on this mailing list archive though.

lewismc


[jira] [Updated] (NUTCH-3042) Use GitHub cache action to improve CI execution time

2024-04-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3042:

Description: 
With the Ant+Ivy build architecture, the current GitHub actions workflow can 
and regularly does take over 20 minutes to complete. Dependency retrieval takes 
a significant amount of time.

I think we can address the above issue and dramatically reduce the CI runtime
by utilizing the official [GitHub cache
action|[https://github.com/actions/cache]].

It appears, however, that the action does not support the Apache Ivy cache. Both
Maven and Gradle are supported. I [created a
discussion|[https://github.com/actions/cache/discussions/1381]] to get
confirmation.

In the case that we cannot implement a cache for the Ivy build system then we 
will need to come back to this issue once we migrate to Gradle.

  was:
With the Ant+Ivy build architecture, the current GitHub actions workflow can 
and regularly does take over 20 minutes to complete. Dependency retrieval takes 
a significant amount of time.

I think we can address the above issue and dramatically reduce the CI runtime
by utilizing the official [GitHub cache
action|[https://github.com/actions/cache]].

It appears, however, that the action does not support the Apache Ivy cache. Both
Maven and Gradle are supported. I created a discussion to get confirmation if
this is the case.

In the case that we cannot implement a cache for the Ivy build system then we 
will need to come back to this issue once we migrate to Gradle.


> Use GitHub cache action to improve CI execution time
> 
>
> Key: NUTCH-3042
> URL: https://issues.apache.org/jira/browse/NUTCH-3042
> Project: Nutch
> Issue Type: Task
> Components: ci/cd
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 1.21
>
>
> With the Ant+Ivy build architecture, the current GitHub actions workflow can 
> and regularly does take over 20 minutes to complete. Dependency retrieval 
> takes a significant amount of time.
> I think we can address the above issue and dramatically reduce the CI runtime
> by utilizing the official [GitHub cache
> action|[https://github.com/actions/cache]].
> It appears, however, that the action does not support the Apache Ivy cache.
> Both Maven and Gradle are supported. I [created a
> discussion|[https://github.com/actions/cache/discussions/1381]] to get
> confirmation.
> In the case that we cannot implement a cache for the Ivy build system then we 
> will need to come back to this issue once we migrate to Gradle.





[jira] [Created] (NUTCH-3042) Use GitHub cache action to improve CI execution time

2024-04-19 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3042:
---

 Summary: Use GitHub cache action to improve CI execution time
 Key: NUTCH-3042
 URL: https://issues.apache.org/jira/browse/NUTCH-3042
 Project: Nutch
  Issue Type: Task
  Components: ci/cd
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.21


With the Ant+Ivy build architecture, the current GitHub actions workflow can 
and regularly does take over 20 minutes to complete. Dependency retrieval takes 
a significant amount of time.

I think we can address the above issue and dramatically reduce the CI runtime
by utilizing the official [GitHub cache
action|[https://github.com/actions/cache]].

It appears, however, that the action does not support the Apache Ivy cache. Both
Maven and Gradle are supported. I created a discussion to get confirmation if
this is the case.

In the case that we cannot implement a cache for the Ivy build system then we 
will need to come back to this issue once we migrate to Gradle.





[jira] [Work started] (NUTCH-3041) Address confusing logging in o.a.n.net.URLExemptionFilters

2024-04-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3041 started by Lewis John McGibbney.
---
> Address confusing logging in o.a.n.net.URLExemptionFilters 
> ---
>
> Key: NUTCH-3041
> URL: https://issues.apache.org/jira/browse/NUTCH-3041
> Project: Nutch
> Issue Type: Task
> Components: net
> Affects Versions: 1.19, 1.20
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.21
>
>
> URLExemptionFilter implementations are used to allow exemptions to external
> domain resources by overriding the {{db.ignore.external.links}} configuration
> setting. This is useful when the crawl is focused on a domain but resources
> like images are hosted on a CDN.
> Currently
> [URLExemptionFilters|[https://github.com/apache/nutch/blob/271f92e11c39b7a3583cfcd8d664262cfac59674/src/java/org/apache/nutch/net/URLExemptionFilters.java#L47-L48]]
> provides the following logging
> {quote}INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
> Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
> {quote}
> I find this confusing. It would be better to log *only* if an 
> URLExemptionFilter implementation is actually configured to be used at 
> runtime.
> I will provide a patch for this.
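
A minimal sketch of the behaviour proposed above, not the actual patch. The ExemptionLoggingSketch class and its startupMessage helper are hypothetical, and the filters are represented by their class names only; in Nutch the message would go to the class's existing logger rather than to standard output.

```java
import java.util.List;
import java.util.Optional;

class ExemptionLoggingSketch {

  // Returns a log message only when at least one filter is configured,
  // so an unconfigured extension point stays silent.
  static Optional<String> startupMessage(List<String> filterClassNames) {
    if (filterClassNames.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of("Found " + filterClassNames.size()
        + " extensions at point:'org.apache.nutch.net.URLExemptionFilter'");
  }

  public static void main(String[] args) {
    // Nothing is printed for the empty configuration.
    startupMessage(List.of()).ifPresent(System.out::println);
    // Hypothetical filter class name, for illustration only.
    startupMessage(List.of("org.example.ExampleExemptionFilter"))
        .ifPresent(System.out::println);
  }
}
```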





[jira] [Updated] (NUTCH-3041) Address confusing logging in o.a.n.net.URLExemptionFilters

2024-04-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3041:

Description: 
URLExemptionFilter implementations are used to allow exemptions to external
domain resources by overriding the {{db.ignore.external.links}} configuration
setting. This is useful when the crawl is focused on a domain but resources
like images are hosted on a CDN.

Currently
[URLExemptionFilters|[https://github.com/apache/nutch/blob/271f92e11c39b7a3583cfcd8d664262cfac59674/src/java/org/apache/nutch/net/URLExemptionFilters.java#L47-L48]]
provides the following logging
{quote}INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
{quote}
I find this confusing. It would be better to log *only* if an 
URLExemptionFilter implementation is actually configured to be used at runtime.

I will provide a patch for this.

  was:
URLExemptionFilter implementations are used to allow exemptions to external
domain resources by overriding the {{db.ignore.external.links}} configuration
setting. This is useful when the crawl is focused on a domain but resources
like images are hosted on a CDN.

Currently
[URLExemptionFilters|[https://github.com/apache/nutch/blob/271f92e11c39b7a3583cfcd8d664262cfac59674/src/java/org/apache/nutch/net/URLExemptionFilters.java#L47-L48]]
provides the following logging
{quote}INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
{quote}
I find this confusing. It would be better to log *only* if an 
URLExemptionFilter implementation actually exists for a given URL.

I will provide a patch for this.


> Address confusing logging in o.a.n.net.URLExemptionFilters 
> ---
>
> Key: NUTCH-3041
> URL: https://issues.apache.org/jira/browse/NUTCH-3041
> Project: Nutch
> Issue Type: Task
> Components: net
> Affects Versions: 1.19, 1.20
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.21
>
>
> URLExemptionFilter implementations are used to allow exemptions to external
> domain resources by overriding the {{db.ignore.external.links}} configuration
> setting. This is useful when the crawl is focused on a domain but resources
> like images are hosted on a CDN.
> Currently
> [URLExemptionFilters|[https://github.com/apache/nutch/blob/271f92e11c39b7a3583cfcd8d664262cfac59674/src/java/org/apache/nutch/net/URLExemptionFilters.java#L47-L48]]
> provides the following logging
> {quote}INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
> Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
> {quote}
> I find this confusing. It would be better to log *only* if an 
> URLExemptionFilter implementation is actually configured to be used at 
> runtime.
> I will provide a patch for this.





[jira] [Updated] (NUTCH-3041) Address confusing logging in o.a.n.net.URLExemptionFilters

2024-04-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3041:

Description: 
URLExemptionFilter implementations are used to allow exemptions to external
domain resources by overriding the {{db.ignore.external.links}} configuration
setting. This is useful when the crawl is focused on a domain but resources
like images are hosted on a CDN.

Currently
[URLExemptionFilters|[https://github.com/apache/nutch/blob/271f92e11c39b7a3583cfcd8d664262cfac59674/src/java/org/apache/nutch/net/URLExemptionFilters.java#L47-L48]]
provides the following logging
{quote}INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
{quote}
I find this confusing. It would be better to log *only* if an 
URLExemptionFilter implementation actually exists for a given URL.

I will provide a patch for this.

  was:
URLExemptionFilter implementations are used to allow exemptions to external
domain resources by overriding the {{db.ignore.external.links}} configuration
setting. This is useful when the crawl is focused on a domain but resources
like images are hosted on a CDN.

Currently
[URLExemptionFilters|https://github.com/apache/nutch/blob/271f92e11c39b7a3583cfcd8d664262cfac59674/src/java/org/apache/nutch/net/URLExemptionFilters.java#L47-L48]
provides some confusing INFO-level logging
{quote}INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
{quote}
I find this confusing. It would be better to log *only* if a
URLExemptionFilter implementation actually exists for a given URL.

I will provide a patch for this.


> Address confusing logging in o.a.n.net.URLExemptionFilters 
> ---
>
> Key: NUTCH-3041
> URL: https://issues.apache.org/jira/browse/NUTCH-3041
> Project: Nutch
>  Issue Type: Task
>  Components: net
>Affects Versions: 1.19, 1.20
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 1.21
>
>
> URLExemptionFilter implementations are used to allow exemptions to external
> domain resources by overriding the {{db.ignore.external.links}} configuration
> setting. This is useful when the crawl is focused on a domain but resources
> like images are hosted on a CDN.
> Currently
> [URLExemptionFilters|https://github.com/apache/nutch/blob/271f92e11c39b7a3583cfcd8d664262cfac59674/src/java/org/apache/nutch/net/URLExemptionFilters.java#L47-L48]
> provides the following logging
> {quote}INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
> Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
> {quote}
> I find this confusing. It would be better to log *only* if a
> URLExemptionFilter implementation actually exists for a given URL.
> I will provide a patch for this.





Re: Help posting question

2024-04-19 Thread Lewis John McGibbney
Hi Sheham,

On 2024/04/19 15:18:01 Sheham Izat wrote:
> 
> My questions are:
> 
> 1) What do I need to do to get Nutch to continue working even if there are
> hung threads?

From what I can see in the log you provided, nothing is preventing Nutch from
continuing to work. The Fetcher job finished successfully.

> 2) Is there a way to avoid having these hanging threads in the first place?

Several factors can lead to hung fetcher threads. Lots of questions have been 
asked on this mailing list relating to exactly this issue. I would encourage 
you to study some of the community responses and see if they assist you in a 
better understanding of the possible issues. You can filter questions in the 
mailing list search with the following criteria:
* date range: more than 1 day ago
* body: hung

https://lists.apache.org/list.html?user@nutch.apache.org


[jira] [Created] (NUTCH-3041) Address confusing logging in o.a.n.net.URLExemptionFilters

2024-04-19 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3041:
---

 Summary: Address confusing logging in 
o.a.n.net.URLExemptionFilters 
 Key: NUTCH-3041
 URL: https://issues.apache.org/jira/browse/NUTCH-3041
 Project: Nutch
  Issue Type: Task
  Components: net
Affects Versions: 1.19, 1.20
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.21


URLExemptionFilter implementations are used to allow exemptions to external
domain resources by overriding the {{db.ignore.external.links}} configuration
setting. This is useful when the crawl is focused on a domain but resources
like images are hosted on a CDN.

Currently
[URLExemptionFilters|https://github.com/apache/nutch/blob/271f92e11c39b7a3583cfcd8d664262cfac59674/src/java/org/apache/nutch/net/URLExemptionFilters.java#L47-L48]
provides some confusing INFO-level logging
{quote}INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0]
Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
{quote}
I find this confusing. It would be better to log *only* if a
URLExemptionFilter implementation actually exists for a given URL.

I will provide a patch for this.
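
For illustration only, the behaviour proposed above could look roughly like
the sketch below. This is not the actual patch and does not use the Nutch
plugin API: the class name, the describe() helper and the filter name
"example-exemption-filter" are all hypothetical. The only idea it shows is
that the "Found N extensions" message is produced when at least one filter
is configured, and suppressed otherwise.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, self-contained sketch; not the real URLExemptionFilters class.
public class ExemptionLogSketch {

    /**
     * Builds the log message for an extension point, or returns empty when
     * no filter implementations are configured (so nothing gets logged).
     */
    public static Optional<String> describe(String extensionPoint, List<String> filterNames) {
        if (filterNames.isEmpty()) {
            // Nothing configured: stay silent instead of logging "Found 0 extensions".
            return Optional.empty();
        }
        return Optional.of("Found " + filterNames.size()
            + " extensions at point:'" + extensionPoint + "'");
    }

    public static void main(String[] args) {
        String point = "org.apache.nutch.net.URLExemptionFilter";
        // No filters configured: prints nothing.
        describe(point, List.of()).ifPresent(System.out::println);
        // One (made-up) filter configured: prints the message.
        describe(point, List.of("example-exemption-filter")).ifPresent(System.out::println);
    }
}
```

In a real change the message would go through the class logger at INFO level
rather than System.out; that part is left out to keep the sketch runnable on
its own.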





[jira] [Closed] (COMDEV-544) Improve comdev website navigation to GSoC mentor resources

2024-04-18 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed COMDEV-544.
---

> Improve comdev website navigation to GSoC mentor resources
> --
>
> Key: COMDEV-544
> URL: https://issues.apache.org/jira/browse/COMDEV-544
> Project: Community Development
>  Issue Type: Task
>  Components: Website
>    Reporter: Lewis John McGibbney
>Priority: Minor
>
> h1. Purpose
> Improve comdev website navigation to Google Summer of Code (GSoC) mentor 
> resources.
> h1. Context
> Having been ‘away’ for a few years, this year I decided to make an attempt to 
> re-engage with the GSoC program.
> I quickly realized that I was totally out of touch having absolutely no idea 
> where the mentor community conversations were happening (they happen on 
> ment...@community.apache.org) and being hopelessly unable to locate GSoC 
> mentoring documentation via the comdev website. 
> Thankfully [~sanyam] [pointed me at the 
> documentation|[https://lists.apache.org/thread/dqmrwzjogl3sdb2v8s36v8mxf5o1yqsj]]
>  and I was able to get back up to speed. Thank you Sanyam :)
> h1. Challenges
> Looking at [https://community.apache.org/gsoc/], as of writing, although 
> loads of content exists for students (which is great) no navigation exists to 
> mentor resources. 
> In my case, this meant that I couldn’t find and entirely missed the excellent 
> content available at 
> [https://community.apache.org/gsoc/guide-to-being-a-mentor.html].
> h1. Proposal
> I think that a “{*}Mentors: read this{*}” Section should be added to 
> [https://community.apache.org/gsoc/] which simply hyperlinks to the relevant 
> content from above. 




-
To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
For additional commands, e-mail: dev-h...@community.apache.org



[jira] [Resolved] (COMDEV-544) Improve comdev website navigation to GSoC mentor resources

2024-04-18 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved COMDEV-544.
-
Resolution: Fixed

Thanks [~rbowen] for merging.

> Improve comdev website navigation to GSoC mentor resources
> --
>
> Key: COMDEV-544
> URL: https://issues.apache.org/jira/browse/COMDEV-544
> Project: Community Development
>  Issue Type: Task
>  Components: Website
>    Reporter: Lewis John McGibbney
>Priority: Minor
>
> h1. Purpose
> Improve comdev website navigation to Google Summer of Code (GSoC) mentor 
> resources.
> h1. Context
> Having been ‘away’ for a few years, this year I decided to make an attempt to 
> re-engage with the GSoC program.
> I quickly realized that I was totally out of touch having absolutely no idea 
> where the mentor community conversations were happening (they happen on 
> ment...@community.apache.org) and being hopelessly unable to locate GSoC 
> mentoring documentation via the comdev website. 
> Thankfully [~sanyam] [pointed me at the 
> documentation|[https://lists.apache.org/thread/dqmrwzjogl3sdb2v8s36v8mxf5o1yqsj]]
>  and I was able to get back up to speed. Thank you Sanyam :)
> h1. Challenges
> Looking at [https://community.apache.org/gsoc/], as of writing, although 
> loads of content exists for students (which is great) no navigation exists to 
> mentor resources. 
> In my case, this meant that I couldn’t find and entirely missed the excellent 
> content available at 
> [https://community.apache.org/gsoc/guide-to-being-a-mentor.html].
> h1. Proposal
> I think that a “{*}Mentors: read this{*}” Section should be added to 
> [https://community.apache.org/gsoc/] which simply hyperlinks to the relevant 
> content from above. 






[jira] [Commented] (COMDEV-544) Improve comdev website navigation to GSoC mentor resources

2024-04-18 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/COMDEV-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838694#comment-17838694
 ] 

Lewis John McGibbney commented on COMDEV-544:
-

[~sebb] thank you, I was on a mobile device and actually missed the top
navigation. Thank you

> Improve comdev website navigation to GSoC mentor resources
> --
>
> Key: COMDEV-544
> URL: https://issues.apache.org/jira/browse/COMDEV-544
> Project: Community Development
>  Issue Type: Task
>  Components: Website
>    Reporter: Lewis John McGibbney
>Priority: Minor
>
> h1. Purpose
> Improve comdev website navigation to Google Summer of Code (GSoC) mentor 
> resources.
> h1. Context
> Having been ‘away’ for a few years, this year I decided to make an attempt to 
> re-engage with the GSoC program.
> I quickly realized that I was totally out of touch having absolutely no idea 
> where the mentor community conversations were happening (they happen on 
> ment...@community.apache.org) and being hopelessly unable to locate GSoC 
> mentoring documentation via the comdev website. 
> Thankfully [~sanyam] [pointed me at the 
> documentation|[https://lists.apache.org/thread/dqmrwzjogl3sdb2v8s36v8mxf5o1yqsj]]
>  and I was able to get back up to speed. Thank you Sanyam :)
> h1. Challenges
> Looking at [https://community.apache.org/gsoc/], as of writing, although 
> loads of content exists for students (which is great) no navigation exists to 
> mentor resources. 
> In my case, this meant that I couldn’t find and entirely missed the excellent 
> content available at 
> [https://community.apache.org/gsoc/guide-to-being-a-mentor.html].
> h1. Proposal
> I think that a “{*}Mentors: read this{*}” Section should be added to 
> [https://community.apache.org/gsoc/] which simply hyperlinks to the relevant 
> content from above. 






[jira] [Updated] (COMDEV-544) Improve comdev website navigation to GSoC mentor resources

2024-04-18 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/COMDEV-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated COMDEV-544:

Description: 
h1. Purpose

Improve comdev website navigation to Google Summer of Code (GSoC) mentor 
resources.
h1. Context

Having been ‘away’ for a few years, this year I decided to make an attempt to 
re-engage with the GSoC program.

I quickly realized that I was totally out of touch having absolutely no idea 
where the mentor community conversations were happening (they happen on 
ment...@community.apache.org) and being hopelessly unable to locate GSoC 
mentoring documentation via the comdev website. 

Thankfully [~sanyam] [pointed me at the 
documentation|[https://lists.apache.org/thread/dqmrwzjogl3sdb2v8s36v8mxf5o1yqsj]]
 and I was able to get back up to speed. Thank you Sanyam :)
h1. Challenges

Looking at [https://community.apache.org/gsoc/], as of writing, although loads 
of content exists for students (which is great) no navigation exists to mentor 
resources. 

In my case, this meant that I couldn’t find and entirely missed the excellent 
content available at 
[https://community.apache.org/gsoc/guide-to-being-a-mentor.html].
h1. Proposal

I think that a “{*}Mentors: read this{*}” Section should be added to 
[https://community.apache.org/gsoc/] which simply hyperlinks to the relevant 
content from above. 

  was:
h1. Purpose

Improve comdev website navigation to Google Summer of Code (GSoC) mentor 
resources.
h1. Context

Having been ‘away’ for a few years, this year I decided to make an attempt to 
re-engage with the GSoC program.

I quickly realized that I was totally out of touch having absolutely no idea 
where the mentor community conversations were happening (they happen on 
ment...@community.apache.org) and being hopelessly unable to locate GSoC 
mentoring documentation via the comdev website. 

Thankfully [~sanyam] [pointed me at the 
documentation|[https://lists.apache.org/thread/dqmrwzjogl3sdb2v8s36v8mxf5o1yqsj]]
 and I was able to get back up to speed. Thank you Sanyam :)
h1. Challenges

Looking at [https://community.apache.org/gsoc/], as of writing, although loads 
of content exists for students (which is great) no navigation exists to mentor 
resources. 

In my case, this meant that I couldn’t find and entirely missed the excellent 
content available at [https://community.apache.org/mentoring/].
h1. Proposal

I think that a “{*}Mentors: read this{*}” Section should be added to 
[https://community.apache.org/gsoc/] which simply hyperlinks to the relevant 
content from above. 


> Improve comdev website navigation to GSoC mentor resources
> --
>
> Key: COMDEV-544
> URL: https://issues.apache.org/jira/browse/COMDEV-544
> Project: Community Development
>  Issue Type: Task
>  Components: Website
>    Reporter: Lewis John McGibbney
>Priority: Minor
>
> h1. Purpose
> Improve comdev website navigation to Google Summer of Code (GSoC) mentor 
> resources.
> h1. Context
> Having been ‘away’ for a few years, this year I decided to make an attempt to 
> re-engage with the GSoC program.
> I quickly realized that I was totally out of touch having absolutely no idea 
> where the mentor community conversations were happening (they happen on 
> ment...@community.apache.org) and being hopelessly unable to locate GSoC 
> mentoring documentation via the comdev website. 
> Thankfully [~sanyam] [pointed me at the 
> documentation|[https://lists.apache.org/thread/dqmrwzjogl3sdb2v8s36v8mxf5o1yqsj]]
>  and I was able to get back up to speed. Thank you Sanyam :)
> h1. Challenges
> Looking at [https://community.apache.org/gsoc/], as of writing, although 
> loads of content exists for students (which is great) no navigation exists to 
> mentor resources. 
> In my case, this meant that I couldn’t find and entirely missed the excellent 
> content available at 
> [https://community.apache.org/gsoc/guide-to-being-a-mentor.html].
> h1. Proposal
> I think that a “{*}Mentors: read this{*}” Section should be added to 
> [https://community.apache.org/gsoc/] which simply hyperlinks to the relevant 
> content from above. 






[jira] [Commented] (COMDEV-544) Improve comdev website navigation to GSoC mentor resources

2024-04-18 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/COMDEV-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838692#comment-17838692
 ] 

Lewis John McGibbney commented on COMDEV-544:
-

Thank you both.

> Improve comdev website navigation to GSoC mentor resources
> --
>
> Key: COMDEV-544
> URL: https://issues.apache.org/jira/browse/COMDEV-544
> Project: Community Development
>  Issue Type: Task
>  Components: Website
>    Reporter: Lewis John McGibbney
>Priority: Minor
>
> h1. Purpose
> Improve comdev website navigation to Google Summer of Code (GSoC) mentor 
> resources.
> h1. Context
> Having been ‘away’ for a few years, this year I decided to make an attempt to 
> re-engage with the GSoC program.
> I quickly realized that I was totally out of touch having absolutely no idea 
> where the mentor community conversations were happening (they happen on 
> ment...@community.apache.org) and being hopelessly unable to locate GSoC 
> mentoring documentation via the comdev website. 
> Thankfully [~sanyam] [pointed me at the 
> documentation|[https://lists.apache.org/thread/dqmrwzjogl3sdb2v8s36v8mxf5o1yqsj]]
>  and I was able to get back up to speed. Thank you Sanyam :)
> h1. Challenges
> Looking at [https://community.apache.org/gsoc/], as of writing, although 
> loads of content exists for students (which is great) no navigation exists to 
> mentor resources. 
> In my case, this meant that I couldn’t find and entirely missed the excellent 
> content available at [https://community.apache.org/mentoring/].
> h1. Proposal
> I think that a “{*}Mentors: read this{*}” Section should be added to 
> [https://community.apache.org/gsoc/] which simply hyperlinks to the relevant 
> content from above. 






[jira] [Created] (COMDEV-544) Improve comdev website navigation to GSoC mentor resources

2024-04-18 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created COMDEV-544:
---

 Summary: Improve comdev website navigation to GSoC mentor resources
 Key: COMDEV-544
 URL: https://issues.apache.org/jira/browse/COMDEV-544
 Project: Community Development
  Issue Type: Task
  Components: Website
Reporter: Lewis John McGibbney


h1. Purpose

Improve comdev website navigation to Google Summer of Code (GSoC) mentor 
resources.
h1. Context

Having been ‘away’ for a few years, this year I decided to make an attempt to 
re-engage with the GSoC program.

I quickly realized that I was totally out of touch having absolutely no idea 
where the mentor community conversations were happening (they happen on 
ment...@community.apache.org) and being hopelessly unable to locate GSoC 
mentoring documentation via the comdev website. 

Thankfully [~sanyam] [pointed me at the 
documentation|[https://lists.apache.org/thread/dqmrwzjogl3sdb2v8s36v8mxf5o1yqsj]]
 and I was able to get back up to speed. Thank you Sanyam :)
h1. Challenges

Looking at [https://community.apache.org/gsoc/], as of writing, although loads 
of content exists for students (which is great) no navigation exists to mentor 
resources. 

In my case, this meant that I couldn’t find and entirely missed the excellent 
content available at [https://community.apache.org/mentoring/].
h1. Proposal

I think that a “{*}Mentors: read this{*}” Section should be added to 
[https://community.apache.org/gsoc/] which simply hyperlinks to the relevant 
content from above. 






Re: Where is GSoC communications taking place?

2024-04-18 Thread Lewis John Mcgibbney
Hi Sanyam,
Thank you so much.
I’m going to suggest some edits to the comdev website to improve navigation
to the hyperlinks you provided. They are not linked to from the GSoC page
so we can fix that now.
Thanks for all your efforts as Org Admin for GSoC this year.
lewismc

On Thu, Apr 18, 2024 at 04:44 Sanyam Goel wrote:

> Hi Lewis,
> All the communication is happening on the ment...@community.apache.org list.
>
> Please subscribe to it.
> This list is specific for GSoC mentors only and all the communication
> there should not be available outside that list.
> Also, I advise you to go through the links below for more steps:
> https://community.apache.org/mentoring/
> https://community.apache.org/gsoc/guide-to-being-a-mentor.html
>
> https://community.apache.org/gsoc/guide-to-being-a-mentor.html#staying-in-touch
>
> FYI: we haven't received any scoring on the proposals received for the
> Nutch project. I advise you to start a new communication thread on the
> mentors list, as the deadline has already passed, and we can continue from
> there.
> You can find all the details/announcements on the mentors list.
>
>
> Thanks,
> Sanyam Goel
>
> On Thu, Apr 18, 2024 at 4:07 AM lewis john mcgibbney wrote:
>
>> Hi dev@,
>> Can someone please point me to the GSoC happenings? I’ve not heard anything
>> since being approved on the Nutch mailing list
>> https://lists.apache.org/thread/tk8x6sf2mt1lt0v10j30djqjk6vwpgb2
>> Thanks in advance.
>> lewismc
>>
>>
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
>>
>


Where is GSoC communications taking place?

2024-04-17 Thread lewis john mcgibbney
Hi dev@,
Can someone please point me to the GSoC happenings? I’ve not heard anything
since being approved on the Nutch mailing list
https://lists.apache.org/thread/tk8x6sf2mt1lt0v10j30djqjk6vwpgb2
Thanks in advance.
lewismc


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Apache Nutch 1.20 Release

2024-04-16 Thread lewis john mcgibbney
Hi user@, dev@,
Please consider reviewing the Nutch 1.20 release candidate. This is a
critical prerequisite for us making releases of software at the ASF.
Thank you
lewismc

On Tue, Apr 9, 2024 at 2:28 PM lewis john mcgibbney wrote:

> Hi Folks,
>
> A first candidate for the Nutch 1.20 release is available at [0] where
> accompanying SHA512 and ASC signatures can also be found.
> Information on verifying releases can be found at [1].
>
> The release candidate comprises a .zip and tar.gz archive of the sources
> at [2] and complementary binary distributions. In addition, a staged maven
> repository is available at [3].
>
> The Nutch 1.20 release report is available at [4].
>
> Please vote on releasing this package as Apache Nutch 1.20. The vote is
> open for at least the next 72 hours and passes if a majority of at least
> three +1 Nutch PMC votes are cast.
>
> [ ] +1 Release this package as Apache Nutch 1.20.
>
> [ ] -1 Do not release this package because…
>
> Cheers,
> lewismc
> P.S. Here is my +1.
>
> [0] https://dist.apache.org/repos/dist/dev/nutch/1.20
> [1] http://nutch.apache.org/downloads.html#verify
> [2] https://github.com/apache/nutch/tree/release-1.20
> [3]
> https://repository.apache.org/content/repositories/orgapachenutch-1021/
> [4] https://s.apache.org/ovjf3
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Apache Nutch 1.20 Release

2024-04-11 Thread Lewis John McGibbney
Hi Seb,

On 2024/04/11 13:30:53 Sebastian Nagel wrote:
> 
> https://github.com/sebastian-nagel/nutch-test-single-node-cluster/

I think we should make this into an integration test suite and run it as part 
of CI. I’ve been meaning and wanting to do this for the __longest__ time…!

> 
> One note about the CHANGES.md: it's now a mixture of HTML and plain text.
> It does not use the potential of markdown, e.g. sections / headlines for
> the releases to make the change log navigable via a table of contents.
> The embedded HTML makes it less readable if viewed in a text editor.
> The rendering on Github [5] is acceptable with only minor glitches,
> mostly the placement of multiple lines in a single paragraph:
>https://github.com/apache/nutch/blob/branch-1.20/CHANGES.md
> We also have a change log on Jira:
>https://s.apache.org/ovjf3
> That's why I wouldn't call the CHANGES.md a "blocker". We should
> update the formatting after the release to make it again easily
> readable in source code and improve the document structure utilizing
> the markdown markup.

Excellent suggestion. I was focusing on including the hyperlinks and clearly 
compromised other change log benefits. I will address this after the release. 
Thank you


Re: Mentor request for lewismc

2024-04-09 Thread Lewis John McGibbney
Please resend, Sanyam. I am not in receipt of the invitation yet.
Thank you
lewismc

On 2024/04/07 21:28:05 Sanyam Goel wrote:
> Hi
> 
> Invitation Sent,
> 
> Regards,
> Sanyam Goel
> 
> On Sun, Apr 7, 2024 at 11:17 PM Furkan KAMACI wrote:
> 
> > Hi,
> >
> > ACK!
> >
> > Kind regards,
> > Furkan Kamaci
> >
> > On Sun, Apr 7, 2024 at 8:45 PM lewis john mcgibbney wrote:
> >
> > > Hi Nutch PMC,
> > > Please acknowledge and approve my request to mentor this year's GSoC
> > > program.
> > > An ACK is sufficient.
> > > Thank you
> > > lewismc
> > >
> >
> 


[VOTE] Apache Nutch 1.20 Release

2024-04-09 Thread lewis john mcgibbney
Hi Folks,

A first candidate for the Nutch 1.20 release is available at [0] where
accompanying SHA512 and ASC signatures can also be found.
Information on verifying releases can be found at [1].

The release candidate comprises a .zip and tar.gz archive of the sources at
[2] and complementary binary distributions. In addition, a staged maven
repository is available at [3].

The Nutch 1.20 release report is available at [4].

Please vote on releasing this package as Apache Nutch 1.20. The vote is
open for at least the next 72 hours and passes if a majority of at least
three +1 Nutch PMC votes are cast.

[ ] +1 Release this package as Apache Nutch 1.20.

[ ] -1 Do not release this package because…

Cheers,
lewismc
P.S. Here is my +1.

[0] https://dist.apache.org/repos/dist/dev/nutch/1.20
[1] http://nutch.apache.org/downloads.html#verify
[2] https://github.com/apache/nutch/tree/release-1.20
[3] https://repository.apache.org/content/repositories/orgapachenutch-1021/
[4] https://s.apache.org/ovjf3
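
As a rough illustration of the SHA512 part of the verification mentioned
above, a check could be sketched as follows. This is only a sketch under
assumptions: the class and method names are made up, the sample bytes stand
in for a downloaded archive, and the checksum file is assumed to start with
the hex digest optionally followed by a file name. It does not cover the ASC
signature check, which needs an OpenPGP tool and the KEYS file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of Nutch or of the release tooling.
public class Sha512CheckSketch {

    /** Returns the lowercase hex SHA-512 digest of the given bytes. */
    public static String sha512Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available in this JVM", e);
        }
    }

    /**
     * Compares the digest of the data with the first whitespace-separated
     * token of the checksum file content (assumed layout: "<hex>  <file>").
     */
    public static boolean matches(String checksumFileContent, byte[] data) {
        String expected = checksumFileContent.trim().split("\\s+")[0];
        return expected.equalsIgnoreCase(sha512Hex(data));
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real check would read the downloaded archive instead.
        byte[] sample = "example archive bytes".getBytes(StandardCharsets.UTF_8);
        String checksumFile = sha512Hex(sample) + "  example-archive.tar.gz";
        System.out.println(matches(checksumFile, sample));      // same bytes
        System.out.println(matches(checksumFile, new byte[0])); // different bytes
    }
}
```

For real archives it would be better to stream the file through the digest
than to load it into memory; the byte-array form is used here only to keep
the example short.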

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [DISCUSS] Interest in rolling a release

2024-04-09 Thread lewis john mcgibbney
Thanks Kevin. I have a few items in the backlog and will tackle those tests
when I come back to it.
lewismc

On Tue, Apr 9, 2024 at 05:43 Kevin Ratnasekera wrote:

> Thank you for taking the initiative Lewis. +1 I think we can go ahead with
> 1.0. Last time when I checked the main branch had one test failure and
> maybe we should fix that before the release.
>
> On Sat, Apr 6, 2024 at 2:39 AM lewis john mcgibbney wrote:
>
>> Hi dev@,
>> What is the current status of Gora with regards to rolling a 0.10? Or maybe
>> a 1.0?
>> Thanks
>> lewismc
>>
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
>>
>


[jira] [Resolved] (NUTCH-3038) Address issues discovered during 1.20 release management dryrun

2024-04-08 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-3038.
-
Resolution: Fixed

> Address issues discovered during 1.20 release management dryrun
> ---
>
> Key: NUTCH-3038
> URL: https://issues.apache.org/jira/browse/NUTCH-3038
> Project: Nutch
>  Issue Type: Task
>  Components: build, docker
>Affects Versions: 1.20
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 1.20
>
>
> During the 1.20 release management dryrun I discovered the following issues 
> which I think should be addressed in order to be satisfied with the release 
> candidate
>  # Update docker/README to remove broken badge
>  # Upgrade alpine base image in docker/Dockerfile
>  # Migrate CHANGES.txt to CHANGES.md
>  # Upgrade apache parent pom version from 23 to 31
>  # Upgrade maven-gpg-plugin dependency from 1.6 to 3.2.2 in build.xml
>  # Upgrade maven-compiler-plugin version from 3.8.1 to 3.13.0 in 
> ivy/mvn.template
>  # Remove miredot plugin usage from ivy/mvn.template





[jira] [Closed] (NUTCH-3038) Address issues discovered during 1.20 release management dryrun

2024-04-08 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-3038.
---

Thanks [~snagel] 

> Address issues discovered during 1.20 release management dryrun
> ---
>
> Key: NUTCH-3038
> URL: https://issues.apache.org/jira/browse/NUTCH-3038
> Project: Nutch
>  Issue Type: Task
>  Components: build, docker
>Affects Versions: 1.20
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 1.20
>
>
> During the 1.20 release management dryrun I discovered the following issues 
> which I think should be addressed in order to be satisfied with the release 
> candidate
>  # Update docker/README to remove broken badge
>  # Upgrade alpine base image in docker/Dockerfile
>  # Migrate CHANGES.txt to CHANGES.md
>  # Upgrade apache parent pom version from 23 to 31
>  # Upgrade maven-gpg-plugin dependency from 1.6 to 3.2.2 in build.xml
>  # Upgrade maven-compiler-plugin version from 3.8.1 to 3.13.0 in 
> ivy/mvn.template
>  # Remove miredot plugin usage from ivy/mvn.template





[jira] [Work stopped] (NUTCH-3038) Address issues discovered during 1.20 release management dryrun

2024-04-08 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3038 stopped by Lewis John McGibbney.
---
> Address issues discovered during 1.20 release management dryrun
> ---
>
> Key: NUTCH-3038
> URL: https://issues.apache.org/jira/browse/NUTCH-3038
> Project: Nutch
>  Issue Type: Task
>  Components: build, docker
>Affects Versions: 1.20
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 1.20
>
>
> During the 1.20 release management dryrun I discovered the following issues 
> which I think should be addressed in order to be satisfied with the release 
> candidate
>  # Update docker/README to remove broken badge
>  # Upgrade alpine base image in docker/Dockerfile
>  # Migrate CHANGES.txt to CHANGES.md
>  # Upgrade apache parent pom version from 23 to 31
>  # Upgrade maven-gpg-plugin dependency from 1.6 to 3.2.2 in build.xml
>  # Upgrade maven-compiler-plugin version from 3.8.1 to 3.13.0 in 
> ivy/mvn.template
>  # Remove miredot plugin usage from ivy/mvn.template





[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-04-08 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835077#comment-17835077
 ] 

Lewis John McGibbney commented on TIKA-4232:


It turns out that the original GitHub Action I wanted to use will not be 
approved for use. 

I’m therefore investigating running the tests via the 
[https://github.com/marketplace/actions/docker-run-action] with the 
{{helmunittest/helm-unittest}} Docker image to generate a JUnit report, 
and then using the [https://github.com/marketplace/actions/junit-report-action] 
to report the test results on the PR. 

 

I’ll investigate further and follow up here. 
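
The Docker-based approach described above could look roughly like the sketch 
below. It is not the workflow used in tika-helm; it only builds the command 
line. The /apps mount point and the -o / -t flags are taken from the 
helm-unittest documentation as I understand it and should be treated as 
assumptions, and the function name, chart path and report file name are 
hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: build the docker command that runs helm-unittest against a chart
# and writes a JUnit report. Paths and the report name are illustrative.
set -euo pipefail

# helm_unittest_cmd CHART_DIR REPORT_FILE
helm_unittest_cmd() {
  local chart_dir="$1" report="$2"
  # Mount the chart at /apps (assumed to be the image's working directory),
  # then request JUnit output via -t and a report file via -o.
  printf 'docker run --rm -v %q:/apps helmunittest/helm-unittest -o %q -t JUnit .\n' \
    "$chart_dir" "$report"
}

# Print the command for the current directory; pipe it to "sh" to execute it.
helm_unittest_cmd "$PWD" test-output.xml
```

Printing the command rather than running it keeps the sketch usable on a 
machine without Docker; a report action would then pick up test-output.xml.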

> Create and execute unit tests for tika-helm
> ---
>
> Key: TIKA-4232
> URL: https://issues.apache.org/jira/browse/TIKA-4232
> Project: Tika
>  Issue Type: Improvement
>  Components: tika-helm
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
>
> The goal is to execute chart unit tests against each tika-helm pull request.
> I found the [Helm Unit 
> Tests|[https://github.com/marketplace/actions/helm-unit-tests]] GitHub Action 
> which uses [https://github.com/helm-unittest/helm-unittest] as a Helm plugin.
> The PR will consist of one or more unit tests automated via the GitHub action.





Mentor request for lewismc

2024-04-07 Thread lewis john mcgibbney
Hi Nutch PMC,
Please acknowledge and approve my request to mentor in this year's GSoC program.
An ACK is sufficient.
Thank you
lewismc


[jira] [Work started] (NUTCH-3038) Address issues discovered during 1.20 release management dryrun

2024-04-05 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3038 started by Lewis John McGibbney.
---
> Address issues discovered during 1.20 release management dryrun
> ---
>
> Key: NUTCH-3038
> URL: https://issues.apache.org/jira/browse/NUTCH-3038
> Project: Nutch
>  Issue Type: Task
>  Components: build, docker
>Affects Versions: 1.20
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 1.20
>
>
> During the 1.20 release management dryrun I discovered the following issues 
> which I think should be addressed in order to be satisfied with the release 
> candidate
>  # Update docker/README to remove broken badge
>  # Upgrade alpine base image in docker/Dockerfile
>  # Migrate CHANGES.txt to CHANGES.md
>  # Upgrade apache parent pom version from 23 to 31
>  # Upgrade maven-gpg-plugin dependency from 1.6 to 3.2.2 in build.xml
>  # Upgrade maven-compiler-plugin version from 3.8.1 to 3.13.0 in 
> ivy/mvn.template
>  # Remove miredot plugin usage from ivy/mvn.template





[jira] [Updated] (NUTCH-3038) Address issues discovered during 1.20 release management dryrun

2024-04-05 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3038:

Description: 
During the 1.20 release management dryrun I discovered the following issues 
which I think should be addressed in order to be satisfied with the release 
candidate
 # Update docker/README to remove broken badge
 # Upgrade alpine base image in docker/Dockerfile
 # Migrate CHANGES.txt to CHANGES.md
 # Upgrade apache parent pom version from 23 to 31
 # Upgrade maven-gpg-plugin dependency from 1.6 to 3.2.2 in build.xml
 # Upgrade maven-compiler-plugin version from 3.8.1 to 3.13.0 in 
ivy/mvn.template
 # Remove miredot plugin usage from ivy/mvn.template

  was:
During the 1.20 release management dryrun I discovered the following issues 
which I think should be addressed in order to be satisfied with the release 
candidate
 # Update docker/README to remove broken badge
 # Upgrade alpine base image in docker/Dockerfile
 # Migrate CHANGES.txt to CHANGES.md
 # Upgrade maven-gpg-plugin dependency from 1.6 to 3.2.2 in build.xml
 # Upgrade maven-compiler-plugin version from 3.8.1 to 3.13.0 in 
ivy/mvn.template
 # Remove miredot plugin usage from ivy/mvn.template


> Address issues discovered during 1.20 release management dryrun
> ---
>
> Key: NUTCH-3038
> URL: https://issues.apache.org/jira/browse/NUTCH-3038
> Project: Nutch
>  Issue Type: Task
>  Components: build, docker
>Affects Versions: 1.20
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 1.20
>
>
> During the 1.20 release management dryrun I discovered the following issues 
> which I think should be addressed in order to be satisfied with the release 
> candidate
>  # Update docker/README to remove broken badge
>  # Upgrade alpine base image in docker/Dockerfile
>  # Migrate CHANGES.txt to CHANGES.md
>  # Upgrade apache parent pom version from 23 to 31
>  # Upgrade maven-gpg-plugin dependency from 1.6 to 3.2.2 in build.xml
>  # Upgrade maven-compiler-plugin version from 3.8.1 to 3.13.0 in 
> ivy/mvn.template
>  # Remove miredot plugin usage from ivy/mvn.template





[jira] [Created] (NUTCH-3038) Address issues discovered during 1.20 release management dryrun

2024-04-05 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3038:
---

 Summary: Address issues discovered during 1.20 release management 
dryrun
 Key: NUTCH-3038
 URL: https://issues.apache.org/jira/browse/NUTCH-3038
 Project: Nutch
  Issue Type: Task
  Components: build, docker
Affects Versions: 1.20
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.20


During the 1.20 release management dryrun I discovered the following issues 
which I think should be addressed in order to be satisfied with the release 
candidate
 # Update docker/README to remove broken badge
 # Upgrade alpine base image in docker/Dockerfile
 # Migrate CHANGES.txt to CHANGES.md
 # Upgrade maven-gpg-plugin dependency from 1.6 to 3.2.2 in build.xml
 # Upgrade maven-compiler-plugin version from 3.8.1 to 3.13.0 in 
ivy/mvn.template
 # Remove miredot plugin usage from ivy/mvn.template





[DISCUSS] Interest in rolling a release

2024-04-05 Thread lewis john mcgibbney
Hi dev@,
What is the current status of Gora with regard to rolling a 0.10, or maybe
a 1.0?
Thanks
lewismc

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[jira] [Closed] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-04-04 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-3032.
---

Thanks [~jglvary] and congratulations on your first contribution to Apache 
Nutch :)

> Indexing plugin as an adapter for end user's own POJO instances
> ---
>
> Key: NUTCH-3032
> URL: https://issues.apache.org/jira/browse/NUTCH-3032
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Reporter: Joe Gilvary
>Assignee: Joe Gilvary
>Priority: Major
>  Labels: indexing
> Fix For: 1.20
>
> Attachments: NUTCH-3032.patch
>
>
> It could be helpful to let end users manipulate information at indexing time 
> with their own code without the need for writing their own indexing plugin. I 
> mentioned this on the dev mailing list 
> (https://www.mail-archive.com/dev@nutch.apache.org/msg31190.html) with some 
> description of my work in progress.
> One potential use is to address some of the same concerns that NUTCH-585 
> discusses regarding an alternative approach to picking and choosing which 
> content to index, but this approach would allow making index time decisions, 
> rather than setting the configuration for all content at the start of the 
> indexing run.
>  





Re: [ANNOUNCE] Apache Tika 2.9.2 released

2024-04-02 Thread lewis john mcgibbney
All good.
I’m looking into a way to just automate the Helm Chart release based on a
Webhook payload every time a new Docker container image is pushed to
DockerHub.
That would simplify things some…
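
One piece of that automation can be sketched as follows: given the tag of a 
newly pushed image, update appVersion in the chart's Chart.yaml. This is an 
illustration under assumptions, not the tika-helm release process. The 
function name is hypothetical, the tag is assumed to arrive from the webhook 
payload (Docker Hub payloads carry it under push_data.tag, as far as I know), 
and release tags are assumed to be dotted numbers such as 2.9.2.0.

```shell
#!/usr/bin/env bash
# Sketch only: set appVersion in a Helm Chart.yaml to a newly pushed image tag.
set -euo pipefail

# set_app_version CHART_YAML TAG  (both arguments are illustrative)
set_app_version() {
  local chart_yaml="$1" tag="$2"
  local release_re='^[0-9]+(\.[0-9]+)+$'
  # Accept only dotted numeric tags such as 2.9.2.0; skip tags like "latest".
  if [[ ! "$tag" =~ $release_re ]]; then
    echo "skipping non-release tag: $tag" >&2
    return 1
  fi
  # Rewrite the appVersion line in place; the .bak suffix keeps the call
  # portable between GNU and BSD sed, and the backup is removed afterwards.
  sed -i.bak -E "s/^appVersion:.*/appVersion: \"${tag}\"/" "$chart_yaml"
  rm -f "${chart_yaml}.bak"
}
```

A webhook receiver would call set_app_version with the chart path and the 
pushed tag, then commit the change and package the chart.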


On Tue, Apr 2, 2024 at 12:24 Tim Allison  wrote:

> Oops:
> https://cwiki.apache.org/confluence/display/TIKA/Release+Process+for+tika-helm
>
> Help...
>
> On Tue, Apr 2, 2024 at 3:22 PM Tim Allison  wrote:
> >
> > I did a global and thoughtless find/replace. Please review and merge
> > if this makes sense: https://github.com/apache/tika-helm/pull/19
> >
> > cc @lewis john mcgibbney
> >
> > On Tue, Apr 2, 2024 at 3:09 PM Tim Allison  wrote:
> > >
> > > I also released our docker images for 2.9.2.0.
> > >
> > > How do we update helm?
> > >
> > > > On Tue, Apr 2, 2024 at 2:31 PM Tim Allison wrote:
> > > >
> > > > The Apache Tika project is pleased to announce the release of Apache
> > > > Tika 2.9.2. The release contents have been pushed out to the main
> > > > Apache release site and to the Maven Central sync.
> > > >
> > > > Apache Tika is a toolkit for detecting and extracting metadata and
> > > > structured text content from various documents using existing parser
> > > > libraries.
> > > >
> > > > > Apache Tika 2.9.2 includes numerous bug fixes and dependency upgrades.
> > > > Details can be found in the changes file:
> > > > https://www.apache.org/dist/tika/2.9.2/CHANGES-2.9.2.txt
> > > >
> > > > Apache Tika is available on the download page:
> > > > https://tika.apache.org/download.html
> > > >
> > > > Apache Tika is also available in binary form or for use using Maven 2
> > > > from the Central Repository:
> > > > https://repo1.maven.org/maven2/org/apache/tika/
> > > >
> > > > When downloading, please remember to verify the downloads using
> > > > signatures found: https://www.apache.org/dist/tika/KEYS
> > > >
> > > > For more information on Apache Tika, visit the project home page:
> > > > https://tika.apache.org/
> > > >
> > > > -- Tim Allison, on behalf of the Apache Tika community
>


[jira] [Created] (TIKA-4233) Check tika-helm for deprecated k8s APIs

2024-03-30 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created TIKA-4233:
--

 Summary: Check tika-helm for deprecated k8s APIs
 Key: TIKA-4233
 URL: https://issues.apache.org/jira/browse/TIKA-4233
 Project: Tika
  Issue Type: New Feature
  Components: tika-helm
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.9.2


It is useful to know when a Helm Chart uses deprecated k8s APIs. A check for 
this would be ideal. The “Check deprecated k8s APIs” GitHub action accomplishes 
this.

[https://github.com/marketplace/actions/check-deprecated-k8s-apis]
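
To illustrate the kind of check meant here, the sketch below scans rendered 
manifests for a handful of apiVersions that have been removed from recent 
Kubernetes releases. It is a plain grep, not how the GitHub Action above 
works, the list is deliberately not exhaustive, and the function name is 
hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: flag a few apiVersions removed from recent Kubernetes releases.
set -euo pipefail

# Reads rendered manifests on stdin, prints offending lines with line numbers,
# and returns 1 if any are found (0 otherwise).
find_deprecated_apis() {
  local pattern='^[[:space:]]*apiVersion:[[:space:]]*(extensions/v1beta1|networking\.k8s\.io/v1beta1|policy/v1beta1|batch/v1beta1|autoscaling/v2beta1)[[:space:]]*$'
  if grep -nE "$pattern"; then
    return 1
  fi
  return 0
}
```

A typical use would be to pipe the output of "helm template" for the chart 
into find_deprecated_apis and fail the build on a non-zero exit status.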





[jira] [Created] (TIKA-4232) Create and execute unit tests for tika-helm

2024-03-30 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created TIKA-4232:
--

 Summary: Create and execute unit tests for tika-helm
 Key: TIKA-4232
 URL: https://issues.apache.org/jira/browse/TIKA-4232
 Project: Tika
  Issue Type: Improvement
  Components: tika-helm
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.9.2


The goal is to execute chart unit tests against each tika-helm pull request.

I found the [Helm Unit 
Tests|[https://github.com/marketplace/actions/helm-unit-tests]] GitHub Action 
which uses [https://github.com/helm-unittest/helm-unittest] as a Helm plugin.

The PR will consist of one or more unit tests automated via the GitHub action.





[jira] [Commented] (TIKA-4227) Register tika-helm Chart in artifacthub.io

2024-03-30 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832505#comment-17832505
 ] 

Lewis John McGibbney commented on TIKA-4227:


Available at [https://artifacthub.io/packages/helm/apache-tika/tika]

> Register tika-helm Chart in artifacthub.io
> --
>
> Key: TIKA-4227
> URL: https://issues.apache.org/jira/browse/TIKA-4227
> Project: Tika
>  Issue Type: Task
>  Components: tika-helm
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 2.9.2
>
>
> [https://artifacthub.io/] represents the most popular search interface for 
> (amongst lots of other artifacts) Helm Charts.
> This task will register the tika-helm Chart with [https://artifacthub.io/].





[jira] [Resolved] (TIKA-4227) Register tika-helm Chart in artifacthub.io

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved TIKA-4227.

Resolution: Fixed

> Register tika-helm Chart in artifacthub.io
> --
>
> Key: TIKA-4227
> URL: https://issues.apache.org/jira/browse/TIKA-4227
> Project: Tika
>  Issue Type: Task
>  Components: tika-helm
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 2.9.2
>
>
> [https://artifacthub.io/] represents the most popular search interface for 
> (amongst lots of other artifacts) Helm Charts.
> This task will register the tika-helm Chart with [https://artifacthub.io/].





[jira] [Closed] (TIKA-4227) Register tika-helm Chart in artifacthub.io

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed TIKA-4227.
--

> Register tika-helm Chart in artifacthub.io
> --
>
> Key: TIKA-4227
> URL: https://issues.apache.org/jira/browse/TIKA-4227
> Project: Tika
>  Issue Type: Task
>  Components: tika-helm
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 2.9.2
>
>
> [https://artifacthub.io/] represents the most popular search interface for 
> (amongst lots of other artifacts) Helm Charts.
> This task will register the tika-helm Chart with [https://artifacthub.io/].





tika-helm now on artifacthub.io

2024-03-30 Thread lewis john mcgibbney
Hi user@, dev@,

For those running Tika on Kubernetes, you can now conveniently find the
Helm Chart via artifacthub.io

https://artifacthub.io/packages/helm/apache-tika/tika

I’ll build in a little more automation so that this thing just takes care
of itself.

Thanks to all contributors.

lewismc


[jira] [Updated] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3032:

Fix Version/s: 1.20

> Indexing plugin as an adapter for end user's own POJO instances
> ---
>
> Key: NUTCH-3032
> URL: https://issues.apache.org/jira/browse/NUTCH-3032
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Reporter: Joe Gilvary
>Assignee: Joe Gilvary
>Priority: Major
>  Labels: indexing
> Fix For: 1.20
>
> Attachments: NUTCH-3032.patch
>
>
> It could be helpful to let end users manipulate information at indexing time 
> with their own code without the need for writing their own indexing plugin. I 
> mentioned this on the dev mailing list 
> (https://www.mail-archive.com/dev@nutch.apache.org/msg31190.html) with some 
> description of my work in progress.
> One potential use is to address some of the same concerns that NUTCH-585 
> discusses regarding an alternative approach to picking and choosing which 
> content to index, but this approach would allow making index time decisions, 
> rather than setting the configuration for all content at the start of the 
> indexing run.
>  





[jira] [Assigned] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reassigned NUTCH-3032:
---

Assignee: Joe Gilvary

> Indexing plugin as an adapter for end user's own POJO instances
> ---
>
> Key: NUTCH-3032
> URL: https://issues.apache.org/jira/browse/NUTCH-3032
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Reporter: Joe Gilvary
>Assignee: Joe Gilvary
>Priority: Major
>  Labels: indexing
> Attachments: NUTCH-3032.patch
>
>
> It could be helpful to let end users manipulate information at indexing time 
> with their own code without the need for writing their own indexing plugin. I 
> mentioned this on the dev mailing list 
> (https://www.mail-archive.com/dev@nutch.apache.org/msg31190.html) with some 
> description of my work in progress.
> One potential use is to address some of the same concerns that NUTCH-585 
> discusses regarding an alternative approach to picking and choosing which 
> content to index, but this approach would allow making index time decisions, 
> rather than setting the configuration for all content at the start of the 
> indexing run.
>  





[jira] [Work stopped] (NUTCH-2856) Implement a protocol-smb plugin based on hierynomus/smbj

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2856 stopped by Lewis John McGibbney.
---
> Implement a protocol-smb plugin based on hierynomus/smbj
> 
>
> Key: NUTCH-2856
> URL: https://issues.apache.org/jira/browse/NUTCH-2856
> Project: Nutch
>  Issue Type: New Feature
>  Components: external, plugin, protocol
>Reporter: Hiran Chaudhuri
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> The plugin protocol-smb advertized on 
> [https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral] actually 
> refers to the JCIFS library. According to this library's homepage 
> [https://www.jcifs.org/]:
> _If you're looking for the latest and greatest open source Java SMB library, 
> this is not it. JCIFS has been in maintenance-mode-only for several years and 
> although what it does support works fine (SMB1, NTLMv2, midlc, MSRPC and 
> various utility classes), jCIFS does not support the newer SMB2/3 variants of 
> the SMB protocol which is slowly becoming required (Windows 10 requires 
> SMB2/3). JCIFS only supports SMB1 but Microsoft has deprecated SMB1 in their 
> products. *So if SMB1 is disabled on your network, JCIFS' file related 
> operations will NOT work.*_
> Looking at 
> [https://en.wikipedia.org/wiki/Server_Message_Block#SMB_/_CIFS_/_SMB1:|https://en.wikipedia.org/wiki/Server_Message_Block#SMB_/_CIFS_/_SMB1]
> _Microsoft added SMB1 to the Windows Server 2012 R2 deprecation list in June 
> 2013. Windows Server 2016 and some versions of Windows 10 Fall Creators 
> Update do not have SMB1 installed by default._
> As a conclusion, the chances that SMB1 protocol is installed and/or 
> configured are getting vastly smaller. Therefore some migration towards 
> SMB2/3 is required. Luckily the JCIFS homepage lists alternatives:
>  * [jcifs-codelibs|https://github.com/codelibs/jcifs]
>  * [jcifs-ng|https://github.com/AgNO3/jcifs-ng]
>  * [smbj|https://github.com/hierynomus/smbj]





[jira] [Work stopped] (NUTCH-2887) Migrate to JUnit 5 Jupiter

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2887 stopped by Lewis John McGibbney.
---
> Migrate to JUnit 5 Jupiter
> --
>
> Key: NUTCH-2887
> URL: https://issues.apache.org/jira/browse/NUTCH-2887
> Project: Nutch
>  Issue Type: Improvement
>  Components: test
> Environment: Migrate 
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> This effort is a bit of a beast. See the [JUnit migration 
> tips|https://junit.org/junit5/docs/current/user-guide/#migrating-from-junit4-tips]
>  for general guidance. A general grep for junit in src produces the following
> {code:bash}
> ./test/nutch-site.xml
> ./test/org/apache/nutch/tools/TestCommonCrawlDataDumper.java
> ./test/org/apache/nutch/net/TestURLNormalizers.java
> ./test/org/apache/nutch/net/protocols/TestHttpDateFormat.java
> ./test/org/apache/nutch/net/TestURLFilters.java
> ./test/org/apache/nutch/util/TestStringUtil.java
> ./test/org/apache/nutch/util/TestSuffixStringMatcher.java
> ./test/org/apache/nutch/util/TestEncodingDetector.java
> ./test/org/apache/nutch/util/TestMimeUtil.java
> ./test/org/apache/nutch/util/TestPrefixStringMatcher.java
> ./test/org/apache/nutch/util/DumpFileUtilTest.java
> ./test/org/apache/nutch/util/TestNodeWalker.java
> ./test/org/apache/nutch/util/WritableTestUtils.java
> ./test/org/apache/nutch/util/TestTableUtil.java
> ./test/org/apache/nutch/util/TestURLUtil.java
> ./test/org/apache/nutch/util/TestGZIPUtils.java
> ./test/org/apache/nutch/parse/TestParseText.java
> ./test/org/apache/nutch/parse/TestOutlinks.java
> ./test/org/apache/nutch/parse/TestParseData.java
> ./test/org/apache/nutch/parse/TestOutlinkExtractor.java
> ./test/org/apache/nutch/parse/TestParserFactory.java
> ./test/org/apache/nutch/segment/TestSegmentMerger.java
> ./test/org/apache/nutch/segment/TestSegmentMergerCrawlDatums.java
> ./test/org/apache/nutch/plugin/TestPluginSystem.java
> ./test/org/apache/nutch/fetcher/TestFetcher.java
> ./test/org/apache/nutch/protocol/TestProtocolFactory.java
> ./test/org/apache/nutch/protocol/TestContent.java
> ./test/org/apache/nutch/protocol/AbstractHttpProtocolPluginTest.java
> ./test/org/apache/nutch/crawl/TestCrawlDbFilter.java
> ./test/org/apache/nutch/crawl/TestTextProfileSignature.java
> ./test/org/apache/nutch/crawl/TestCrawlDbStates.java
> ./test/org/apache/nutch/crawl/TestGenerator.java
> ./test/org/apache/nutch/crawl/TestAdaptiveFetchSchedule.java
> ./test/org/apache/nutch/crawl/TODOTestCrawlDbStates.java
> ./test/org/apache/nutch/crawl/TestSignatureFactory.java
> ./test/org/apache/nutch/crawl/ContinuousCrawlTestUtil.java
> ./test/org/apache/nutch/crawl/TestInjector.java
> ./test/org/apache/nutch/crawl/TestLinkDbMerger.java
> ./test/org/apache/nutch/crawl/TestCrawlDbMerger.java
> ./test/org/apache/nutch/service/TestNutchServer.java
> ./test/org/apache/nutch/metadata/TestMetadata.java
> ./test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java
> ./test/org/apache/nutch/indexer/TestIndexingFilters.java
> ./test/org/apache/nutch/indexer/TestIndexerMapReduce.java
> ./bin/nutch
> ./plugin/scoring-orphan/src/test/org/apache/nutch/scoring/orphan/TestOrphanScoringFilter.java
> ./plugin/index-basic/src/test/org/apache/nutch/indexer/basic/TestBasicIndexingFilter.java
> ./plugin/urlfilter-domaindenylist/build.xml
> ./plugin/urlfilter-domaindenylist/src/test/org/apache/nutch/urlfilter/domaindenylist/TestDomainDenylistURLFilter.java
> ./plugin/protocol-imaps/plugin.xml
> ./plugin/protocol-imaps/ivy.xml
> ./plugin/protocol-imaps/lib/junit-4.13.jar
> ./plugin/protocol-imaps/lib/greenmail-junit4-1.6.0.jar
> ./plugin/protocol-imaps/lib/greenmail-1.6.0.jar
> ./plugin/protocol-imaps/src/test/org/apache/nutch/protocol/imaps/TestImaps.java
> ./plugin/protocol-file/build.xml
> ./plugin/protocol-file/src/test/org/apache/nutch/protocol/file/TestProtocolFile.java
> ./plugin/urlnormalizer-regex/build.xml
> ./plugin/urlnormalizer-regex/src/test/org/apache/nutch/net/urlnormalizer/regex/TestRegexURLNormalizer.java
> ./plugin/build-plugin.xml
> ./plugin/creativecommons/src/test/org/creativecommons/nutch/TestCCParseFilter.java
> ./plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
> ./plugin/urlnormalizer-protocol/build.xml
> ./plugin/urlnormalizer-protocol/src/test/org/apache/nutch/net/urlnormalizer/protocol/TestProtocolURLNormalizer.java
> ./plugin/urlfilter-prefix/src/test/org/apache/nutch/urlfilter/prefi

[jira] [Closed] (NUTCH-2832) Create tutorial on sending Nutch logs to Elasticsearch

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-2832.
---

> Create tutorial on sending Nutch logs to Elasticsearch
> --
>
> Key: NUTCH-2832
> URL: https://issues.apache.org/jira/browse/NUTCH-2832
> Project: Nutch
>  Issue Type: New Feature
>  Components: configuration, deployment
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> A while back I used to use [Chukwa|https://chukwa.apache.org/] for log 
> aggregation and analysis. Chukwa is now retired. 
> I did a bit of research into directly logging Log4j2 into Elasticsearch and came 
> across 
> [log4j2-elasticsearch|https://github.com/rfoltyns/log4j2-elasticsearch] which 
> looks pretty simple.
> I'm going to have a crack at implementing this functionality as a 
> configuration option. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NUTCH-2832) Create tutorial on sending Nutch logs to Elasticsearch

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2832.
-
Resolution: Won't Fix

Given the license changes regarding the concerned backend I have no interest 
in implementing this anymore. 

> Create tutorial on sending Nutch logs to Elasticsearch
> --
>
> Key: NUTCH-2832
> URL: https://issues.apache.org/jira/browse/NUTCH-2832
> Project: Nutch
>  Issue Type: New Feature
>  Components: configuration, deployment
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> A while back I used to use [Chukwa|https://chukwa.apache.org/] for log 
> aggregation and analysis. Chukwa is now retired. 
> I did a bit of research into directly logging Log4j2 into Elasticsearch and came 
> across 
> [log4j2-elasticsearch|https://github.com/rfoltyns/log4j2-elasticsearch] which 
> looks pretty simple.
> I'm going to have a crack at implementing this functionality as a 
> configuration option. 





[jira] [Resolved] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-3036.
-
Resolution: Fixed

> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via selenium grid.





[jira] [Closed] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-3036.
---

> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via selenium grid.





[jira] [Closed] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-3035.
---

> Update license and notice file for release of 1.20 
> ---
>
> Key: NUTCH-3035
> URL: https://issues.apache.org/jira/browse/NUTCH-3035
> Project: Nutch
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.20
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Close to the release of 1.20 the license and notice files should be updated 
> to contain all (third-party) licenses of all dependencies. Cf. NUTCH-2290 and 
> NUTCH-2981.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-3035.
-
Resolution: Fixed

> Update license and notice file for release of 1.20 
> ---
>
> Key: NUTCH-3035
> URL: https://issues.apache.org/jira/browse/NUTCH-3035
> Project: Nutch
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.20
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Close to the release of 1.20 the license and notice files should be updated 
> to contain all (third-party) licenses of all dependencies. Cf. NUTCH-2290 and 
> NUTCH-2981.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NUTCH-3037) Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-3037.
-
Resolution: Fixed

> Upgrade org.apache.kafka:kafka_2.12: to v3.7.0
> --
>
> Key: NUTCH-3037
> URL: https://issues.apache.org/jira/browse/NUTCH-3037
> Project: Nutch
>  Issue Type: Task
>  Components: indexer-kafka
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> We depend on v1.1.0 which is quite a bit behind the current v3.7.0 artifact, 
> I therefore propose to upgrade.
> I will also state that a _*kafka_2.13*_ artifact exists. This would demand 
> that the underlying Scala version be also upgraded... but I think this should 
> be addressed in a separate ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (NUTCH-3037) Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

2024-03-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-3037.
---

> Upgrade org.apache.kafka:kafka_2.12: to v3.7.0
> --
>
> Key: NUTCH-3037
> URL: https://issues.apache.org/jira/browse/NUTCH-3037
> Project: Nutch
>  Issue Type: Task
>  Components: indexer-kafka
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> We depend on v1.1.0 which is quite a bit behind the current v3.7.0 artifact, 
> I therefore propose to upgrade.
> I will also state that a _*kafka_2.13*_ artifact exists. This would demand 
> that the underlying Scala version be also upgraded... but I think this should 
> be addressed in a separate ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (TIKA-4227) Register tika-helm Chart in artifacthub.io

2024-03-26 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created TIKA-4227:
--

 Summary: Register tika-helm Chart in artifacthub.io
 Key: TIKA-4227
 URL: https://issues.apache.org/jira/browse/TIKA-4227
 Project: Tika
  Issue Type: Task
  Components: tika-helm
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.9.2


[https://artifacthub.io/] is the most popular search interface for Helm Charts 
(among many other kinds of artifacts).

This task will register the tika-helm Chart with [https://artifacthub.io/].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Tika chart cannot be reached

2024-03-26 Thread Lewis John McGibbney
Hi Pietro,

On 2024/03/26 08:13:39 Pietro Susca wrote:

> 
> Francesco's request is that the repo URL is not working;
> also, tika is not searchable on the helm repo hub

Do you mean here - https://artifacthub.io/ ?
If you want it to be searchable via that platform then I can try to make an 
entry.

If there are any other problems with the Chart then please let me know.

Ciao
lewismc


Re: Tika chart cannot be reached

2024-03-26 Thread Lewis John McGibbney
Hi Francesco,
Thanks for letting us know that the repository was unreachable… I can only 
conclude that the problem was intermittent.
I can fetch and deploy the Chart as follows:

helm repo add tika https://apache.jfrog.io/artifactory/tika
helm install tika tika/tika --set image.tag=latest-full -n tika-test

Thanks
lewismc

On 2024/03/25 12:16:31 Francesco Scuccimarri wrote:
> Hi Team Dev Tika,
> Over the past few days, I've encountered an issue while trying to use
> tika-helm. When I attempt to add the
> repository for Tika charts using the Helm command, I receive the following
> error message:
> 
> *Looks like 'https://apache.jfrog.io/artifactory/tika/' is not a valid chart
> repository or cannot be reached.*
> 
> It seems that the issue is specific to the Tika chart repository.
> Do you have any updates regarding any changes to the Tika chart repository
> or its accessibility? I've reviewed the documentation and searched online,
> but I haven't found any recent information about this issue.
> 
> Thank you very much for your support.
> 
> Best regards,
> Francesco Scuccimarri
> 


[jira] [Work stopped] (NUTCH-3037) Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

2024-03-21 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3037 stopped by Lewis John McGibbney.
---
> Upgrade org.apache.kafka:kafka_2.12: to v3.7.0
> --
>
> Key: NUTCH-3037
> URL: https://issues.apache.org/jira/browse/NUTCH-3037
> Project: Nutch
>  Issue Type: Task
>  Components: indexer-kafka
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> We depend on v1.1.0 which is quite a bit behind the current v3.7.0 artifact, 
> I therefore propose to upgrade.
> I will also state that a _*kafka_2.13*_ artifact exists. This would demand 
> that the underlying Scala version be also upgraded... but I think this should 
> be addressed in a separate ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NUTCH-3037) Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

2024-03-21 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3037:

Flags: Patch

> Upgrade org.apache.kafka:kafka_2.12: to v3.7.0
> --
>
> Key: NUTCH-3037
> URL: https://issues.apache.org/jira/browse/NUTCH-3037
> Project: Nutch
>  Issue Type: Task
>  Components: indexer-kafka
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> We depend on v1.1.0 which is quite a bit behind the current v3.7.0 artifact, 
> I therefore propose to upgrade.
> I will also state that a _*kafka_2.13*_ artifact exists. This would demand 
> that the underlying Scala version be also upgraded... but I think this should 
> be addressed in a separate ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (NUTCH-3037) Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

2024-03-21 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3037 started by Lewis John McGibbney.
---
> Upgrade org.apache.kafka:kafka_2.12: to v3.7.0
> --
>
> Key: NUTCH-3037
> URL: https://issues.apache.org/jira/browse/NUTCH-3037
> Project: Nutch
>  Issue Type: Task
>  Components: indexer-kafka
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> We depend on v1.1.0 which is quite a bit behind the current v3.7.0 artifact, 
> I therefore propose to upgrade.
> I will also state that a _*kafka_2.13*_ artifact exists. This would demand 
> that the underlying Scala version be also upgraded... but I think this should 
> be addressed in a separate ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NUTCH-3037) Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

2024-03-21 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3037:
---

 Summary: Upgrade org.apache.kafka:kafka_2.12: to v3.7.0
 Key: NUTCH-3037
 URL: https://issues.apache.org/jira/browse/NUTCH-3037
 Project: Nutch
  Issue Type: Task
  Components: indexer-kafka
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.20


We depend on v1.1.0 which is quite a bit behind the current v3.7.0 artifact, I 
therefore propose to upgrade.

I will also state that a _*kafka_2.13*_ artifact exists. This would demand that 
the underlying Scala version be also upgraded... but I think this should be 
addressed in a separate ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work stopped] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3036 stopped by Lewis John McGibbney.
---
> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via selenium grid.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3036 started by Lewis John McGibbney.
---
> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via selenium grid.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3036:
---

 Summary: Upgrade org.seleniumhq.selenium:selenium-java dependency 
in lib-selenium
 Key: NUTCH-3036
 URL: https://issues.apache.org/jira/browse/NUTCH-3036
 Project: Nutch
  Issue Type: Improvement
  Components: selenium, plugin
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.20


lib-selenium currently packages org.seleniumhq.selenium:selenium-java *v4.7.2* 
but *v4.18.1* is available on Maven Central.

This ticket will upgrade the java dependency and validate that both 
protocol-selenium and protocol-interactiveselenium work as expected in local 
mode and via selenium grid.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IVY-1651) Augment 'Child elements’ section of 'File System Resolver' documentation

2024-03-13 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/IVY-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826781#comment-17826781
 ] 

Lewis John McGibbney commented on IVY-1651:
---

PR available at [https://github.com/apache/ant-ivy/pull/101]

> Augment 'Child elements’ section of 'File System Resolver' documentation
> 
>
> Key: IVY-1651
> URL: https://issues.apache.org/jira/browse/IVY-1651
> Project: Ivy
>  Issue Type: Improvement
>  Components: Documentation, Maven Compatibility
>    Reporter: Lewis John McGibbney
>Priority: Trivial
> Fix For: 2.5.3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I [recently encountered some 
> confusion|[https://lists.apache.org/thread/tzvtw4j2d9pcxhqjxyb2dwnsk50t47b5]] 
> when upgrading from Ivy 2.5.0 —> 2.5.2.
> I think the documentation at 
> [https://ant.apache.org/ivy/history/2.5.2/resolver/filesystem.html#_child_elements]
>  could be augmented to at least link back to the [Maven 
> documentation|[https://maven.apache.org/pom.html#dependencies]] which 
> explicitly references acceptable constituent values for the resolver pattern.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IVY-1651) Augment 'Child elements’ section of 'File System Resolver' documentation

2024-03-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created IVY-1651:
-

 Summary: Augment 'Child elements’ section of 'File System 
Resolver' documentation
 Key: IVY-1651
 URL: https://issues.apache.org/jira/browse/IVY-1651
 Project: Ivy
  Issue Type: Improvement
  Components: Documentation, Maven Compatibility
Reporter: Lewis John McGibbney
 Fix For: 2.5.3


I [recently encountered some 
confusion|[https://lists.apache.org/thread/tzvtw4j2d9pcxhqjxyb2dwnsk50t47b5]] 
when upgrading from Ivy 2.5.0 —> 2.5.2.

I think the documentation at 
[https://ant.apache.org/ivy/history/2.5.2/resolver/filesystem.html#_child_elements]
 could be augmented to at least link back to the [Maven 
documentation|[https://maven.apache.org/pom.html#dependencies]] which 
explicitly references acceptable constituent values for the resolver pattern.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826776#comment-17826776
 ] 

Lewis John McGibbney commented on NUTCH-3029:
-

Hi [~martin.dj] [~markus17] it looks like we are missing some Javadoc

 
{quote}
[javadoc] Standard Doclet version 11.0.22
[javadoc] Building tree for all the packages and classes...
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:193: warning: no @param for url
[javadoc] public static String getHostName(String url) throws URISyntaxException {
[javadoc] ^
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:193: warning: no @return
[javadoc] public static String getHostName(String url) throws URISyntaxException {
[javadoc] ^
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:193: warning: no @throws for java.net.URISyntaxException
[javadoc] public static String getHostName(String url) throws URISyntaxException {
[javadoc] ^
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:205: warning: no @return
[javadoc] public float getMaxInterval(Text url, float defaultMaxInterval){
[javadoc] ^
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:227: warning: no @return
[javadoc] public float getMinInterval(Text url, float defaultMinInterval){
[javadoc] ^
{quote}
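
For illustration, Javadoc along the following lines should address the three warnings reported for getHostName. This is only a sketch: the class name HostNameJavadocSketch and the method body are assumptions made so the example compiles on its own, and they are not the committed AdaptiveFetchSchedule code. The missing @return tags on getMaxInterval and getMinInterval would follow the same pattern.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative stand-in only; not the actual AdaptiveFetchSchedule source. */
class HostNameJavadocSketch {

  /**
   * Extracts the host name from a URL string.
   *
   * @param url the URL to parse, e.g. {@code https://nutch.apache.org/download/}
   * @return the host component of the URL, or {@code null} if it has none
   * @throws URISyntaxException if {@code url} is not a syntactically valid URI
   */
  public static String getHostName(String url) throws URISyntaxException {
    // Assumed body for the sketch; the real method may be implemented differently.
    return new URI(url).getHost();
  }

  public static void main(String[] args) throws URISyntaxException {
    // Prints the host part of the Nutch download URL.
    System.out.println(getHostName("https://nutch.apache.org/download/"));
  }
}
```

Each tag maps one-to-one onto a warning: @param for the url argument, @return for the String result, and @throws for the declared URISyntaxException.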
 

> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.19, 1.20
>Reporter: Martin Djukanovic
>Assignee: Markus Jelsma
>Priority: Minor
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Self Introduction - Xuanwo

2024-03-13 Thread Lewis John McGibbney
A warm welcome, Xuanwo, and thanks for introducing yourself.
lewismc

On 2024/03/10 05:20:20 Xuanwo wrote:
> Hello, everyone
> 
> I'm Xuanwo, and I'm following the "Contribute" guide in 
> comdev-working-groups[1] to introduce myself and kickstart my contributions :)
> 
> My personal vision is "Empowering freely data access from ANY storage service 
> in ANY method". Open source is definitely an important part of achieving my 
> vision.
> 
> - I'm the PMC Chair for Apache OpenDAL [2], a project that graduated in 
> January 2024, aimed at enabling free data access.
> - I work at Databend Labs [3], focusing on cost-effective data analysis.
> - I'm also contributing to Apache Iceberg [4] to simplify reading SQL tables.
> 
> My current interest lies in open source sustainability. I want to learn how 
> to ensure a project's sustainability and foster community growth. I'm here to 
> explore how I can contribute to expanding the ASF community.
> 
> Pleased to meet you here; I'm looking forward to working together with you.
> 
> [1]: https://github.com/apache/comdev-working-groups
> [2]: https://github.com/apache/opendal
> [3]: https://github.com/datafuselabs/databend/
> [4]: https://github.com/apache/iceberg-rust
> 
> Xuanwo
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
> For additional commands, e-mail: dev-h...@community.apache.org
> 
> 




Re: [QUESTION] What should community do in GSoC timeline?

2024-03-13 Thread Lewis John McGibbney
Hi Xuanwo,
It’s been a few years since I participated in GSoC as a mentor… but this year I 
intend to. Let me see if I can provide answers to some of your questions.

On 2024/03/11 03:07:29 Xuanwo wrote:
> 
> 2024-02-22: Potential GSoC contributors discuss application ideas with 
> mentoring organizations
> 
> Q: Should those ideas/proposals be posted to the mailing list? Or just discussed 
> with mentors?

The mailing list is great; however, I don’t think there are any hard rules. This 
period is really just for attracting interest in the initiative (if it was 
created by the PMC/Committership) or convincing a PMC to take on your 
initiative (if it was created by a potential GSoC student).

> Q: Should student-submitted ideas/proposals be added to Jira?

Yes absolutely. Make sure the JIRA issue is labeled with “gsoc2024” as well. 
That way it will show up in the filter at 
https://issues.apache.org/jira/issues/?jql=labels+%3D+gsoc2024.

> 
> 2024-04-15: Proposals to ASF projects must be reviewed roughly and have a 
> potential mentor so that we know how many slots to request.
> 
> Q: Who will review/rank/score those proposals? The corresponding community's 
> PMC?
> 


In short, yes, but really it is down to the mentor(s). It is always good to have 
a backup mentor as well, in case the mentor is unable to see the project through.

HTH
lewismc




[jira] [Closed] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-3033.
---

> Upgrade Ivy to v2.5.2
> -
>
> Key: NUTCH-3033
> URL: https://issues.apache.org/jira/browse/NUTCH-3033
> Project: Nutch
>  Issue Type: Task
>  Components: ivy
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> Ivy v2.5.2 was released August 20th 2023. Let’s upgrade.
> [https://ant.apache.org/ivy/history/2.5.2/release-notes.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-3033.
-
Resolution: Fixed

> Upgrade Ivy to v2.5.2
> -
>
> Key: NUTCH-3033
> URL: https://issues.apache.org/jira/browse/NUTCH-3033
> Project: Nutch
>  Issue Type: Task
>  Components: ivy
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> Ivy v2.5.2 was released August 20th 2023. Let’s upgrade.
> [https://ant.apache.org/ivy/history/2.5.2/release-notes.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Release Nutch 1.20

2024-03-12 Thread Lewis John McGibbney
I submitted a patch for the Ivy 2.5.2 upgrade. If folks could have a look at 
that it would be ideal.
https://github.com/apache/nutch/pull/803
I am free to roll a release candidate towards the end of this week.
lewismc

On 2024/03/10 15:08:36 Lewis John McGibbney wrote:
> Nice  
> I see that we are a couple of releases behind on Ivy as well, so I’ll submit a 
> patch for that.
> I can push this release this time. It’s been a while since I exercised the 
> workflow and it would be good to blow away the cobwebs.
> lewismc
> 
> On 2024/03/10 11:55:20 Markus Jelsma wrote:
> > Good idea! I'll finish work on three open issues the next week.
> > 
> > Op za 9 mrt 2024 om 13:02 schreef Sebastian Nagel <
> > wastl.na...@googlemail.com>:
> > 
> > > Hi Lewis,
> > >
> > > yes, of course!
> > >
> > > Some points we should do before the release:
> > >
> > > - address the ES licensing issue,
> > >the easiest way is to downgrade, see NUTCH-3008
> > >If done update the license-related files.
> > >
> > > - there are three short PRs open
> > >
> > > I'll try to have a look at these points the next days.
> > >
> > > Best,
> > > Sebastian
> > >
> > >
> > > On 3/8/24 01:43, lewis john mcgibbney wrote:
> > > > Hi dev@,
> > > > As of today, 51 issues have been addressed in the 1.20 development 
> > > > drive.
> > > > https://issues.apache.org/jira/projects/NUTCH/versions/12352190
> > > > <https://issues.apache.org/jira/projects/NUTCH/versions/12352190>
> > > > I would like to push a release soon and ship it to the user community.
> > > > Any objections?
> > > > Thank you
> > > > lewismc
> > > >
> > >
> > 
> 


[jira] [Updated] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-12 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-3033:

Due Date: 12/Mar/24  (was: 11/Mar/24)

> Upgrade Ivy to v2.5.2
> -
>
> Key: NUTCH-3033
> URL: https://issues.apache.org/jira/browse/NUTCH-3033
> Project: Nutch
>  Issue Type: Task
>  Components: ivy
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> Ivy v2.5.2 was released August 20th 2023. Let’s upgrade.
> [https://ant.apache.org/ivy/history/2.5.2/release-notes.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work stopped] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-12 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3033 stopped by Lewis John McGibbney.
---
> Upgrade Ivy to v2.5.2
> -
>
> Key: NUTCH-3033
> URL: https://issues.apache.org/jira/browse/NUTCH-3033
> Project: Nutch
>  Issue Type: Task
>  Components: ivy
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> Ivy v2.5.2 was released August 20th 2023. Let’s upgrade.
> [https://ant.apache.org/ivy/history/2.5.2/release-notes.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Differences in retrieve pattern between Ivy 2.5.0/2.5.1 & 2.5.2?

2024-03-12 Thread Lewis John McGibbney
Thanks for this guidance Stefan :) 
I was able to get a patch together at https://github.com/apache/nutch/pull/803
Hopefully this helps others who may be confused as I was.
Thank you
lewismc

On 2024/03/12 18:57:51 Stefan Bodewig wrote:
> On 2024-03-11, lewis john mcgibbney wrote:
> 
> > I am working on upgrading Ivy to latest over in the Apache Nutch project.
> > The build works just fine with 2.5.0 and 2.5.1 but with 2.5.2 the CI
> > fails with the following complaint
> 
> > /home/runner/work/nutch/nutch/src/plugin/build-plugin.xml:234:
> > impossible to ivy retrieve: java.lang.RuntimeException: problem during
> > retrieve of org.apache.nutch#lib-htmlunit: java.lang.RuntimeException:
> > Multiple artifacts of the module
> > io.netty#netty-transport-native-kqueue;4.1.84.Final are retrieved to
> > the same file! Update the retrieve pattern to fix this error.
> 
> Ivy 2.5.2 fixes a bug[1] when dealing with dependencies that have
> multiple Maven artifacts with different Maven classifiers. Prior to
> 2.5.2 Ivy would think they'd all be the same and just pick one.
> 
> io.netty#netty-transport-native-kqueue has several artifacts, at least
> this is what the repo looks like. I completely fail to understand the
> POM :-)
> 
> Your pattern probably needs a [classifier] to make sure two artifacts
> that differ by Maven classifier also target different file names.
> 
> Something like
> 
> pattern="${local-maven2-dir}/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]"
> 
> Stefan
> 
> [1] https://issues.apache.org/jira/browse/IVY-1642
> 
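
To make the collision Stefan describes concrete, here is a toy Java sketch of how a retrieve pattern expands into a file name. It is not Ivy's implementation: the class RetrievePatternSketch, its expand and artifact helpers, and the two classifier values are assumptions for illustration. It only mimics the documented behaviour that a parenthesised part such as (-[classifier]) is dropped when the token has no value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of retrieve-pattern expansion; hypothetical, not Ivy code. */
class RetrievePatternSketch {
  // An optional part is anything wrapped in parentheses, e.g. "(-[classifier])".
  private static final Pattern OPTIONAL = Pattern.compile("\\(([^()]*)\\)");
  // A token is a lowercase name in square brackets, e.g. "[module]".
  private static final Pattern TOKEN = Pattern.compile("\\[([a-z]+)\\]");

  static String expand(String pattern, Map<String, String> tokens) {
    // Pass 1: keep an optional part only if every token inside it has a value.
    StringBuffer kept = new StringBuffer();
    Matcher optional = OPTIONAL.matcher(pattern);
    while (optional.find()) {
      String inner = optional.group(1);
      boolean allSet = true;
      Matcher inside = TOKEN.matcher(inner);
      while (inside.find()) {
        String value = tokens.get(inside.group(1));
        if (value == null || value.isEmpty()) {
          allSet = false;
        }
      }
      optional.appendReplacement(kept, Matcher.quoteReplacement(allSet ? inner : ""));
    }
    optional.appendTail(kept);

    // Pass 2: substitute the remaining tokens with their values.
    StringBuffer out = new StringBuffer();
    Matcher token = TOKEN.matcher(kept);
    while (token.find()) {
      String value = tokens.get(token.group(1));
      token.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
    }
    token.appendTail(out);
    return out.toString();
  }

  /** Builds the token map for one artifact of the module named in the CI error. */
  static Map<String, String> artifact(String classifier) {
    Map<String, String> tokens = new HashMap<>();
    tokens.put("module", "netty-transport-native-kqueue");
    tokens.put("revision", "4.1.84.Final");
    tokens.put("ext", "jar");
    if (classifier != null) {
      tokens.put("classifier", classifier);
    }
    return tokens;
  }

  public static void main(String[] args) {
    String withoutClassifier = "[module]-[revision].[ext]";
    String withClassifier = "[module]-[revision](-[classifier]).[ext]";
    // Classifier values are illustrative assumptions.
    for (String classifier : new String[] {"osx-x86_64", "osx-aarch_64"}) {
      System.out.println(expand(withoutClassifier, artifact(classifier)));
      System.out.println(expand(withClassifier, artifact(classifier)));
    }
  }
}
```

With the pattern that lacks [classifier], both artifacts expand to the same file name, which is the situation Ivy 2.5.2 now rejects. With (-[classifier]) in the pattern, each artifact gets its own file name, and an artifact without a classifier still expands cleanly because the optional part is dropped.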


[GSoC 2024 PROPOSAL] Overhaul the legacy Nutch plugin framework and replace it with PF4J

2024-03-12 Thread lewis john mcgibbney
Hi user@ & dev@,

I decided to write up a GSoC’24 proposal and encourage interested
applicants to register your interest in the JIRA issue or else reach
out to the Nutch PMC over on dev@nutch.apache.org (please CC
lewi...@apache.org).

Title: Overhaul the legacy Nutch plugin framework and replace it with PF4J
JIRA: https://issues.apache.org/jira/browse/NUTCH-3034

Thanks in advance, and good luck to prospective GSoC applicants.

lewismc

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


