I did get to do some crawling today, a few thousand docs (HTML, PDF, some MS Office formats). That run uses static, urlmeta, regexp, arbitrary, & metatag plugins. It indexes to Solr 9.8.1. It looks boffo!

+1 on RC2


Joe

On 16/07/2025 20:44, Joe Gilvary wrote:
Hi, all,

Thanks, Sebastian, for getting this out to testing. :) I only got a chance to download and run a couple "indexcheck" commands with my existing 1.20 configuration today. They went with no problems, using the nutch-site.xml and index-writers.xml that I've been using with 1.20.

Tomorrow I should have the chance to run some full fetch to index cycles. That'll be with Solr 9.8.1. Looking forward to it. :)

 Thanks, stay safe, stay healthy,

 Joe

On 16/07/2025 18:06, Sebastian Nagel wrote:
Hi Peter,

thanks for testing the release candidate!

> Just tested 1.21 binary release with latest Solr 9.8.1 (stable).
> Page was indexed with no issues.

Thanks!

> Would it be possible to upgrade solrj in indexer-solr plugin?
> Solr release 8.11 is marked as EoL on Solr Downloads - Apache Solr

Yes, for sure!

> In Solr9 the default binding IP was changed from '*' to localhost.

> But I am not sure how it affects the indexer-solr plugin schema.xml.

Ok. These need to be checked...


Peter, feel free to open a Jira issue for upgrading to Solr 9.x on
  https://issues.apache.org/jira/projects/NUTCH/summary
You can create an account here:
  https://selfserve.apache.org/jira-account.html

Or let us know, otherwise...

We will add the upgrade in Nutch 1.22 - Solr 8.11.4 is still available,
so there's no reason to halt the release process. Ok?


> One warning seen of the Log4j configuration
>
>> 2025-07-16T19:50:14.986451428Z main WARN The use of package scanning to
>> locate Log4j plugins is deprecated.

I've seen this as well while testing and opened NUTCH-3119.
I did not consider this as a blocking issue.


Thanks and best,
Sebastian


On 7/16/25 22:13, Peter Viskup wrote:
Just tested 1.21 binary release with latest Solr 9.8.1 (stable).
Page was indexed with no issues.
Would it be possible to upgrade solrj in indexer-solr plugin?
Solr release 8.11 is marked as EoL on Solr Downloads - Apache Solr
<https://solr.apache.org/downloads.html>

Solr requires SolrJ version 8.10 or higher, see "Major Changes in Solr 9" (Apache Solr Reference Guide):
<https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html>
In Solr9 the default binding IP was changed from '*' to localhost.
Solr 9.7 came with a schema version update to 1.7, see "Major Changes in Solr 9" (Apache Solr Reference Guide):
<https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#schemaversion-upgraded-to-1-7>
But I am not sure how it affects the indexer-solr plugin schema.xml.

One warning seen of the Log4j configuration

2025-07-16T19:50:14.986451428Z main WARN The use of package scanning to
locate Log4j plugins is deprecated.
Please remove the `packages` attribute from your configuration file.
See https://logging.apache.org/log4j/2.x/faq.html#package-scanning for
details.


Peter

On Wed, Jul 16, 2025 at 4:34 PM Sebastian Nagel <[email protected]> wrote:

Hi Peter,

A note ahead: moving this discussion back to user@nutch to keep the Nutch
community in the loop.


  > can make it a try

This would be simply great! It's always helpful to have more people
test the release packages. Three votes from PMC members are required.
However, any serious issues discovered by anybody else will cause us
to withdraw a buggy release candidate. See also [1] for the Apache
voting process.

  > Can I help in some other way?

Of course! Any contribution is welcome, no matter whether it's about
code, documentation (our wiki [2]), testing and bug reports, pull request
reviews, etc.

  > Would 1.21 release support newer Solr 8+?

- NUTCH-3116 [3] upgrades indexer-solr to Solr 8.11.4
- I've tested the release candidate to index into Solr 8.11.2
    which was already installed on my system
- I did not test indexing into Solr 9.x


Best,
Sebastian


[1] https://www.apache.org/foundation/voting.html
[2] https://cwiki.apache.org/confluence/display/NUTCH
[3] https://issues.apache.org/jira/browse/NUTCH-3116


On 7/16/25 16:00, Peter Viskup wrote:
Hi Sebastian,
can give it a try - am working on a PoC of Nutch for indexing our public
administration sites in Slovakia.

Can I help in some other way?
Would 1.21 release support newer Solr 8+?

Peter

On Wed, 16 Jul 2025, 15:41 Sebastian Nagel <[email protected]> wrote:

     Hi everybody,


     A candidate for the Nutch 1.21 release is available at:

     https://dist.apache.org/repos/dist/dev/nutch/1.21/

     The release candidate is a zip and tar.gz archive of the binary and sources,
     built from:
     https://github.com/apache/nutch/tree/release-1.21

     In addition, a staged maven repository is available here:

     https://repository.apache.org/content/repositories/orgapachenutch-1023

     We addressed 48 issues:
     https://s.apache.org/bs58y


     Please vote on releasing this package as Apache Nutch 1.21.
     The vote is open for the next 72 hours and passes if a majority
     of at least three +1 Nutch PMC votes are cast.

     [ ] +1 Release this package as Apache Nutch 1.21.
     [ ] -1 Do not release this package because...


     Cheers,
     Sebastian
     (On behalf of the Nutch PMC)



     P.S.

     Here is my +1.

     - Tested most of the Nutch tools and ran a test crawl on a single-node cluster
         running Hadoop 3.3.6 and 3.4.1, see
         https://github.com/sebastian-nagel/nutch-test-single-node-cluster/

     - Building from source package succeeds

     - Crawling in local mode using binary package succeeded

     - Built and ran a Nutch Docker container (using the release branch)

     - While testing the release candidate two issues related to logging were
         discovered:
         NUTCH-3118 is integrated into the 2nd release candidate eligible for
                    this vote. The first release candidate was dropped because
                    of this issue.
         NUTCH-3119 is scheduled for Nutch 1.22 (it's a minor issue)




