Hello Markus,

Before running the commands, I dumped the crawldb and confirmed again that the document status is 5 (db_redir_perm). I then ran both commands with the same result, but the 301 documents still exist in Solr.

1. sudo bin/nutch clean crawl/crawldb/
2. sudo bin/nutch solrclean crawl/crawldb/

No exchange was configured. The documents will be routed to all index writers.
SolrIndexer: deleting 1000/1000 documents
SolrIndexer: deleting 1000/2000 documents
SolrIndexer: deleting 1000/3000 documents
SolrIndexer: deleting 1000/4000 documents
SolrIndexer: deleting 270/4270 documents

Did I miss anything here?

Regards,
Hany

From: Markus Jelsma <[email protected]>
Sent: Tuesday, March 9, 2021 11:19 AM
To: [email protected]
Subject: EXTERNAL: Re: Re: 301 perm redirect pages are still in Solr

Hello Hany,

Sure, check these commands:

  solrclean   remove HTTP 301 and 404 documents from solr - DEPRECATED use the clean command instead
  clean       remove HTTP 301 and 404 documents and duplicates from indexing backends configured via plugins

Regards,
Markus

On Tue, 9 Mar 2021 at 08:49, Hany NASR <[email protected]> wrote:

> Hello Markus,
>
> I added the property in nutch-site.xml with no luck.
>
> The documents still exist in Solr; any advice?
>
> Regards,
> Hany
>
> From: Markus Jelsma <[email protected]>
> Sent: Monday, March 8, 2021 3:40 PM
> To: [email protected]
> Subject: EXTERNAL: Re: 301 perm redirect pages are still in Solr
>
> Hello Hany,
>
> You need to tell the indexer to delete those records. This will help:
>
> <!-- delete gone and redirects -->
> <property>
>   <name>indexer.delete</name>
>   <value>true</value>
> </property>
>
> Regards,
> Markus
>
> On Mon, 8 Mar 2021 at 15:31, Hany NASR <[email protected]> wrote:
>
> > Hi All,
> >
> > I'm using Nutch 1.15 and figured out that permanent redirect pages (301)
> > are still indexed and not removed from Solr.
> >
> > When I exported the crawlDB I found the page Status: 5 (db_redir_perm).
> >
> > How can I keep the Solr index up to date and make Nutch clean these pages
> > automatically?
> >
> > Regards,
> > Hany
> >
> > -----------------------------------------
> > SAVE PAPER - THINK BEFORE YOU PRINT!
> >
> > This E-mail is confidential.
> >
> > It may also be legally privileged. If you are not the addressee you may not copy,
> > forward, disclose or use any part of it. If you have received this message in error,
> > please delete it and all copies from your system and notify the sender immediately by
> > return E-mail.
> >
> > Internet communications cannot be guaranteed to be timely secure, error or virus-free.
> > The sender does not accept liability for any errors or omissions.
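
For anyone reproducing the check described in this thread (confirming which URLs the crawldb marks as status 5, db_redir_perm, before running the clean job), the following is a minimal sketch rather than anything posted by the participants. The helper name list_perm_redirects, the dump directory name, and the part file name are hypothetical, and the assumed dump layout (a URL, a tab, "Version: N", followed by a "Status:" line) may differ between Nutch versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the URLs whose status in a plain-text
# crawldb dump is 5 (db_redir_perm).
#
# Assumed record layout of the dump (verify against your own output):
#   <url><TAB>Version: 7
#   Status: 5 (db_redir_perm)
#   ...
list_perm_redirects() {
  local dump_file="$1"
  awk -F '\t' '
    # Record header: remember the URL that precedes "Version: N".
    $2 ~ /^Version:/ { url = $1; next }
    # Status line of the current record: report permanent redirects only.
    /^Status: 5 \(db_redir_perm\)/ { print url }
  ' "$dump_file"
}

# Possible usage (paths are assumptions; the part file name varies):
#   bin/nutch readdb crawl/crawldb -dump crawldb_dump
#   list_perm_redirects crawldb_dump/part-00000
```

The URLs printed this way can then be looked up in Solr after the clean run to see whether each one was actually deleted.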

