-----Original message-----
> From:dogrdon <dgor...@planning.org>
> Sent: Monday 22nd July 2013 17:02
> To: user@nutch.apache.org
> Subject: RE: Why aren't my path exclusions getting excluded in the Nutch
> index to Solr?
>
> Thanks, I was wondering if some kind of reset like that needed to happen.
>
> I am fairly certain that putting the exclusion regexes before the inclusion
> regex has been the answer to my problem. But I have been testing this by
> just deleting the entire crawl directory to wipe out old crawls (still in
> testing here so it's fine).

Oh yes, order matters. The first match (include or exclude) is taken.

>
> Though when this goes to production, how do i "refilter the database" with
> Nutch? and does this assume that I am indexing to Solr?

This doesn't assume any search engine back end; it only operates on Nutch's own data structures. Lucene has understood regex queries for a while now, so that is what you need to update the index itself. Lucene requires a regex to match the whole ID / URL, so some regexes have to be rewritten.

>
> thanks again
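
To make the first-match rule and the whole-string requirement concrete, here is a minimal Python sketch. It is not Nutch's or Lucene's actual code: the function name, the rule list and the example.org URLs are made up for illustration, and it assumes the filter does a partial (search-style) match and rejects a URL that matches no rule.

```python
import re

def first_match_filter(url, rules):
    """Return True if url is accepted, False if rejected.

    rules is an ordered list of (sign, pattern) pairs; the first pattern
    that matches decides. Assumption: a URL matching no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):  # partial match, assumed for the filter
            return sign == "+"
    return False

# Hypothetical rules: exclusion placed before the catch-all inclusion.
exclude_first = [("-", r"/private/"), ("+", r"^https?://example\.org/")]
# Same rules with the inclusion first: the exclusion is never reached.
include_first = list(reversed(exclude_first))

url = "http://example.org/private/page.html"
print(first_match_filter(url, exclude_first))  # False
print(first_match_filter(url, include_first))  # True

# Whole-string matching (as described for Lucene) vs. partial matching.
print(bool(re.search(r"/private/", url)))         # True
print(bool(re.fullmatch(r"/private/", url)))      # False
print(bool(re.fullmatch(r".*/private/.*", url)))  # True
```

The last three lines show why a pattern that works as a partial match may need a leading and trailing `.*` once the whole ID / URL has to match.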