Hi Jason,

On Thu, Jan 16, 2014 at 6:09 AM, <[email protected]> wrote:
> I had tried
> +^http://www.cancer.gov/cancertopics/druginfo.*
> or
> +^http://www.cancer.gov/cancertopics/druginfo/([a-z0-9]*\.)

I don't think these regexes look correct. Why don't you first try

+^http://([a-z0-9]*\.)*cancer.gov/
# block anything else
-.

Then you can begin to narrow down your search. You should maybe also look at the domain filtering code and plugins in Nutch.

Ta
Lewis
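
To try rules like these outside of a crawl, here is a minimal sketch in Python. It assumes the filter checks rules top to bottom, that the first rule whose regex is found anywhere in the URL decides (+ accepts, - rejects), and that a URL matching no rule is rejected. The names RULES, NARROW and accepts are made up for illustration and are not part of Nutch, and Python's re module stands in for Java's regex engine, which behaves the same for these simple patterns.

```python
import re

# Rules in file order: (sign, regex). These mirror the two rules suggested
# in the reply: accept cancer.gov hosts, then block anything else.
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*cancer.gov/"),
    ("-", r"."),
]

# Hypothetical narrower rule set, as one way to "narrow down" afterwards.
NARROW = [
    ("+", r"^http://www\.cancer\.gov/cancertopics/druginfo/"),
    ("-", r"."),
]


def accepts(url, rules=RULES):
    """Return True if the first rule found in url is a '+' rule."""
    for sign, pattern in rules:
        # re.search looks for the pattern anywhere in the string, so the
        # leading ^ in a rule is what anchors it to the start of the URL.
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False


if __name__ == "__main__":
    for candidate in [
        "http://www.cancer.gov/cancertopics/druginfo/lungcancer",
        "http://cancer.gov/",
        "http://www.example.com/",
    ]:
        print(accepts(candidate), accepts(candidate, NARROW), candidate)
```

Starting with the broad rule and confirming that the expected pages are accepted, then swapping in a tighter + rule, makes it easier to see which pattern is dropping URLs.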

