Hi Jason,

On Thu, Jan 16, 2014 at 6:09 AM, <[email protected]> wrote:

>
> I had tried +^http://www.cancer.gov/cancertopics/druginfo.*
> or +^http://www.cancer.gov/cancertopics/druginfo/([a-z0-9]*\.)
>

I don't think that these regexes look correct.
Why don't you first try

+^http://([a-z0-9]*\.)*cancer.gov/
# block anything else
-.
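
If it helps to sanity-check rules like these before running a crawl, here is a
rough Python sketch. It is not Nutch code. It assumes the filter walks the
rules top to bottom, lets the first pattern found in the URL decide (+ keeps,
- drops), and drops a URL that matches nothing. The names RULES, NARROWED and
accepts are made up for illustration.

```python
import re

# Hypothetical rule list mirroring the two rules above; order matters.
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*cancer.gov/"),
    ("-", r"."),  # block anything else
]

# A narrower, equally hypothetical variant limited to the druginfo section.
NARROWED = [
    ("+", r"^http://www\.cancer\.gov/cancertopics/druginfo"),
    ("-", r"."),
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        # re.search looks for the pattern anywhere in the URL, so a rule
        # only pins to the start of the URL if it begins with '^'.
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is dropped.
    return False


if __name__ == "__main__":
    for u in ("http://www.cancer.gov/cancertopics/druginfo/lungcancer",
              "http://www.example.org/"):
        print(u, accepts(u), accepts(u, NARROWED))
```

Once the broad rule behaves as expected, swapping in something like NARROWED
shows which URLs would still get through.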

Then you can begin to narrow down the rules. You may also want to look
at the domain filtering code and plugins in Nutch.
Ta
Lewis
