> >
> > # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
> > -.*(/[^/]+)/[^/]+\1/[^/]+\1/
> >
> > # accept anything else
> > +.
> >
> > Kind Regards,
> >
> > Mauricio
> > -Original Message-
From: Tejas Patil [mailto:tejas.patil...@gmail.com]
Sent: Thursday, January 30, 2014 4:51 PM
To: user@nutch.apache.org
Subject: Re: regex-normalize.xml/regex-urlfilter.txt not found
Can you confirm whether 'regex-urlfilter.txt' is present inside the 'conf'
directory at the location where you are running the crawler? If so, what are
the contents of that file?
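If it helps, here is a minimal sketch for checking this from inside the Java application itself. It assumes the filter file is expected to be visible as a classpath resource of the running JVM, which is an assumption about this setup and not something confirmed in this thread. The class name UrlFilterFileCheck and the helper locate are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: reports whether a config file
// is visible on the classpath and, if so, prints its contents.
class UrlFilterFileCheck {

    /** Returns the classpath URL of the named resource, or null if it is not visible. */
    public static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class.
            loader = UrlFilterFileCheck.class.getClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file name matches the one discussed in this thread.
        String name = "regex-urlfilter.txt";
        URL url = locate(name);
        if (url == null) {
            System.out.println(name + " is NOT visible on the classpath");
            return;
        }
        System.out.println(name + " found at " + url);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Run it with the same classpath as the crawler. If it reports that the file is not visible, the 'conf' directory is probably not on the classpath of your application.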
Thanks,
Tejas
On Thu, Jan 30, 2014 at 9:06 PM, Ciprian Rodriguez, Mauricio <
mauricio.cipr...@atos.net> wrote:
Hi.
I'm developing Java software that integrates Nutch (2.2.1) with HBase (0.94.16).
I'm getting a NullPointerException in
org.apache.nutch.urlfilter.api.RegexURLFilterBase.readRules(RegexURLFilterBase.java:179).
I assume this error is related to the following warnings in the log:
Jan 30