Re: regex-normalize.xml/regex-urlfilter.txt not found

2014-01-30 Thread Talat Uyarer
> >
> > # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
> > -.*(/[^/]+)/[^/]+\1/[^/]+\1/
> >
> > # accept anything else
> > +.
> >
> > Kind Regards,
> >
> > Mauricio
> > -----Original Message-----
>
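The two rules quoted above follow the regex-urlfilter.txt convention: a leading `-` rejects URLs that match the pattern, a leading `+` accepts them, and `#` starts a comment. The header of Nutch's default file states that the first matching pattern decides whether a URL is included or ignored, and that a URL matching no pattern is ignored. The Java sketch below is a standalone illustration of that evaluation order, not Nutch's own `RegexURLFilter`; the class and method names are hypothetical, and it assumes a rule matches when `Matcher.find()` succeeds anywhere in the URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // One non-comment line of a regex-urlfilter.txt-style file (hypothetical model).
    static final class Rule {
        final boolean accept;   // true for '+', false for '-'
        final Pattern pattern;  // the regex that follows the sign character

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Parses rule lines, skipping blank lines, '#' comments and lines without a '+'/'-' prefix.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                continue; // this sketch simply ignores malformed lines
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule whose pattern is found in the URL decides; no match means reject.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(List.of(
                "# skip URLs with slash-delimited segment that repeats 3+ times, to break loops",
                "-.*(/[^/]+)/[^/]+\\1/[^/]+\\1/",
                "# accept anything else",
                "+."));
        System.out.println(accepts(rules, "http://example.com/a/b/c.html"));   // true
        System.out.println(accepts(rules, "http://example.com/x/a/x/b/x/c/")); // false
    }
}
```

With these two rules, `http://example.com/a/b/c.html` has no repeated segment, falls through to `+.` and is accepted, while `http://example.com/x/a/x/b/x/c/` repeats `/x` three times and is rejected by the first rule. Because the first match wins, the catch-all `+.` has to stay at the end of the file.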

Re: regex-normalize.xml/regex-urlfilter.txt not found

2014-01-30 Thread Tejas Patil
/[^/]+)/[^/]+\1/[^/]+\1/
>
> # accept anything else
> +.
>
> Kind Regards,
>
> Mauricio
> -----Original Message-----
> From: Tejas Patil [mailto:tejas.patil...@gmail.com]
> Sent: Thursday, January 30, 2014 4:51 PM
> To: user@nutch.apache.org
> Subject: Re: regex-norma

RE: regex-normalize.xml/regex-urlfilter.txt not found

2014-01-30 Thread Ciprian Rodriguez, Mauricio
loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
# accept anything else
+.
Kind Regards,
Mauricio
-----Original Message-----
From: Tejas Patil [mailto:tejas.patil...@gmail.com]
Sent: Thursday, January 30, 2014 4:51 PM
To: user@nutch.apache.org
Subject: Re: regex-normalize.xml/regex-urlfilter.txt not found
Can you confi

Re: regex-normalize.xml/regex-urlfilter.txt not found

2014-01-30 Thread Tejas Patil
Can you confirm whether 'regex-urlfilter.txt' is present inside the 'conf' directory at the location where you are running the crawler? If so, what are the contents of that file?
Thanks,
Tejas
On Thu, Jan 30, 2014 at 9:06 PM, Ciprian Rodriguez, Mauricio <mauricio.cipr...@atos.net> wrote:
> Hi.
>
> I'
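Tejas's question comes down to whether the file is visible where Nutch looks for it. Nutch reads files such as regex-urlfilter.txt through its Hadoop `Configuration`, which resolves them as classpath resources, so a `conf` directory that exists on disk but is not on the classpath of an embedding Java application behaves as if the file were missing. (That this is the exact lookup path in Nutch 2.2.1 is an assumption, not something confirmed in the thread.) The hypothetical `ConfResourceCheck` class below uses only the JDK's `ClassLoader.getResource` to report where, if anywhere, each file would be loaded from.

```java
import java.net.URL;

public class ConfResourceCheck {
    // Returns the URL a resource would be loaded from, or null if it is not on the classpath.
    static URL locate(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ConfResourceCheck.class.getClassLoader();
        }
        return loader.getResource(name);
    }

    public static void main(String[] args) {
        // The two files named in the thread's subject line.
        String[] names = {"regex-urlfilter.txt", "regex-normalize.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on the classpath" : url));
        }
    }
}
```

Run it with the same classpath as the application that embeds Nutch. If either file is reported as not on the classpath, adding the directory that contains it (typically Nutch's `conf` directory) to that classpath is the likely fix.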

regex-normalize.xml/regex-urlfilter.txt not found

2014-01-30 Thread Ciprian Rodriguez, Mauricio
Hi.
I'm developing a Java application that uses the Nutch (2.2.1) + HBase (0.94.16) integration. I'm getting a NullPointerException in org.apache.nutch.urlfilter.api.RegexURLFilterBase.readRules(RegexURLFilterBase.java:179). I assume this error is related to the following warnings in the log: Jan 30
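A NullPointerException inside `readRules` together with "not found" warnings is consistent with the rules file never being opened, so that a null `Reader` is handed to the parsing code. That causal link is an assumption drawn from the log messages, not something confirmed in the thread. The sketch below shows the defensive pattern in isolation: a hypothetical `readRuleLines` method that rejects a missing `Reader` with a descriptive exception before reading any rule lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SafeRuleReader {
    // Hypothetical helper: returns non-blank, non-comment lines, failing clearly if the file was never found.
    static List<String> readRuleLines(Reader reader, String fileName) throws IOException {
        if (reader == null) {
            // Without this check, wrapping or reading a null Reader throws a bare NullPointerException.
            throw new IllegalStateException(fileName
                    + " was not found; check that the directory containing it is on the classpath");
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty() && !line.startsWith("#")) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Reader present = new StringReader("# accept anything else\n+.\n");
        System.out.println(readRuleLines(present, "regex-urlfilter.txt")); // [+.]
        try {
            readRuleLines(null, "regex-urlfilter.txt");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast with the file name in the message points directly at the missing configuration file instead of at a line number inside the parser.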