If there is no restriction on the number at the end of the URL, you can just use this (note that the rule must be placed above the one that filters out URLs containing a "?" character):

+http://www.xyz.com/\?page=

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

On Sun, May 12, 2013 at 12:40 AM, Renato Marroquín Mogrovejo <[email protected]> wrote:
> Hi all,
>
> I have been trying to fetch a query similar to:
>
> http://www.xyz.com/?page=1
>
> But where the number can vary from 1 to 100. Inside the first page
> there are links to the next ones. So I updated the
> conf/regex-urlfilter file and added:
>
> ^[0-9]{1,45}$
>
> When I do this, the generate job fails saying that it is "Invalid
> first character". I have tried generating with topN 5 and depth 5 and
> trying to fetch more urls but that does not work.
>
> Could anyone advise me on how to accomplish this? I am running Nutch 2.x.
> Thanks in advance!
>
>
> Renato M.
>
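
To illustrate why the order matters and where the "Invalid first character" message comes from, here is a rough sketch in Python. It is only a simplified model of how a first-match-wins filter file behaves, not Nutch's actual code: parse_rule, accepts and the two rule lists are hypothetical names made up for this example, and the closing "+." catch-all is assumed from the usual default file.

```python
import re


def parse_rule(line):
    """Split a rule line into its sign and a compiled pattern.

    Every rule must start with '+' (accept) or '-' (reject).
    """
    sign, pattern = line[0], line[1:]
    if sign not in ("+", "-"):
        # A line such as ^[0-9]{1,45}$ ends up here: it has no sign.
        raise ValueError("Invalid first character: " + line)
    return sign, re.compile(pattern)


def accepts(url, rule_lines):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for line in rule_lines:
        sign, regex = parse_rule(line)
        if regex.search(url):  # partial match anywhere in the URL
            return sign == "+"
    return False  # no rule matched, so the URL is dropped


# Suggested order: the page rule sits above the generic "?" filter.
GOOD_ORDER = [
    r"+http://www.xyz.com/\?page=",
    r"-[?*!@=]",
    r"+.",
]

# Same rules, but the generic filter comes first and wins.
BAD_ORDER = [
    r"-[?*!@=]",
    r"+http://www.xyz.com/\?page=",
    r"+.",
]

if __name__ == "__main__":
    url = "http://www.xyz.com/?page=7"
    print(accepts(url, GOOD_ORDER))
    print(accepts(url, BAD_ORDER))
```

With the page rule first, the URL is accepted before the "?" filter is ever consulted; with the order reversed, the "?" filter rejects it and the page rule is never reached.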

