Hi, I don't know if it is a bug; however, your suggested improvement would without a doubt be welcomed.
If you could please log a Jira, we can review.

Best
Lewis

On Fri, Jan 4, 2013 at 3:39 AM, Tejas Patil <[email protected]> wrote:

> Hi,
>
> As per [0], an FTP website can have a robots.txt like [1]. In the Nutch
> code, the Ftp plugin is not parsing the robots file and simply accepts
> any URL.
>
> In
> "src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java":
>
>     public RobotRules getRobotRules(Text url, CrawlDatum datum) {
>       return EmptyRobotRules.RULES;
>     }
>
> Was this done intentionally, or is this a bug?
>
> [0] : https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
> [1] : ftp://example.com/robots.txt
>
> Thanks,
> Tejas Patil

-- 
*Lewis*
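[Editor's note: for context on what the stubbed-out getRobotRules method above would otherwise do, here is a minimal, self-contained sketch of a simplified Disallow-rule check. This is an illustration only, not the actual Nutch or crawler-commons API; the class name FtpRobotsSketch and method isAllowed are hypothetical, and real robots.txt handling (wildcards, Allow rules, grouped User-agent lines) is more involved.]

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decide whether a path may be fetched, given the
 * text of a robots.txt file and a crawler's user-agent name. Handles only
 * User-agent and Disallow lines with simple prefix matching.
 */
public class FtpRobotsSketch {

    public static boolean isAllowed(String robotsTxt, String agent, String path) {
        List<String> disallows = new ArrayList<>();
        boolean inMatchingGroup = false;

        for (String rawLine : robotsTxt.split("\n")) {
            // Strip comments and surrounding whitespace.
            String line = rawLine;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }

            String lower = line.toLowerCase();
            if (lower.startsWith("user-agent:")) {
                // A group applies if its agent token is "*" or a substring
                // of our agent name (simplified matching).
                String ua = line.substring("user-agent:".length()).trim().toLowerCase();
                inMatchingGroup = ua.equals("*") || agent.toLowerCase().contains(ua);
            } else if (inMatchingGroup && lower.startsWith("disallow:")) {
                // Collect Disallow rules from groups that apply to us;
                // an empty Disallow value means "allow everything".
                String rule = line.substring("disallow:".length()).trim();
                if (!rule.isEmpty()) {
                    disallows.add(rule);
                }
            }
        }

        // Disallow rules are treated as path prefixes.
        for (String rule : disallows) {
            if (path.startsWith(rule)) {
                return false;
            }
        }
        return true;
    }
}
```

With a robots.txt containing "User-agent: *" and "Disallow: /private/", a call like FtpRobotsSketch.isAllowed(txt, "Nutch", "/private/data") would return false, whereas the stub quoted above returns EmptyRobotRules.RULES and so never refuses anything.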

