[ https://issues.apache.org/jira/browse/NUTCH-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tejas Patil resolved NUTCH-802.
-------------------------------
    Resolution: Won't Fix

Agree with Markus and Lewis. Hence marking this one as Won't Fix. If someone wishes to address it in the future, they can open a new issue with a more appropriate solution.

> Problems managing outlinks with large url length
> ------------------------------------------------
>
>          Key: NUTCH-802
>          URL: https://issues.apache.org/jira/browse/NUTCH-802
>      Project: Nutch
>   Issue Type: Bug
>   Components: parser
>     Reporter: Pablo Aragón
>     Assignee: Andrzej Bialecki
>       Labels: nutch, outlink, parse, parseoutputformat
>      Fix For: 1.7
>
>  Attachments: ParseOutputFormat.patch
>
>
> Nutch can get idle during the collection of outlinks if the URL address of
> the outlink is too large.
> The maximum sizes of a URL for the main web servers are:
> * Apache: 4,000 bytes
> * Microsoft Internet Information Server (IIS): 16,384 bytes
> * Perl HTTP::Daemon: 8,000 bytes
> URL addresses longer than 4,000 bytes are problematic, so the limit should
> be set in the nutch-default.xml configuration file.
> I attached a patch.
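
For readers who want to see the idea from the issue description in concrete terms, here is a minimal sketch of dropping outlinks whose URL exceeds a byte limit. It is not the attached ParseOutputFormat.patch and does not use any Nutch API; the class name, method name, and the 4,000-byte default are assumptions made for illustration, with the default borrowed from the Apache figure quoted in the report. In a real deployment the limit would presumably be read from configuration rather than hard-coded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: filters outlink URLs by encoded length.
public class OutlinkLengthFilter {

    // Assumed default, taken from the Apache limit quoted in the issue.
    static final int DEFAULT_MAX_URL_BYTES = 4000;

    // Returns the URLs whose UTF-8 encoding is at most maxBytes long.
    // Null entries are skipped; input order is preserved.
    static List<String> filterByLength(List<String> urls, int maxBytes) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (url == null) {
                continue;
            }
            int byteLength = url.getBytes(StandardCharsets.UTF_8).length;
            if (byteLength <= maxBytes) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build an artificially long URL to exercise the limit.
        StringBuilder longUrl = new StringBuilder("http://example.com/?q=");
        for (int i = 0; i < 5000; i++) {
            longUrl.append('a');
        }

        List<String> outlinks = Arrays.asList(
                "http://example.com/page",
                longUrl.toString());

        List<String> kept = filterByLength(outlinks, DEFAULT_MAX_URL_BYTES);
        System.out.println("kept " + kept.size() + " of " + outlinks.size() + " outlinks");
    }
}
```

The length is measured in bytes rather than characters because the limits quoted in the issue are byte limits, and a URL containing non-ASCII characters encodes to more bytes than it has characters.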