[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki reopened NUTCH-696:
-------------------------------------

This may be useful after all - let's gather more comments.

> Timeout for Parser
> ------------------
>
>                 Key: NUTCH-696
>                 URL: https://issues.apache.org/jira/browse/NUTCH-696
>             Project: Nutch
>          Issue Type: Wish
>          Components: fetcher
>            Reporter: Julien Nioche
>            Priority: Minor
>         Attachments: timeout.patch
>
>
> I found that the parsing sometimes crashes due to a problem on a specific
> document, which is a bit of a shame as this blocks the rest of the segment
> and Hadoop ends up finding that the node does not respond. I was wondering
> about whether it would make sense to have a timeout mechanism for the parsing
> so that if a document is not parsed after a time t, it is simply treated as
> an exception and we can get on with the rest of the process.
> Does that make sense? Where do you think we should implement that, in
> ParseUtil?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
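
For readers following the thread, the timeout idea in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions, not the content of timeout.patch and not Nutch code: the class ParseTimeoutSketch, the Outcome type and the parseWithTimeout method are hypothetical names, and the parser is modelled as a plain java.util.concurrent.Callable rather than a real Nutch parser. The parse runs on a worker thread, and the caller waits at most t milliseconds before treating the document as failed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names come from Nutch.
class ParseTimeoutSketch {

    /** Hypothetical result holder standing in for a real parse result. */
    static final class Outcome {
        final boolean success;
        final String text;   // parsed text, or null on failure
        final String error;  // failure reason, or null on success

        private Outcome(boolean success, String text, String error) {
            this.success = success;
            this.text = text;
            this.error = error;
        }

        static Outcome ok(String text) { return new Outcome(true, text, null); }
        static Outcome failed(String error) { return new Outcome(false, null, error); }
    }

    // Daemon threads, so a parser that ignores interruption cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "parse-timeout-sketch");
        t.setDaemon(true);
        return t;
    });

    /** Runs the parser and gives up after timeoutMillis, reporting a failure instead. */
    static Outcome parseWithTimeout(Callable<String> parser, long timeoutMillis) {
        Future<String> future = POOL.submit(parser);
        try {
            return Outcome.ok(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Requests interruption; a parser stuck in a tight loop may still keep running.
            future.cancel(true);
            return Outcome.failed("timeout after " + timeoutMillis + " ms");
        } catch (ExecutionException e) {
            // The parser itself threw; report it like any other parse failure.
            return Outcome.failed("parser error: " + e.getCause());
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Outcome.failed("interrupted");
        }
    }

    public static void main(String[] args) {
        Outcome fast = parseWithTimeout(() -> "parsed text", 1000);
        Outcome slow = parseWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 100);
        System.out.println("fast: success=" + fast.success);
        System.out.println("slow: success=" + slow.success + ", error=" + slow.error);
    }
}
```

One caveat with this approach: Future.cancel(true) only interrupts the worker thread, so a parser that never checks for interruption keeps consuming CPU in the background even though the caller has moved on to the next document. Whether that trade-off is acceptable, and whether ParseUtil is the right place for such a wrapper, is the open question the issue raises.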