Hi Sebastian,

Yes, I considered parseResult.isSuccess(), but the problem is that it returns success only if all parses were successful. So the loop would break only if the first parser fully succeeds; otherwise all parsers will be tried. I don't think this was the intent.
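To make the "all" versus "any" distinction concrete, here is a minimal sketch. It models each parse status as a plain boolean rather than using Nutch's ParseResult/ParseStatus classes, so `isSuccess` here only mimics the all-entries semantics described above, and `isAnySuccess` is the hypothetical helper proposed in this thread, not an existing Nutch method.

```java
import java.util.List;

/**
 * Hypothetical sketch: "all entries succeeded" (the semantics described for
 * ParseResult.isSuccess()) versus "any entry succeeded" (the proposed
 * isAnySuccess()). Statuses are modelled as plain booleans, not Nutch types.
 */
public class SuccessChecks {

    /** True only if every entry succeeded (in this sketch, also true for an empty list). */
    static boolean isSuccess(List<Boolean> statuses) {
        for (boolean ok : statuses) {
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical: true if at least one entry succeeded. */
    static boolean isAnySuccess(List<Boolean> statuses) {
        for (boolean ok : statuses) {
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A parser that extracted the main document but failed on one embedded item.
        List<Boolean> mixed = List.of(true, false);
        System.out.println(isSuccess(mixed));    // false: the loop would try the next parser
        System.out.println(isAnySuccess(mixed)); // true: the loop would stop here
    }
}
```

With a partially successful result, an all-success check keeps the loop going, while an any-success check accepts the result and keeps the failed entries' statuses available to the caller.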
If retaining the ParseStatus of failed parses is important, perhaps a similar isAnySuccess() function could help.

Regards,
Arkadi

-----Original Message-----
From: Sebastian Nagel [mailto:wastl.na...@googlemail.com]
Sent: Saturday, 18 April 2015 7:37 AM
To: user@nutch.apache.org
Subject: Re: A bug in org.apache.nutch.parse.ParseUtil?

Hi Arkadi,

agreed, that's a bug.

> if ( parseResult != null ) parseResult.filter() ;

parseResult.isSuccess() would do the check without modifying the ParseResult.

In case the fall-back parsers also fail, it could be useful to return one (the first? the last?) failed ParseResult. Luckily, the parser places a meaningful error message or minor ParseStatus in it, which the caller could use for diagnostics.

Thanks,
Sebastian

On 04/17/2015 06:31 AM, arkadi.kosmy...@csiro.au wrote:
> Hi,
>
> From reading the code it is clear that it is designed to allow using
> several parsers to parse a document in sequence until it is
> successfully parsed. In practice, this does not work because these
> lines
>
>   if (parseResult != null && !parseResult.isEmpty())
>     return parseResult;
>
> break the loop even if parsing has failed: parseResult is not
> empty anyway, since it contains a ParseData with ParseStatus.FAILED.
> This is easy to fix, for example, by adding a line before the two lines
> mentioned above:
>
>   if ( parseResult != null ) parseResult.filter() ;
>
> This will remove failed ParseData objects from the parseResult and leave it
> empty if parsing was unsuccessful. I believe that this fix is
> important because it allows backup parsers to be used as originally designed,
> and thus increases index completeness.
>
> Regards,
> Arkadi
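The fallback loop with the proposed filter() line can be sketched as follows. This is a simplified model under stated assumptions, not Nutch's actual ParseUtil code: `Result`, `Entry`, `parseWithFallback` and the parser names are hypothetical stand-ins, and each parser is modelled as a `Function<String, Result>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of the fallback loop discussed in this thread.
 * Result and Entry are simplified stand-ins, not Nutch's ParseResult/ParseStatus.
 */
public class FallbackParseSketch {

    /** One parsed item and whether its status was a success. */
    static final class Entry {
        final String url;
        final boolean success;

        Entry(String url, boolean success) {
            this.url = url;
            this.success = success;
        }
    }

    /** A parse result holding zero or more entries. */
    static final class Result {
        final String parserName;
        final List<Entry> entries = new ArrayList<>();

        Result(String parserName) {
            this.parserName = parserName;
        }

        Result add(String url, boolean success) {
            entries.add(new Entry(url, success));
            return this;
        }

        boolean isEmpty() {
            return entries.isEmpty();
        }

        /** Like the proposed filter() call: drop entries whose status is not success. */
        void filter() {
            entries.removeIf(e -> !e.success);
        }
    }

    /**
     * Tries each parser in turn and returns the first result that still has
     * entries after failed ones are filtered out, or null if every parser fails.
     */
    static Result parseWithFallback(List<Function<String, Result>> parsers, String url) {
        for (Function<String, Result> parser : parsers) {
            Result result = parser.apply(url);
            if (result != null) {
                result.filter(); // the extra line proposed in the thread
            }
            if (result != null && !result.isEmpty()) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "http://example.com/doc.pdf";
        // The first parser fails; without filter() its non-empty result would end the loop.
        Function<String, Result> primary = u -> new Result("primary").add(u, false);
        Function<String, Result> backup = u -> new Result("backup").add(u, true);

        Result chosen = parseWithFallback(List.of(primary, backup), url);
        System.out.println(chosen.parserName); // backup
    }
}
```

Note the trade-off raised in the thread: filtering in place discards the failed statuses, so if every parser fails the caller gets nothing to diagnose. Keeping a reference to the last unfiltered failed result, or using an any-success check instead of filter(), would preserve that information.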