Sounds great, Arkadi (isAnySuccess()). Please submit a pull request and/or patch when you get a chance. This sounds like a needed change for sure.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "arkadi.kosmy...@csiro.au" <arkadi.kosmy...@csiro.au>
Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
Date: Tuesday, April 21, 2015 at 12:20 AM
To: "user@nutch.apache.org" <user@nutch.apache.org>
Subject: RE: A bug in org.apache.nutch.parse.ParseUtil?

>Hi Sebastian,
>
>Yes, I considered parseResult.isSuccess(), but the problem is, it returns
>success only if all parses were successful. So, if the first parser
>succeeds, it will break the loop, else all parsers will be used - I don't
>think this was the idea.
>
>If retaining ParseStatus of failed parses is important, perhaps a similar
>isAnySuccess() function could help.
>
>Regards,
>
>Arkadi
>
>-----Original Message-----
>From: Sebastian Nagel [mailto:wastl.na...@googlemail.com]
>Sent: Saturday, 18 April 2015 7:37 AM
>To: user@nutch.apache.org
>Subject: Re: A bug in org.apache.nutch.parse.ParseUtil?
>
>Hi Arkadi,
>
>agreed that's a bug.
>
>> if ( parseResult != null ) parseResult.filter() ;
>
>parseResult.isSuccess()
> would do the check without modifying the ParseResult
>
>In case the fall-back parsers also fail, it could be useful to return one
>(the first? the last?) failed ParseResult. Luckily the parser places a
>meaningful error message or minor ParseStatus which could be used by the
>caller for diagnostics.
>
>Thanks,
>Sebastian
>
>On 04/17/2015 06:31 AM, arkadi.kosmy...@csiro.au wrote:
>> Hi,
>>
>> From reading the code it is clear that it is designed to allow using
>> several parsers to parse a document in a sequence, until it is
>> successfully parsed. In practice, this does not work because these
>> lines
>>
>> if (parseResult != null && !parseResult.isEmpty())
>>   return parseResult;
>>
>> break the loop even if the parsing has failed, because parseResult is
>> not empty anyway: it contains a ParseData with ParseStatus.FAILED.
>> This is easy to fix, for example, by adding a line before the two lines
>> mentioned above:
>>
>> if ( parseResult != null ) parseResult.filter() ;
>>
>> This will remove failed ParseData objects from the parseResult and
>> leave it empty if the parsing had been unsuccessful. I believe that this
>> fix is important because it allows use of backup parsers as originally
>> designed and thus increases index completeness.
>>
>> Regards,
>> Arkadi
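
For reference, a minimal sketch of the fallback loop with the isAnySuccess() check discussed in this thread. It is an illustration under stated assumptions, not the actual Nutch code or the eventual patch: FallbackParseSketch, Outcome, Result and parseWithFallback are hypothetical stand-ins for ParseUtil, ParseData/ParseStatus and ParseResult, and returning the last failed result is just one of the options Sebastian raised ("the first? the last?").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Simplified stand-ins for illustration only; these are NOT the real
// org.apache.nutch.parse classes.
class FallbackParseSketch {

  /** Hypothetical stand-in for one ParseData entry and its ParseStatus. */
  static final class Outcome {
    final String parserName;
    final boolean success;

    Outcome(String parserName, boolean success) {
      this.parserName = parserName;
      this.success = success;
    }
  }

  /** Hypothetical stand-in for ParseResult. */
  static final class Result {
    private final List<Outcome> outcomes = new ArrayList<>();

    Result add(String parserName, boolean success) {
      outcomes.add(new Outcome(parserName, success));
      return this;
    }

    List<Outcome> outcomes() {
      return outcomes;
    }

    boolean isEmpty() {
      return outcomes.isEmpty();
    }

    /** True only if no entry failed, as isSuccess() is described in the thread. */
    boolean isSuccess() {
      return outcomes.stream().allMatch(o -> o.success);
    }

    /** The check proposed in the thread: true if at least one entry succeeded. */
    boolean isAnySuccess() {
      return outcomes.stream().anyMatch(o -> o.success);
    }
  }

  /**
   * Tries each parser in order and stops at the first result that contains
   * a successful entry. Failed results are kept unmodified (no filter() call),
   * so the caller can still inspect the last failure for diagnostics.
   */
  static Result parseWithFallback(List<Supplier<Result>> parsers) {
    Result lastFailed = null;
    for (Supplier<Result> parser : parsers) {
      Result result = parser.get();
      if (result == null || result.isEmpty()) {
        continue; // nothing produced, try the next parser
      }
      if (result.isAnySuccess()) {
        return result; // break the loop only on a real success
      }
      lastFailed = result; // remember the failure and fall through
    }
    return lastFailed; // null if no parser produced anything
  }
}
```

With this shape, a failed primary parser no longer ends the loop, and a result with mixed entries counts as usable even though isSuccess() would reject it.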