On Tue, 2010-08-10 at 19:24 -0700, jdow wrote:
> From: "Martin Gregorie" <mar...@gregorie.org>
> Sent: Monday, 2010/August/09 18:08
> 
> 
> > On Mon, 2010-08-09 at 17:42 -0700, jdow wrote:
> >> From: "Martin Gregorie" <mar...@gregorie.org>
> >> > Something like this will match a sequence of two capitalised name words,
> >> > including hyphenated ones, and extract the name words:
> >> >
> >> > /([A-Z][-a-zA-Z]{1,20})\s([A-Z][-a-zA-Z]{1,20})/
> >> >
> >> > and should be fairly easy to extend to deal with initials and/or more
> >> > than one forename. Tested in Python and should also work in Perl.
> >> >
> >>
> >> That solves the Reginald Slovotniksky type names. But, "John Smith"? Dunno.
> >>
> > The regex I showed will return 'John' and 'Smith' so the combo can be
> > queried in the database, which is all I set out to try. However, I was
> > trying to generalise as a regex that would match two or more Capitalised
> > Names and return them as an array of group values but I couldn't work
> > out how to do that without writing a rather tedious set of ever longer
> > alternates. If anybody knows how to do that without resorting to
> > alternatives I'd be fascinated to know how you do that.
> 
> Ah, but Martin, do you really know if it is the John Smith in the database
> or another John Smith?
> 
No, but then nobody knows that. If you're scanning for patient names in
body text and a common name happens to match a patient, it's an ambiguous
situation that can only be resolved if you can write a rule that
reliably disambiguates it by recognising the name's context.
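
On the question quoted above about matching two or more Capitalised Names
without a growing list of alternates: one possible approach, sketched here
in Python, is to do it in two passes. A first regex finds a whole run of
name words, and a second pass splits that run into its words. This is only
a sketch. The names NAME, TWO_NAMES, NAME_RUN and extract_names are
illustrative and do not come from the thread; only the single-word pattern
and the two-word regex are taken from the quoted message.

```python
import re

# One capitalised name word, hyphens allowed (the pattern quoted in the thread).
NAME = r"[A-Z][-a-zA-Z]{1,20}"

# The original two-word regex from the thread, built from NAME.
TWO_NAMES = re.compile(rf"({NAME})\s({NAME})")

# Assumption: a "name" is a run of two or more such words, each pair
# separated by exactly one whitespace character.
NAME_RUN = re.compile(rf"{NAME}(?:\s{NAME})+")


def extract_names(text):
    """Return one list of name words per run of 2+ capitalised words."""
    return [re.findall(NAME, m.group(0)) for m in NAME_RUN.finditer(text)]


if __name__ == "__main__":
    print(TWO_NAMES.search("John Smith").groups())
    print(extract_names("seen by Mary Anne Lloyd-Jones today"))
```

Like the original regex, this will also pick up any other run of
capitalised words (a sentence-initial word followed by a name, for
example), so it narrows the candidates for a database lookup rather than
identifying anyone.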

Martin

