On Tue, 2010-08-10 at 19:24 -0700, jdow wrote:
> From: "Martin Gregorie" <mar...@gregorie.org>
> Sent: Monday, 2010/August/09 18:08
>
> > On Mon, 2010-08-09 at 17:42 -0700, jdow wrote:
> >> From: "Martin Gregorie" <mar...@gregorie.org>
> >> > Something like this will match a sequence of two capitalised name
> >> > words, including hyphenated ones, and extract the name words:
> >> >
> >> > /([A-Z][-a-zA-Z]{1,20})\s([A-Z][-a-zA-Z]{1,20})/
> >> >
> >> > and should be fairly easy to extend to deal with initials and/or more
> >> > than one forename. Tested in Python and should also work in Perl.
> >> >
> >>
> >> That solves the Reginald Slovotniksky type names. But, "John Smith"?
> >> Dunno.
> >>
> > The regex I showed will return 'John' and 'Smith' so the combo can be
> > queried in the database, which is all I set out to try. However, I was
> > trying to generalise as a regex that would match two or more Capitalised
> > Names and return them as an array of group values but I couldn't work
> > out how to do that without writing a rather tedious set of ever longer
> > alternates. If anybody knows how to do that without resorting to
> > alternatives I'd be fascinated to know how you do that.
>
> Ah, but Martin, do you really know if it is the John Smith in the database
> or another John Smith?
>
No, but then nobody knows that. If you're scanning for patient names in
body text and a common name happens to match a patient, it's an ambiguous
situation that can only be resolved iff you can write a rule that
reliably disambiguates it by recognising the name's context.

Martin
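
For anyone following the thread, here is a minimal Python sketch of the regex quoted above, plus one possible way round the "two or more names" problem. It is only an illustration: the two-step approach (match the whole run with a repeated non-capturing group, then split it on whitespace) and the names NAME_WORD, RUN and name_runs are assumptions made for this example, not something posted in the thread. The split step is there because Python's re module keeps only the last repetition of a repeated capturing group.

```python
import re

# The pattern quoted in the thread: exactly two capitalised name words.
PAIR = re.compile(r"([A-Z][-a-zA-Z]{1,20})\s([A-Z][-a-zA-Z]{1,20})")

# Assumed extension (not from the thread): match a run of two or more
# name words as one unit, then split the run afterwards.
NAME_WORD = r"[A-Z][-a-zA-Z]{1,20}"
RUN = re.compile(NAME_WORD + r"(?:\s" + NAME_WORD + r")+")


def name_runs(text):
    """Return one list of name words per run of 2+ capitalised words."""
    return [m.group(0).split() for m in RUN.finditer(text)]


if __name__ == "__main__":
    print(PAIR.search("John Smith").groups())
    print(name_runs("we saw John Smith and Mary-Jane Watson Parker today"))
```

Like the original pattern, this cannot tell a name from any other pair of capitalised words (a sentence-initial word followed by a name will match too), so the results still need a database lookup and some context rule before they mean anything.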