On Mon, 09 Feb 2009 14:42:47 -0800, Marc Tompkins wrote:

> Aha!  My list of "magic words"!
> (Sorry for the top post - anybody know how to change quoting defaults in
> Android Gmail?)
> --- www.fsrtechnologies.com
>
> On Feb 9, 2009 2:16 PM, "Dinesh B Vadhia" <dineshbvad...@hotmail.com>
> wrote:
>
> Kent /Emmanuel
>
> I found a list of words before the first word that can be removed which
> I think is the only way to successfully parse the citations.  Here they
> are:
>
> | E.g. | Accord | See | See + Also | Cf. | Compare | Contra | But + See |
> But + Cf. | See Generally | Citing | In |
>
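For comparison, here is a minimal sketch of the "magic words" approach quoted above: treat each citation signal from Dinesh's list as a marker and capture the "Party v. Party" name that follows it, up to the reporter and year. It assumes "See + Also" means "See also" (and so on), and that every citation looks like `Name v. Name, volume reporter page (year)`. The names `SIGNALS`, `CITATION_RE` and `find_citations` are hypothetical, not from the thread, and real citations ("In re" cases, pin cites, parentheticals) would need more work.

```python
import re

# Citation signals from the quoted list ("See + Also" read as "See also", etc.).
SIGNALS = ["E.g.", "Accord", "See", "See also", "Cf.", "Compare", "Contra",
           "But see", "But cf.", "See generally", "Citing", "In"]

# Longest signals first, so "See generally" is tried before plain "See".
SIGNAL_ALT = "|".join(re.escape(s) for s in sorted(SIGNALS, key=len, reverse=True))

CITATION_RE = re.compile(
    r"\b(?i:" + SIGNAL_ALT + r"),?\s+"                # signal in any case, optional comma
    r"(?P<case>[A-Z][\w.' ]*? v\. [A-Z][\w.' ]*?)"    # "Party v. Party", capitalised names
    r"(?:, \d+ [A-Za-z.0-9 ]+? \d+)+"                 # one or more "volume reporter page"
    r" \((?P<year>\d{4})\)"                           # "(year)"
)


def find_citations(text):
    """Return (case name, year) pairs for each citation introduced by a signal."""
    return [(m.group("case"), m.group("year")) for m in CITATION_RE.finditer(text)]


sample = ("The rule is settled. See Carter v. Jury Commission of Greene County, "
          "396 U.S. 320 (1970); cf. Turner v. Fouche, 396 U.S. 346 (1970).")
for case, year in find_citations(sample):
    print(case, year)
```

The scoped `(?i:...)` flag (Python 3.6+) lets a lowercase "cf." inside a string cite match while keeping the capital-letter checks on the party names case-sensitive.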
I think the only reliable way to parse all the citations correctly, in the
absence of "magic words", is to keep a list of party names. It involves a bit
of manual work, but it should be good enough if a small number of cases are
cited many times.

>>> import re
>>> names = '|'.join(['Carter', 'Jury Commission of Greene County', 'Lathe Turner', 'Fouche'])
>>> rep = '|'.join(['.*?'])
>>> dd = {'names': names, 'publ': rep}
>>> re.search(r'((%(names)s) v\. (%(names)s)(, [0-9]+ (%(publ)s) [0-9]+)* \([0-9]+\))' % dd, text).group()

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor