On Sat, Feb 7, 2009 at 11:53 AM, Dinesh B Vadhia
<dineshbvad...@hotmail.com> wrote:
> Wow Kent, what a great start!
>
> I found this
> http://mail.python.org/pipermail/python-list/2006-April/376149.html which
> lays out some patterns of legal citations ie.

Here is another good reference:
http://philip.greenspun.com/politics/litigation/reading-cites.html

> 1. Two names, consisting of one or more words, separated by a "v."
> 2. One, two, or three citations, each of which has a volume number ("90")
> followed by a Reporter name ("U.S." or "S.Ct." or "L.Ed."), which consists
> of one or two words always ending with a ".", followed by a page number
> ("1893")

According to the reference I cite above, the Reporter name does not
have to include periods; his examples include US and Tenn as reporters.

> 3. Each citation may contain a comma and a second page number (", 234 ")
> 4. Optionally, a parenthesized year ("(1970)") or optional information in
> parentheses ("(DCMD Ala.1966)")
> 5. An ending "."

Or a comma; this seems to be a grammatical element of the enclosing
sentence rather than part of the citation.

> I was pondering the same issue about names ie. how do you know that "Page
> 500" is not part of "Carter". My thought was to start from the "v.", step
> backwards a word at a time, assume that the first name is valid, for all
> subsequent words check if the last character of a word contained the digits
> [0-9] or these punctuation marks [.,:;], if so, then it was unlikely to be
> part of the name.

That won't help with "In John Doggone Williams". I can also imagine a
name that contains punctuation, e.g. St. John's Lumber.

>
> I've changed the sample text to include examples of multiple page
> references:

Actually you had one already.

> Okay, I'd better get to grips with pyparsing!

Pyparsing won't backtrack, which may be a disadvantage here in parsing
the extra page numbers. Here is some discussion of that:
http://mail.python.org/pipermail/python-list/2007-November/464726.html

Kent
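
For illustration only, here is a minimal sketch of items 2-4 of the quoted list written as a plain regular expression rather than a pyparsing grammar. The re module backtracks, so the optional second page number can be tried and then abandoned when the number turns out to be the volume of the next citation. The names CITATION and find_citations and the sample string are made up for this example; nothing here comes from the thread or from pyparsing, and the pattern ignores the party names (item 1) entirely.

```python
import re

# Hypothetical pattern covering items 2-4 of the quoted list.
# Periods in the reporter name are optional, so "US" and "Tenn" also match.
CITATION = re.compile(r"""
    \b(?P<volume>\d+)\s+                       # volume number, e.g. "90"
    (?P<reporter>[A-Z][A-Za-z.]*               # reporter, first word: "U.S.", "S.Ct.", "US"
        (?:\s[A-Z][A-Za-z.]*)?)\s+             # optional second reporter word
    (?P<page>\d+)                              # first page number, e.g. "1893"
    (?:,\s*(?P<page2>\d+)\b(?!\s+[A-Z]))?      # second page, unless it begins a new citation
    (?:\s*\((?P<paren>[^)]*)\))?               # "(1970)" or "(DCMD Ala.1966)"
""", re.VERBOSE)


def find_citations(text):
    """Return one dict of named groups per citation-like match in text."""
    return [m.groupdict() for m in CITATION.finditer(text)]


# Made-up sample: two citations, the first with a second page number.
sample = "Smith v. Jones, 100 U.S. 200, 234, 90 S.Ct. 1893 (1970)."
for cite in find_citations(sample):
    print(cite)
```

The trailing "." or "," is deliberately left outside the pattern, since it looks like part of the enclosing sentence. The lookahead after the second page number is a guess at how to tell ", 234" from the start of the next citation; real text will probably need a stricter test.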