For several years I have been using a simple script to find the top 20 posters to the tutor list by web-scraping the archive pages. I thought others might be interested, so here is the list for 2008 and the script that generates it. The lists for previous years (back to 2003) are at the end, so not everyone on the list has to hit the archives to find out :-)

The script gives a simple example of datetime, urllib2 and BeautifulSoup. It consolidates names that vary by case, but other variations are not detected.

Alan, I thought you might have passed me this year but we are both off a little :-) Somehow I have posted an average of 2.8 times per day for the last four years...

Happy New Year everyone!
Kent

2008
====
Kent Johnson 931
Alan Gauld 820
bob gailer 247
Dick Moores 191
W W 142
Wayne Watson 106
John Fouhy 97
Steve Willoughby 91
Lie Ryan 88
bhaaluu 85
Marc Tompkins 83
Michael Langford 71
Tiger12506 70
Andreas Kostyrka 64
Dinesh B Vadhia 64
wesley chun 58
Tim Golden 57
Chris Fuller 54
Ricardo Aráoz 53
spir 53

#####################################
# -*- coding: latin-1 -*-
''' Counts all posts to Python-tutor by author'''
from datetime import date, timedelta
import operator, urllib2
from BeautifulSoup import BeautifulSoup

today = date.today()

for year in [2008]:
    startDate = date(year, 1, 1)
    endDate = date(year, 12, 31)
    # Stepping by 31 days from January 1 lands in each month exactly once
    thirtyOne = timedelta(days=31)
    counts = {}

    # Collect all the counts for a year by scraping the monthly author archive pages
    while startDate < endDate and startDate < today:
        dateString = startDate.strftime('%Y-%B')

        url = 'http://mail.python.org/pipermail/tutor/%s/author.html' % dateString
        data = urllib2.urlopen(url).read()
        soup = BeautifulSoup(data)

        # Skip the navigation items at the top and bottom of the page
        li = soup.findAll('li')[2:-2]

        for l in li:
            name = l.i.string.strip()
            counts[name] = counts.get(name, 0) + 1

        startDate += thirtyOne

    # Consolidate names that vary by case under the most popular spelling
    nameMap = dict() # Map lower-case name to most popular name
    for name, count in sorted(counts.iteritems(), key=operator.itemgetter(1), reverse=True):
        lower = name.lower()
        if lower in nameMap:
            # Add counts for a name we have seen already
            counts[nameMap[lower]] += count
            # Drop the variant so it is not also listed separately
            del counts[name]
        else:
            nameMap[lower] = name

    print
    print year
    print '===='
    for name, count in sorted(counts.iteritems(), key=operator.itemgetter(1), reverse=True)[:20]:
        print name.encode('latin-1', 'xmlcharrefreplace'), count
    print

# Results as of 12/31/2008:
'''
2003
====
Danny Yoo 617
Alan Gauld 421
Jeff Shannon 283
Magnus Lycka 242
Bob Gailer 195
Magnus =?iso-8859-1?Q?Lyck=E5?= 166
alan.ga...@bt.com 161
Kirk Bailey 155
Gregor Lingl 152
Lloyd Kvam 142
Andrei 118
Sean 'Shaleh' Perry 117
Magnus Lyckå 113
Michael Janssen 113
Erik Price 100
Lee Harr 88
Terry Carroll 87
Daniel Ehrenberg 78
Abel Daniel 76
Charlie Clark 74

2004
====
Alan Gauld 699
Danny Yoo 530
Kent Johnson 451
Lloyd Kvam 146
Dick Moores 145
Liam Clarke 140
Brian van den Broek 122
Karl Pflästerer 109
Jacob S. 101
Andrei 99
Chad Crabtree 93
Bob Gailer 91
Magnus Lycka 91
Terry Carroll 88
Marilyn Davis 84
Gregor Lingl 73
Dave S 73
Bill Mill 71
Isr Gish 71
Lee Harr 67

2005
====
Kent Johnson 1189
Danny Yoo 767
Alan Gauld 565
Alan G 317
Liam Clarke 298
Max Noel 203
Nathan Pinno 197
Brian van den Broek 190
Jacob S. 154
jfouhy at paradise.net.nz 135
Alberto Troiano 128
Bernard Lebel 119
Joseph Quigley 101
Terry Carroll 93
Andrei 79
D. Hartley 77
John Fouhy 73
bob 73
Hugo González Monteverde 72
Orri Ganel 69

2006
====
Kent Johnson 913
Alan Gauld 815
Danny Yoo 448
Luke Paireepinart 242
John Fouhy 187
Chris Hengge 166
Bob Gailer 134
Dick Moores 129
Asrarahmed Kadri 119
Terry Carroll 111
Python 94
Mike Hansen 74
Liam Clarke 72
Carroll, Barry 67
Kermit Rose 66
anil maran 66
Hugo González Monteverde 65
wesley chun 63
Christopher Spears 53
Michael Lange 51

2007
====
Kent Johnson 1052
Alan Gauld 938
Luke Paireepinart 260
Dick Moores 203
Eric Brunson 164
Terry Carroll 128
Tiger12506 112
John Fouhy 105
Bob Gailer 97
Ricardo Aráoz 93
Rikard Bosnjakovic 93
bhaaluu 88
elis aeris 83
Andreas Kostyrka 77
Michael Langford 68
shawn bright 63
Tim Golden 62
Dave Kuhlman 62
wormwood_3 53
wesley chun 53

2008
====
Kent Johnson 931
Alan Gauld 820
bob gailer 247
Dick Moores 191
W W 142
Wayne Watson 106
John Fouhy 97
Steve Willoughby 91
Lie Ryan 88
bhaaluu 85
Marc Tompkins 83
Michael Langford 71
Tiger12506 70
Andreas Kostyrka 64
Dinesh B Vadhia 64
wesley chun 58
Tim Golden 57
Chris Fuller 54
Ricardo Aráoz 53
spir 53
'''
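
The case consolidation step can also be pulled out into a small standalone function, which makes it easy to try without hitting the archives. This is only a minimal sketch, not the script's own code: `consolidate_by_case` is a hypothetical name and the sample counts are made up for illustration. It builds a new dict so that each merged spelling shows up only once, and it should run unchanged on Python 2 and Python 3.

```python
import operator

def consolidate_by_case(counts):
    """Merge names that differ only by case under the most popular spelling.

    counts maps author name -> number of posts (assumed to be a plain dict).
    Returns a new dict; the input is left untouched.
    """
    canonical = {}  # lower-case name -> most popular spelling seen so far
    merged = {}
    # Visit names from most to fewest posts so the first spelling seen
    # for each lower-case key is the most popular one
    for name, count in sorted(counts.items(), key=operator.itemgetter(1), reverse=True):
        key = name.lower()
        if key not in canonical:
            canonical[key] = name
        best = canonical[key]
        merged[best] = merged.get(best, 0) + count
    return merged

# Made-up sample data, not taken from the archives
sample = {'Some Poster': 5, 'some poster': 12, 'Other Person': 7}
result = consolidate_by_case(sample)
```

As in the script, only differences in case are merged; variations such as "Alan G" versus "Alan Gauld" would still be counted separately.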