On Fri, Jan 2, 2009 at 5:34 AM, Alan Gauld <alan.ga...@btinternet.com> wrote:
> I think the figures reflect the general level of activity on the list.
> We seem to have peaked in 2005...
> Statistics, don't you love 'em :-)

I couldn't resist adding a total number of posts and percent to the
calculations. Statistics + python = time sink :-)

I re-ran the program back to 2003. New program and results below. 2005
was a banner year. 2008 was down considerably from 2007 and that does
account for our smaller numbers.

BTW your historical counts are up a bit in this set because this is
the first year I had the name folding. Maybe I should add a set of
known aliases also...

Kent

''' Counts all posts to Python-tutor by author'''
# -*- coding: latin-1 -*-

from datetime import date, timedelta
import operator, urllib2
from BeautifulSoup import BeautifulSoup

today = date.today()

for year in range(2003, 2009):
    startDate = date(year, 1, 1)
    endDate = date(year, 12, 31)
    thirtyOne = timedelta(days=31)
    counts = {}

    # Collect all the counts for a year by scraping the monthly author archive pages
    while startDate < endDate and startDate < today:
        dateString = startDate.strftime('%Y-%B')

        url = 'http://mail.python.org/pipermail/tutor/%s/author.html' % dateString
        data = urllib2.urlopen(url).read()
        soup = BeautifulSoup(data)

        li = soup.findAll('li')[2:-2]

        for l in li:
            name = l.i.string.strip()
            counts[name] = counts.get(name, 0) + 1

        startDate += thirtyOne

    totalPosts = sum(counts.itervalues())

    # Consolidate names that vary by case under the most popular spelling
    nameMap = dict() # Map lower-case name to most popular name
    for name, count in sorted(counts.iteritems(), key=operator.itemgetter(1), reverse=True):
        lower = name.lower()
        if lower in nameMap:
            # Add counts for a name we have seen already
            counts[nameMap[lower]] += count
        else:
            nameMap[lower] = name

    print
    print '%s (%s posts)' % (year, totalPosts)
    print '===='
    for name, count in sorted(counts.iteritems(), key=operator.itemgetter(1), reverse=True)[:20]:
        pct = round(100.0*count/totalPosts, 1)
        print '%s %s (%s%%)' % (name.encode('utf-8', 'xmlcharrefreplace'), count, pct)
    print

# Results as of 12/31/2008:
'''
2003 (7745 posts)
====
Danny Yoo 617 (8.0%)
Alan Gauld 421 (5.4%)
Jeff Shannon 283 (3.7%)
Magnus Lycka 242 (3.1%)
Bob Gailer 195 (2.5%)
Magnus =?iso-8859-1?Q?Lyck=E5?= 166 (2.1%)
alan.ga...@bt.com 161 (2.1%)
Kirk Bailey 155 (2.0%)
Gregor Lingl 152 (2.0%)
Lloyd Kvam 142 (1.8%)
Andrei 118 (1.5%)
Sean 'Shaleh' Perry 117 (1.5%)
Magnus Lyckå 113 (1.5%)
Michael Janssen 113 (1.5%)
Erik Price 100 (1.3%)
Lee Harr 88 (1.1%)
Terry Carroll 87 (1.1%)
Daniel Ehrenberg 78 (1.0%)
Abel Daniel 76 (1.0%)
Don Arnold 75 (1.0%)

2004 (7178 posts)
====
Alan Gauld 699 (9.7%)
Danny Yoo 530 (7.4%)
Kent Johnson 451 (6.3%)
Lloyd Kvam 146 (2.0%)
Dick Moores 145 (2.0%)
Liam Clarke 140 (2.0%)
Brian van den Broek 122 (1.7%)
Karl Pflästerer 109 (1.5%)
Jacob S. 101 (1.4%)
Andrei 99 (1.4%)
Chad Crabtree 93 (1.3%)
Bob Gailer 91 (1.3%)
Magnus Lycka 91 (1.3%)
Terry Carroll 88 (1.2%)
Marilyn Davis 84 (1.2%)
Gregor Lingl 73 (1.0%)
Dave S 73 (1.0%)
Bill Mill 71 (1.0%)
Isr Gish 71 (1.0%)
Lee Harr 67 (0.9%)

2005 (9705 posts)
====
Kent Johnson 1189 (12.3%)
Danny Yoo 767 (7.9%)
Alan Gauld 565 (5.8%)
Alan G 317 (3.3%)
Liam Clarke 298 (3.1%)
Max Noel 203 (2.1%)
Nathan Pinno 197 (2.0%)
Brian van den Broek 190 (2.0%)
Jacob S. 154 (1.6%)
jfouhy at paradise.net.nz 135 (1.4%)
Alberto Troiano 128 (1.3%)
Bernard Lebel 119 (1.2%)
Joseph Quigley 101 (1.0%)
Terry Carroll 93 (1.0%)
Andrei 79 (0.8%)
D. Hartley 77 (0.8%)
John Fouhy 73 (0.8%)
bob 73 (0.8%)
Hugo González Monteverde 72 (0.7%)
Orri Ganel 69 (0.7%)

2006 (7521 posts)
====
Kent Johnson 913 (12.1%)
Alan Gauld 821 (10.9%)
Danny Yoo 448 (6.0%)
Luke Paireepinart 242 (3.2%)
John Fouhy 187 (2.5%)
Chris Hengge 166 (2.2%)
Bob Gailer 134 (1.8%)
Dick Moores 129 (1.7%)
Asrarahmed Kadri 119 (1.6%)
Terry Carroll 111 (1.5%)
Python 94 (1.2%)
Mike Hansen 74 (1.0%)
Liam Clarke 72 (1.0%)
Carroll, Barry 67 (0.9%)
Kermit Rose 66 (0.9%)
anil maran 66 (0.9%)
Hugo González Monteverde 65 (0.9%)
wesley chun 63 (0.8%)
Dave S 58 (0.8%)
Christopher Spears 53 (0.7%)

2007 (7600 posts)
====
Kent Johnson 1052 (13.8%)
Alan Gauld 977 (12.9%)
Luke Paireepinart 260 (3.4%)
Dick Moores 203 (2.7%)
Eric Brunson 164 (2.2%)
Bob Gailer 144 (1.9%)
Terry Carroll 128 (1.7%)
Tiger12506 112 (1.5%)
John Fouhy 105 (1.4%)
Ricardo Aráoz 93 (1.2%)
Rikard Bosnjakovic 93 (1.2%)
bhaaluu 88 (1.2%)
elis aeris 83 (1.1%)
Andreas Kostyrka 77 (1.0%)
Michael Langford 68 (0.9%)
shawn bright 63 (0.8%)
Tim Golden 62 (0.8%)
Dave Kuhlman 62 (0.8%)
wormwood_3 53 (0.7%)
wesley chun 53 (0.7%)

2008 (6624 posts)
====
Kent Johnson 931 (14.1%)
Alan Gauld 820 (12.4%)
bob gailer 247 (3.7%)
Dick Moores 191 (2.9%)
W W 142 (2.1%)
Wayne Watson 106 (1.6%)
John Fouhy 97 (1.5%)
Steve Willoughby 91 (1.4%)
Lie Ryan 88 (1.3%)
bhaaluu 85 (1.3%)
Marc Tompkins 83 (1.3%)
Michael Langford 71 (1.1%)
Tiger12506 70 (1.1%)
Andreas Kostyrka 64 (1.0%)
Dinesh B Vadhia 64 (1.0%)
wesley chun 58 (0.9%)
Tim Golden 57 (0.9%)
Chris Fuller 54 (0.8%)
Ricardo Aráoz 53 (0.8%)
spir 53 (0.8%)
'''
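
A rough sketch of the "known aliases" idea mentioned in the message, in case it is useful. This is not part of the program above: the function name fold_names and the aliases table are hypothetical, and the two alias pairings in the example (Alan G with Alan Gauld, jfouhy at paradise.net.nz with John Fouhy) are only assumptions about who is who. It folds names by case as the program does, additionally maps listed aliases to a preferred name, and returns a new dictionary so that merged spellings do not remain as separate entries. It is written to run on current Python as well.

```python
def fold_names(counts, aliases=None):
    """Merge per-author post counts under one canonical spelling.

    counts  -- dict mapping author name to number of posts
    aliases -- hypothetical dict mapping a known alias (any case) to the
               preferred name; an assumption, not part of the original program
    """
    # Normalise alias keys once so alias lookups ignore case.
    alias_map = dict((alias.lower(), target)
                     for alias, target in (aliases or {}).items())

    folded = {}     # canonical name -> total posts
    canonical = {}  # lower-case key -> canonical spelling chosen for it

    # Visit the most prolific spellings first so the most popular one wins.
    for name, count in sorted(counts.items(),
                              key=lambda item: item[1], reverse=True):
        # Resolve an explicit alias first, then fold by case.
        target = alias_map.get(name.lower(), name)
        key = target.lower()
        if key not in canonical:
            canonical[key] = target
        winner = canonical[key]
        folded[winner] = folded.get(winner, 0) + count
    return folded


# Example using four rows from the 2005 results; the alias pairings
# below are assumptions made for illustration only.
counts_2005 = {
    'Alan Gauld': 565,
    'Alan G': 317,
    'jfouhy at paradise.net.nz': 135,
    'John Fouhy': 73,
}
known_aliases = {
    'Alan G': 'Alan Gauld',
    'jfouhy at paradise.net.nz': 'John Fouhy',
}
merged = fold_names(counts_2005, known_aliases)
```

With those assumed pairings, merged holds two entries, one per person, and the percentages could be computed from it exactly as in the final loop of the program.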