Well coming up with this has made me really love Python. I worked on this with my online pythonpenpal Kyle, and here is what we came up with. Thanks to all for input so far.
My first idea was to use a C-type indexing for-loop, to grab a two- element sequence [i, i+1]: dict = {} for i in range(len(t) - 1): if not dict.has_key(t[i]): dict[t[i]] = {} if not dict[t[i]].has_key(t[i+1]): dict[t[i]][t[i+1]] = 1 else: dict[t[i]][t[i+1]] += 1 Which works, but. Kyle had an alternative take, with no indexing, and after we worked on this strategy it seemed very Pythonesque, and ran almost twice as fast. ---- #!/usr/local/bin/python import sys file = open(sys.argv[1], 'rb').read() # We imagine a 2-byte 'window' moving over the text from left to right # # +-------+ # L o n | d o | n . M i c h a e l m a s t e r m ... # +-------+ # # At any given point, we call the leftmost byte visible in the window 'L', and the # rightmost byte 'R'. # # +-----------+ # L o n | L=d R=o | n . M i c h a e l m a s t e r m ... # +-----------+ # # When the program begins, the first byte is preloaded into L, and we position R # at the second byte of the file. # dict = {} L = file[0] for R in file[1:]: # move right edge of window across the file if not L in dict: dict[L] = {} if not R in dict[L]: dict[L][R] = 1 else: dict[L][R] += 1 L = R # move character in R over to L # that's it. here's a printout strategy: for entry in dict: print entry, ':', sum(dict[entry].values()) print dict[entry] print ---- _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor