Thanks very much for your help. I did indeed neglect to put the "print" in the code that I sent to the list.
It appears that the step that is taking a long time, and that therefore makes me think the script is somehow broken, is creating a dictionary of frequencies from the list of n-grams. To do this, I've written, for example:

    bigramDict = {}
    bigrams = [' '.join(wordlist[i:i+2]) for i in range(len(wordlist)-2+1)]
    for bigram in bigrams:
        if bigram in bigramDict.keys():
            bigramDict[bigram] += 1
        else:
            bigramDict[bigram] = 1

With around 500,000 bigrams, this is taking over 25 minutes to run (and I haven't sat around to let it finish) on an XP machine with a 3.0 GHz processor and 1.5 GB of RAM. I bet I'm trying to reinvent the wheel here, and that there are faster algorithms available in some package. I suspect an indexing package like PyLucene would help create frequency dictionaries, but I can't figure it out from the online material available.

Any suggestions?

Thanks,
Nick

-----Original Message-----
From: Jerry Hill [mailto:[EMAIL PROTECTED]
Sent: Friday, March 16, 2007 12:52 PM
To: Switanek, Nick
Cc: tutor@python.org
Subject: Re: [Tutor] fine in interpreter, hangs in batch

On 3/16/07, Switanek, Nick <[EMAIL PROTECTED]> wrote:
> After creating a list of words ('wordlist'), I can run the following
> code in the interactive window of PythonWin in about ten seconds. If I
> run the script containing the code, the script seems to hang on the
> loop. I'd be grateful for help as to why; I often seem to have something
> that works in the interpreter, but not when I run the script.

I'm not sure what you mean by 'seems to hang'. The code that you posted isn't complete enough to run (since you didn't provide a definition of wordlist), and just generates a NameError exception.

Beyond that, I don't understand what the code is supposed to produce for output. As written, you generate a list in your loop and assign it to the name ngrams, but never do anything with that list. Since you're inside a for loop, your ngrams name is overwritten the next time you run through the loop.
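If the goal is to keep every n-gram list rather than only the last one, one option is to store each list in a dictionary keyed by n. This is a minimal sketch for illustration, not code from the original thread: the name ngrams_by_n is hypothetical, and the five-word sample list stands in for the real wordlist. It uses print() with a single argument so it runs under both Python 2 and Python 3.

```python
# Hypothetical sample input; the original script built 'wordlist' elsewhere.
wordlist = ['apple', 'orange', 'pear', 'banana', 'coconut']
N = [2, 3, 4, 5]

# Store each n-gram list under its n, so later passes through the loop
# do not overwrite earlier results ('ngrams_by_n' is an illustrative name).
ngrams_by_n = {}
for n in N:
    ngrams_by_n[n] = [' '.join(wordlist[i:i+n])
                      for i in range(len(wordlist) - n + 1)]

print(ngrams_by_n[2])
print(ngrams_by_n[5])
```

After the loop finishes, every list is still available, so each one can be printed, counted, or saved later.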
You also generate a string with the statement "Finished the %d-gram list." % n, but you don't do anything with it. You probably want to either print it or assign it to a variable to print later. Something like this:

    wordlist = ['apple', 'orange', 'pear', 'banana', 'coconut']
    N = [2,3,4,5]
    ngramlist = [wordlist]
    for n in N:
        ngrams = [' '.join(wordlist[i:i+n]) for i in range(len(wordlist)-n+1)]
        print "Finished the %d-gram list." % n
        print ngrams

When run as a script, this produces the following output:

    Finished the 2-gram list.
    ['apple orange', 'orange pear', 'pear banana', 'banana coconut']
    Finished the 3-gram list.
    ['apple orange pear', 'orange pear banana', 'pear banana coconut']
    Finished the 4-gram list.
    ['apple orange pear banana', 'orange pear banana coconut']
    Finished the 5-gram list.
    ['apple orange pear banana coconut']

Does that help?

-- 
Jerry
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
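Regarding the slow bigram count in Nick's follow-up: a likely cause is the test "if bigram in bigramDict.keys()". In Python 2, dict.keys() builds a new list of every key on each call, and "in" on a list is a linear scan, so with hundreds of thousands of distinct bigrams the loop does roughly quadratic work. Writing "if bigram in bigramDict:" instead is a hash lookup that takes roughly constant time on average, and that one-line change may be all that is needed; an indexing package like PyLucene is probably unnecessary for simple counting. Below is a minimal sketch of the same count using collections.defaultdict (available since Python 2.5). The function name count_ngrams, the variable names, and the sample sentence are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def count_ngrams(wordlist, n=2):
    """Return a dict mapping each n-gram string to its frequency.

    Hypothetical helper for illustration; not from the original thread.
    """
    counts = defaultdict(int)  # a missing key starts at 0
    for i in range(len(wordlist) - n + 1):
        # Index the dict directly: this is a hash lookup, and no list of
        # keys is built the way 'in d.keys()' does in Python 2.
        counts[' '.join(wordlist[i:i+n])] += 1
    return dict(counts)


# Hypothetical sample data standing in for the real wordlist.
words = 'the cat sat on the mat and the cat slept'.split()
bigram_counts = count_ngrams(words)
print(bigram_counts['the cat'])  # prints 2
```

Passing n=3, 4, or 5 counts the longer n-grams the same way. On Python 2.7 and 3.x, collections.Counter does this counting in a single call, for example Counter(bigrams) on the list of bigram strings.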