Thanks very much for your help.

I did indeed neglect to put the "print" in the code that I sent to the
list.

It appears that the step that is taking a long time, and that therefore
makes me think that the script is somehow broken, is creating a
dictionary of frequencies from the list of ngrams. To do this, I've
written, for example:

bigramDict = {}
bigrams = [' '.join(wordlist[i:i+2]) for i in range(len(wordlist)-2+1)]
for bigram in bigrams:
        if bigram in bigramDict.keys(): bigramDict[bigram] += 1
        else: bigramDict[bigram] = 1
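
A likely cause of the slowdown, assuming the script runs under Python 2:
there, bigramDict.keys() builds a fresh list of every key on each pass
through the loop, and "in" then scans that list from the start, so the
total work grows roughly with the square of the number of distinct
bigrams. Testing membership against the dictionary itself is a single
hash lookup instead. A minimal sketch of that change follows; the
function name count_bigrams and the sample word list are illustrative
and not part of the original script:

```python
def count_bigrams(wordlist):
    """Return a dict mapping each space-joined bigram to its frequency."""
    counts = {}
    for i in range(len(wordlist) - 2 + 1):
        bigram = ' '.join(wordlist[i:i+2])
        # 'in counts' hashes the key directly; no list of keys is built.
        if bigram in counts:
            counts[bigram] += 1
        else:
            counts[bigram] = 1
    return counts

# Small illustrative input, not the real data.
words = ['the', 'cat', 'saw', 'the', 'cat']
bigramDict = count_bigrams(words)
```

The loop also skips building the intermediate list of bigrams, which
should reduce memory use on large inputs. The standard library's
collections.defaultdict(int) is another way to drop the if/else.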

With around 500,000 bigrams, the dictionary-building loop has been
running for over 25 minutes (I haven't waited for it to finish) on an XP
machine at 3.0GHz with 1.5GB RAM. I bet I'm trying to reinvent the wheel
here, and that there
are faster algorithms available in some package. I think possibly an
indexing package like PyLucene would help create frequency dictionaries,
but I can't figure it out from the online material available. Any


-----Original Message-----
From: Jerry Hill [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 16, 2007 12:52 PM
To: Switanek, Nick
Subject: Re: [Tutor] fine in interpreter, hangs in batch

On 3/16/07, Switanek, Nick <[EMAIL PROTECTED]> wrote:
> After creating a list of words ('wordlist'), I can run the following
> code in the interactive window of PythonWin in about ten seconds. If I
> run the script containing the code, the script seems to hang on the
> loop. I'd be grateful for help as to why; I often seem to have code
> that works in the interpreter, but not when I run the script.

I'm not sure what you mean by 'seems to hang'.  The code that you
posted isn't complete enough to run (since you didn't provide a
definition of wordlist), and just generates a NameError exception.

Beyond that, I don't understand what the code is supposed to produce
for output.  As written, you generate a list in your loop and assign
it to the name ngrams, but never do anything with that list.  Since
you're inside a for loop, your ngrams name is overwritten the next
time you run through the loop.  You also generate a string with the
statement "Finished the %d-gram list." % n, but you don't do anything
with it.  You probably want to either print it, or assign it to a
variable to print later.

Something like this:

wordlist = ['apple', 'orange', 'pear', 'banana', 'coconut']
N = [2,3,4,5]
ngramlist = [wordlist]
for n in N:
   ngrams = [' '.join(wordlist[i:i+n]) for i in range(len(wordlist)-n+1)]
   print "Finished the %d-gram list." % n
   print ngrams

produces the following output when run as a script:

Finished the 2-gram list.
['apple orange', 'orange pear', 'pear banana', 'banana coconut']
Finished the 3-gram list.
['apple orange pear', 'orange pear banana', 'pear banana coconut']
Finished the 4-gram list.
['apple orange pear banana', 'orange pear banana coconut']
Finished the 5-gram list.
['apple orange pear banana coconut']

Does that help?
