Emad Nawfal (عمـ نوفل ـاد) wrote:
Dear Tutors,
The purpose of this script is to see how many vocalized forms map to a single consonantal form. For example, the form "fn" could be fan, fin, fun.

The input is a large list (taken from a file) that has ordinary words.

What does "ordinary" mean.

Are all the letteers lower case?

The script creates a devocalized list, then compares the two lists.

The problem: It takes over an hour to process 1 file. The average file size is 400,000 words.

Question: How can I make it run faster? I have a large number of files.

The major consumer of time is the "outer product:" (for loop with list comprehension). That is 400,000 squared operations!
And it is not necessary!

Also the program repeatedly call devocalize for the same word!


Note: I'm not a programmer, so please avoid very technical terms.

You are writing a program. Does not that make you a programmer? How can I tell what terms are too technical for you?


Thank you in anticipation.

You are welcome in retrospect. Observations follow:


def devocalize(word):
    vowels = "aiou"
    return "".join([letter for letter in word if letter not in vowels])

What happenex to"e"?

Use translate to delete the vowels. That will improve speed.

"translate( table[, deletechars])

Return a copy of the string where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table, which must be a string of length 256."

Here is a program that will be much faster. I changed names to be more meaningful (to me), and created a main function (standard operating procedure in Python).

import collections

def devowel(word, table=''.join(chr(i) for i in range(256)), vowels="aiou"):
 return word.translate(table, vowels)

def main():
 allOriginalWords = ['him', 'ham', 'hum', 'fun', 'fan']
 uniqueOriginalWords = set(allOriginalWords)
 wordDict = collections.defaultdict(list)
 for word in uniqueOriginalWords:
   wordDict[devowel(word)].append(word)
 for lex, wordList in wordDict.iteritems():
   print lex, " ".join(wordList)

main()

--
Bob Gailer
Chapel Hill NC
919-636-4239
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to