"Emad Nawfal (عمـ نوفل ـاد)" <emadnaw...@gmail.com> wrote

def devocalize(word):
    vowels = "aiou"
Should this include 'e'?
    return "".join([letter for letter in word if letter not in vowels])

It's probably faster to use a regular expression replacement:
simply replace any vowel with the empty string.
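A minimal sketch of that idea, assuming the same vowel set as the original ("aiou"; add "e" if it belongs there). The second function uses str.translate instead of a regex. That is a swapped-in alternative rather than what was suggested above, but it is also worth timing. The names VOWEL_RE, STRIP_VOWELS and the two function names are illustrative only.

```python
import re

# Assumed vowel set, copied from the original devocalize().
VOWEL_RE = re.compile("[aiou]")
# Translation table that maps each vowel to None, i.e. deletes it.
STRIP_VOWELS = str.maketrans("", "", "aiou")

def devocalize_re(word):
    """Remove vowels by replacing each match with the empty string."""
    return VOWEL_RE.sub("", word)

def devocalize_translate(word):
    """Remove vowels with str.translate (no regex involved)."""
    return word.translate(STRIP_VOWELS)

print(devocalize_re("hum"), devocalize_translate("fan"))  # hm fn
```

Either can be dropped in as devocalize(); which one is faster on real data is something to measure, not assume.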

vowelled = ['him', 'ham', 'hum', 'fun', 'fan']  # input, usually a large list of around 500,000 items
vowelled = set(vowelled)


How do you process the file? Do you read it all into memory and
then convert it to a set? Or do you process each line (one word
per line?) and add the words to the set one by one? The latter
is probably faster.
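A rough sketch of the line-by-line approach, assuming one word per line. The function name load_words and the filename "words.txt" are hypothetical. Iterating over an open file yields one line at a time, so the words never sit in a separate list before becoming a set; the clearest win is memory, and any speed difference should be measured.

```python
def load_words(lines):
    """Build a set of words from an iterable of lines, one word per line.

    Passing an open file object means lines are read lazily, so the
    whole file is never held in memory as a list.
    """
    words = set()
    for line in lines:
        word = line.strip()  # drop the trailing newline and stray spaces
        if word:             # skip blank lines
            words.add(word)
    return words

# Hypothetical usage; "words.txt" is an assumed filename:
# with open("words.txt", encoding="utf-8") as f:
#     vowelled = load_words(f)
```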

unvowelled = set([devocalize(word) for word in vowelled])
d = {}  # create the dict once, before the loop, so results accumulate
for lex in unvowelled:
    d[lex] = [word for word in vowelled if devocalize(word) == lex]

I think you could remove the comprehensions and do all of
this inside a single loop. This is one of those cases where a single
explicit loop is faster than two comprehensions and a loop.
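A sketch of the single-loop version, using collections.defaultdict (my choice; a plain dict with setdefault would also work). The name group_by_skeleton is illustrative. Each word is devocalized exactly once and appended to the list for its consonant skeleton. The code above, by contrast, rescans the whole word list for every skeleton, which means one devocalize() call per (skeleton, word) pair.

```python
from collections import defaultdict

VOWELS = "aiou"  # same vowel set as the original

def devocalize(word):
    return "".join(letter for letter in word if letter not in VOWELS)

def group_by_skeleton(words):
    """Map each devocalized form to the list of vowelled words producing it."""
    groups = defaultdict(list)
    for word in words:
        groups[devocalize(word)].append(word)  # one pass, one call per word
    return dict(groups)

vowelled = ['him', 'ham', 'hum', 'fun', 'fan']
print(group_by_skeleton(vowelled))
# {'hm': ['him', 'ham', 'hum'], 'fn': ['fun', 'fan']}
```

If vowelled is a set rather than a list, the order of words inside each list is arbitrary.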

But the only way to be sure is to test/profile to see where the slowdown occurs.
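One way to test it, sketched with the standard timeit module. The sample word set is synthetic (an assumption, not the real 500,000-word list), and both function names are illustrative. For finding where time goes inside a full script, running it under `python -m cProfile script.py` is another option.

```python
import timeit
from collections import defaultdict

VOWELS = "aiou"

def devocalize(word):
    return "".join(letter for letter in word if letter not in VOWELS)

def nested_version(vowelled):
    """The original approach: rescan every word for each skeleton."""
    unvowelled = {devocalize(w) for w in vowelled}
    return {lex: [w for w in vowelled if devocalize(w) == lex] for lex in unvowelled}

def single_loop_version(vowelled):
    """Group in one pass, devocalizing each word once."""
    groups = defaultdict(list)
    for w in vowelled:
        groups[devocalize(w)].append(w)
    return dict(groups)

# Synthetic consonant-vowel-consonant words (484 of them).
consonants = "bdfhklmnrst"
sample = {c1 + v + c2 for c1 in consonants for v in VOWELS for c2 in consonants}

for func in (nested_version, single_loop_version):
    seconds = timeit.timeit(lambda: func(sample), number=3)
    print(f"{func.__name__}: {seconds:.3f}s for 3 runs")
```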

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
