"Emad Nawfal (عمـ نوفل ـاد)" <emadnaw...@gmail.com> wrote
def devocalize(word):
    vowels = "aiou"
Should this include 'e'?
    return "".join([letter for letter in word if letter not in vowels])
It's probably faster to use a regular expression replacement:
simply replace any vowel with the empty string.
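
A minimal sketch of the regular expression idea, assuming the same
vowel set as the quoted code (add 'e' to the character class if it
should be stripped too). The name VOWELS is only illustrative, and
whether this is actually faster on your data is something to measure
rather than take on trust:

```python
import re

# Compile once, outside the function, so the pattern is not rebuilt per word.
# Assumption: same vowels as the original code ("aiou").
VOWELS = re.compile(r"[aiou]")

def devocalize(word):
    """Return word with every character matched by VOWELS removed."""
    return VOWELS.sub("", word)

print(devocalize("hum"))  # hm
```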
vowelled = ['him', 'ham', 'hum', 'fun', 'fan']  # input, usually a large list of around 500,000 items
vowelled = set(vowelled)
How do you process the file? Do you read it all into memory and
then convert it to a set? Or do you process each line (one word
per line?) and add the words to the set one by one? The latter
is probably faster.
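
Something like the following is what I mean by adding the words one
by one. It is only a sketch: it assumes one word per line and UTF-8
text, and load_words is a made-up helper name, not something from
your script:

```python
def load_words(path):
    """Build a set of words from a file holding one word per line."""
    words = set()
    with open(path, encoding="utf-8") as infile:
        for line in infile:          # reads one line at a time, not the whole file
            word = line.strip()
            if word:                 # skip blank lines
                words.add(word)
    return words
```

You would then call it as vowelled = load_words("words.txt"), where
words.txt stands in for whatever your input file is called.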
unvowelled = set([devocalize(word) for word in vowelled])
for lex in unvowelled:
    d = {}
    d[lex] = [word for word in vowelled if devocalize(word) == lex]
I think you could remove the comprehensions and do all of
this inside a single loop. This is one of those cases where a single
explicit loop is faster than 2 comprehensions and a loop.
But the only way to be sure is to test/profile to see where the slowdown
occurs.
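
A rough sketch of the single-loop version, in case it helps. It calls
devocalize once per word and files each word under its unvowelled
form as it goes. The use of defaultdict and the name
group_by_skeleton are my own choices here, not part of your code:

```python
from collections import defaultdict

def devocalize(word, vowels="aiou"):
    """Strip the given vowels from word (same logic as the original)."""
    return "".join(letter for letter in word if letter not in vowels)

def group_by_skeleton(words):
    """Map each unvowelled form to the list of words that reduce to it."""
    groups = defaultdict(list)
    for word in words:               # one pass, one devocalize call per word
        groups[devocalize(word)].append(word)
    return groups

vowelled = ['him', 'ham', 'hum', 'fun', 'fan']
d = group_by_skeleton(vowelled)
for lex in sorted(d):
    print(lex, " ".join(d[lex]))
```

The original loop rescans the whole word set for every unvowelled
form, so its cost grows with the number of words times the number of
forms, whereas this does one pass. To check the difference on your
real data, the standard timeit or cProfile modules are the usual
tools.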
HTH,
--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/