On Sat, Jan 24, 2009 at 12:02 AM, Emad Nawfal (عماد نوفل) <emadnaw...@gmail.com> wrote:
> Hello Tutors,
> Arabic words are build around a root of 3 or 4 consonants with lots of
> letters in between, and also prefixes and suffixes.
> The root ktb (write) for example, could be found in words like:
> ktab : book
> mktob: letter, written
> wktabhm: and their book
> yktb: to write
> lyktbha: in order for him to write it
>
> I need to find all the word forms made up of a certain root in a corpus. My
> idea, which is not completely right, but nonetheless works most of the
> time, is to find words that have the letters of the root in their
> respective order. For example, the words that contain k followed by t
> then followed by b, no matter whether there is something in between. I came
> up with following which works fine. For learning purposes, please let me
> know whether this is a good way, and how else I can achieve that.
> I appreciate your help, as I always did.
>
>
>
> def getRoot(root, word):
>     result = ""
>
>     for letter in word:
>         if letter not in root:
>             continue
>         result +=letter
>     return result
>
> # main
>
> infile = open("myCorpus.txt").read().split()
> query = "ktb"
> outcome = set([word for word in infile if query == getRoot(query, word)])
> for word in outcome:
>
>     print(word)

This runs into problems if the letters of the root also occur somewhere else in the word. For example, if there were a word bktab, then getRoot("ktb", "bktab") would be "bktb", not "ktb", so the word would be missed.

I would use the find method of the string class here: if A and B are strings, and n is a number, then A.find(B, n) is the first location, at or after index n, where B is a substring of A, or -1 if there isn't any. Using this, I get:

def hasRoot(word, root):  # This order I find more logical
    loc = 0
    for letter in root:
        # Search only from just past the previous root letter
        loc = word.find(letter, loc)
        if loc == -1:
            return False
        loc += 1  # the next root letter must come strictly after this one
    return True

# main
infile = open("myCorpus.txt").read().split()
query = "ktb"
outcome = [word for word in infile if hasRoot(word, query)]
for word in outcome:
    print(word)

-- 
André Engels, andreeng...@gmail.com
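
P.S. To see the difference between the two approaches without a corpus file, here is a small self-contained sketch. The word list is just the examples from the question plus the made-up word bktab and a non-matching btk; the names get_root and has_root are illustrative renamings, not anything from a library, and the find-based check assumes the search position is carried forward from one root letter to the next.

```python
def get_root(root, word):
    """Filtering approach: keep only the letters of word that appear in root."""
    return "".join(letter for letter in word if letter in root)


def has_root(word, root):
    """Return True if the letters of root occur in word in order (gaps allowed)."""
    loc = 0
    for letter in root:
        loc = word.find(letter, loc)  # search from just past the previous match
        if loc == -1:
            return False
        loc += 1
    return True


# Sample words from the question, plus the made-up bktab and a non-match.
words = ["ktab", "mktob", "wktabhm", "yktb", "lyktbha", "bktab", "btk"]
query = "ktb"

by_filter = [w for w in words if get_root(query, w) == query]
by_find = [w for w in words if has_root(w, query)]

print(by_filter)  # bktab is missing here
print(by_find)    # bktab is included here
```

Both versions reject btk, where the letters are present but in the wrong order. Only the find-based one accepts bktab, because the extra b at the start no longer gets in the way.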