Hi Tutors,

I want to color-code the different parts of each word in a morphologically complex natural language. The file I have looks like the sample below, where the first column is the word and the second is its composite part-of-speech tag. For example, Al is a DETERMINER, wlAy is a NOUN, and At is a PLURAL NOUN SUFFIX:
Al+wlAy+At DET+NOUN+NSUFF_FEM_PL
Al+mtHd+p DET+ADJ+NSUFF_FEM_SG

The output I want is one in which the word has no plus signs and each segment is color-coded by its grammatical category. For example, the noun is red, the determiner is green, and the suffix is orange, like on this page: http://docs.google.com/View?id=df7jv9p9_3582pt63cc4

I am stuck on the HTML part and don't know where to start. I have no experience with HTML, but I have the skeleton below (which may not be the right approach anyway). Any help with materials, modules, or suggestions would be appreciated.

The skeleton of my program is as follows:

#############
import sys

# Tag groups for each color. Suffix tags in the file carry extra
# features (e.g. NSUFF_FEM_PL), so only the part before the first
# underscore is compared against these tuples.
RED = ("NOUN", "ADJ")
GREEN = ("DET", "DEMON")
ORANGE = ("NSUFF", "VSUFF", "ADJSUFF")


def print_html_head():
    # print the head of the HTML page
    pass


def print_html_tail():
    # print the tail of the HTML page
    pass


def color(segment, html_color):
    # STUCK HERE: should take a segment and a color,
    # e.g. ("Al", "#00FF00"), and print the segment in that color
    pass


# main
with open(sys.argv[1]) as infile:  # takes as input the POS-tagged file
    print_html_head()
    for line in infile:
        fields = line.split()
        if len(fields) != 2:
            continue
        word, pos = fields
        for segment, tag in zip(word.split("+"), pos.split("+")):
            base_tag = tag.split("_")[0]  # NSUFF_FEM_PL -> NSUFF
            if base_tag in RED:
                color(segment, "#FF0000")
            elif base_tag in GREEN:
                color(segment, "#00FF00")
            elif base_tag in ORANGE:
                color(segment, "#FFA500")
            else:
                color(segment, "#0000FF")
    print_html_tail()

--
"No victim has ever been more repressed and alienated than the truth." (محمد الغزالي)
Emad Soliman Nawfal
Indiana University, Bloomington
--------------------------------------------------------
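A minimal sketch of one way to fill in the skeleton, assuming Python 3 and that inline <span style="color: ..."> elements are an acceptable way to color text. The names TAG_COLORS, DEFAULT_COLOR, colorize_line and main, the named CSS colors, and the blue fallback for unlisted tags are hypothetical choices, not part of the original program. Each segment becomes its own span and the spans are joined with no separator, so the browser renders the word as one unit without plus signs. Segments are passed through html.escape because characters such as < and & would otherwise be read as HTML markup.

```python
import html
import sys

# Hypothetical mapping from base tag to CSS color; the groups follow the
# RED/GREEN/ORANGE tuples in the skeleton, the color names are illustrative.
TAG_COLORS = {
    "NOUN": "red", "ADJ": "red",
    "DET": "green", "DEMON": "green",
    "NSUFF": "orange", "VSUFF": "orange", "ADJSUFF": "orange",
}
DEFAULT_COLOR = "blue"  # assumption: fallback for tags not listed above

HTML_HEAD = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Colored segments</title></head>
<body>
"""
HTML_TAIL = "</body>\n</html>\n"


def color(segment, html_color):
    """Wrap one (HTML-escaped) segment in a <span> with the given CSS color."""
    return '<span style="color: %s">%s</span>' % (html_color, html.escape(segment))


def colorize_line(line):
    """Turn 'Al+wlAy+At DET+NOUN+NSUFF_FEM_PL' into one HTML paragraph.

    Returns None for lines that do not have exactly two fields.
    """
    fields = line.split()
    if len(fields) != 2:
        return None
    word, pos = fields
    spans = []
    for segment, tag in zip(word.split("+"), pos.split("+")):
        base_tag = tag.split("_")[0]  # NSUFF_FEM_PL -> NSUFF
        spans.append(color(segment, TAG_COLORS.get(base_tag, DEFAULT_COLOR)))
    # Joining with "" glues the segments back into one word, no plus signs.
    return "<p>" + "".join(spans) + "</p>"


def main(path):
    """Print a complete HTML page for the POS-tagged file at path."""
    print(HTML_HEAD, end="")
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            paragraph = colorize_line(line)
            if paragraph is not None:
                print(paragraph)
    print(HTML_TAIL, end="")


# Only run when a file argument is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python colorize.py tagged.txt > output.html` (colorize.py is a hypothetical filename), then opening output.html in a browser. Returning strings instead of printing inside color() keeps each piece easy to check on its own before it is written to the page.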
Tutor maillist - Tutor@python.org