Emad Nawfal (عماد نوفل) wrote:
Hi Tutors,
I want to color-code the different parts of the word in a morphologically complex natural language. The file I have looks like this, where the first column is the word and the second is the composite part-of-speech tag. For example, Al is a DETERMINER, wlAy is a NOUN, and At is a PLURAL NOUN SUFFIX.

Al+wlAy+At        DET+NOUN+NSUFF_FEM_PL
Al+mtHd+p        DET+ADJ+NSUFF_FEM_SG
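
As a rough sketch of how one such line could be broken into (segment, tag) pairs: the helper name split_line is hypothetical, and it assumes the two columns are separated by whitespace and that both use "+" as the delimiter, as in the sample above. Lines that do not fit that shape are skipped by returning None.

```python
def split_line(line):
    """Return a list of (segment, tag) pairs, or None for a malformed line."""
    fields = line.split()
    if len(fields) != 2:
        return None  # not a "word  tag" line
    word, pos = fields
    segments = word.split("+")
    tags = pos.split("+")
    if len(segments) != len(tags):
        return None  # segments and tags do not line up
    return list(zip(segments, tags))


print(split_line("Al+wlAy+At        DET+NOUN+NSUFF_FEM_PL"))
# [('Al', 'DET'), ('wlAy', 'NOUN'), ('At', 'NSUFF_FEM_PL')]
```

Checking that the two columns have the same number of parts avoids silently dropping a segment, which is what zip() alone would do.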

The output I want is one in which the word has no plus signs and each segment is color-coded by grammatical category. For example, the noun is red, the determiner is green, and the suffix is orange, like on this page:
http://docs.google.com/View?id=df7jv9p9_3582pt63cc4
I am stuck on the HTML part and I don't know where to start. I have no experience with HTML, but I have this skeleton (which may not be the right thing anyway).
Any help with materials, modules, or suggestions is appreciated.

The skeleton of my program is as follows:

#############
RED = ("NOUN", "ADJ")
GREEN = ("DET", "DEMON")
ORANGE = ("NSUFF", "VSUFF", "ADJSUFF")

Instead of that, use a dictionary:

colors = dict(NOUN="RED", ADJ="RED", DET="GREEN", DEMON="GREEN",
              NSUFF="ORANGE", VSUFF="ORANGE", ADJSUFF="ORANGE")
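
One thing to watch with a dictionary like this: the tags in the sample file carry feature suffixes (NSUFF_FEM_PL, NSUFF_FEM_SG), so a direct lookup on the full tag would miss. A minimal sketch of one way around that, assuming the category is always the part before the first underscore (an inference from the two sample lines, not something the tag set guarantees). The helper name color_for and the "black" fallback are made up for illustration.

```python
colors = dict(NOUN="red", ADJ="red", DET="green", DEMON="green",
              NSUFF="orange", VSUFF="orange", ADJSUFF="orange")


def color_for(tag, default="black"):
    """Look up the color for a tag, ignoring feature suffixes such as _FEM_PL."""
    category = tag.split("_")[0]  # "NSUFF_FEM_PL" -> "NSUFF"
    return colors.get(category, default)


print(color_for("DET"))           # green
print(color_for("NSUFF_FEM_PL"))  # orange
print(color_for("VERB"))          # black (not in the dictionary)
```

Using colors.get() with a default means an unknown tag falls back to a plain color instead of raising KeyError.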
# print html head
def print_html_head():
    # print the head of the html page
    pass

def print_html_tail():
    # print the tail of the html page
    pass

def color(segment, color):
    # STUCK HERE: should take a color and a segment, for example
    pass

# main
import sys
infile = open(sys.argv[1]) # takes as input the POS-tagged file
print_html_head()
for line in infile:
    line = line.split()
    if len(line) != 2: continue
    word = line[0]
    pos = line[1]
    zipped = zip(word.split("+"), pos.split("+"))
    for x, y in zipped:
        if y in GREEN:  # determiners and demonstratives
            color(x, "#008000")  # green
        else:
            color(x, "#0000FF")  # blue
infile.close()
print_html_tail()
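
For the HTML part itself, here is one possible sketch rather than the only way to do it. It wraps each segment in a span element with an inline CSS color and joins the segments without the plus signs. The function names render_word and render_page, the one-word-per-paragraph layout, and the page title are all assumptions for illustration. html.escape is from the standard library and guards against segments containing characters such as < or &.

```python
import html

COLORS = {"NOUN": "red", "ADJ": "red", "DET": "green", "DEMON": "green",
          "NSUFF": "orange", "VSUFF": "orange", "ADJSUFF": "orange"}


def color(segment, color_name):
    """Wrap one segment in a span with an inline CSS color."""
    return '<span style="color: %s">%s</span>' % (color_name, html.escape(segment))


def render_word(word, pos):
    """Color each '+'-separated segment by the category of its tag."""
    pieces = []
    for segment, tag in zip(word.split("+"), pos.split("+")):
        category = tag.split("_")[0]  # assumes "NSUFF_FEM_PL" -> "NSUFF"
        pieces.append(color(segment, COLORS.get(category, "black")))
    return "".join(pieces)  # no plus signs in the output


def render_page(lines):
    """Build a complete HTML page, one paragraph per well-formed input line."""
    body = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        body.append("<p>%s</p>" % render_word(fields[0], fields[1]))
    head = ('<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
            "<title>Color-coded words</title></head>\n<body>\n")
    tail = "\n</body></html>"
    return head + "\n".join(body) + tail


if __name__ == "__main__":
    sample = ["Al+wlAy+At        DET+NOUN+NSUFF_FEM_PL",
              "Al+mtHd+p        DET+ADJ+NSUFF_FEM_SG"]
    print(render_page(sample))
```

Returning strings instead of printing inside each function makes it easy to write the result to a file, and the file object from open(sys.argv[1]) can be passed straight to render_page since it iterates line by line.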



--
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington
--------------------------------------------------------
------------------------------------------------------------------------

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


--
Bob Gailer
Chapel Hill NC
919-636-4239