(Please disregard my earlier message, which was sent by mistake before I finished composing it. Sorry about that! :( )

Hello Spir, Alan, Paul, and tutors,

Thank you Spir, Alan, and Paul for your help with my previous code! Earlier, I was asking how to split a composite tag, like the one in field 2 below, into sub-tags like those in the values of the dictionary below. In my original question, the data was formatted as follows:

w1 \t case_def_acc
w2 \t noun_prop
w3 \t case_def_gen
w4 \t dem_pron_f

I put together the code below based on your suggestions, with minor changes, and it does work.

-------------Begin code----------------------------
#!/usr/bin/python
import sys

tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}

TAB = '\t'

def newlyTaggedWord(line):
    line = line.rstrip()             # I strip line ending
    (word, tag) = line.split(TAB)    # separate parts of line, keeping data only
    new_tags = tags[tag]             # read in dict
    tagging = TAB.join(new_tags)     # join with TABs
    return word + TAB + tagging      # formatted result

def replaceTagging(source_name, target_name):
    source_file = open(source_name, "r")
    target_file = open(target_name, "w")
    # replacement loop
    for line in source_file:
        new_line = newlyTaggedWord(line) + '\n'
        target_file.write(new_line)
    source_file.close()
    target_file.close()

if __name__ == "__main__":
    source_name = sys.argv[1]
    target_name = sys.argv[2]
    replaceTagging(source_name, target_name)
-------------End code----------------------------

Now I have to work on a different data format, as follows:

-------------Begin data----------------------------
w1 \t case_def_acc \t yes
w2 \t noun_prop \t no
w3 \t case_def_gen \t
w4 \t dem_pron_f \t no
w3 \t case_def_gen \t
w4 \t dem_pron_f \t no
w1 \t case_def_acc \t yes
w3 \t case_def_gen \t
w3 \t case_def_gen \t
-------------End data----------------------------

Notice that some lines have nothing in the yes/no field, and hence end in a tab.

My question is how to replace the data in the composite-tag field with sub-tags like those in the dictionary values above, and still be able to print the whole line with only this change (i.e., composite tags replaced by sub-tags). Earlier, we read the word and the tag directly from each line and looked the tag up in the dictionary, since we were sure each line had 2 fields after splitting on tabs. Here, lines have a varying number of fields: sometimes they end with yes or no, and sometimes they do not.

I tried changing the function where we read the dictionary, but it did not work. While it is ugly, I include it as proof that I have worked on the problem. I am sure you will have various nice ideas.

-------------Begin code----------------------------
def newlyTaggedWord(line):
    tagging = ""
    line = line.split(TAB)    # separate parts of line, keeping data only
    if len(line)==3:
        word = line[-3]
        tag = line[-2]
        new_tags = tags[tag]
        decision = line[-1]   # in decision I wanted to store
                              # either yes or no if one of
                              # these existed
    elif len(line)==2:
        word = line[-2]
        tag = line[-1]
        decision = TAB        # I thought if it is a must to put sth in decision while decision
                              # is really absent in line, I would put a tab. But I really want to
                              # avoid putting anything there.
        new_tags = tags[tag]  # read in dict
    tagging = TAB.join(new_tags)    # join with TABs
    return word + TAB + tagging + TAB + decision
-------------End code----------------------------

I appreciate your support!

--dan
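
For reference, here is a minimal sketch of one possible way to handle lines with a varying number of fields; it is not a definitive solution. The idea is to split the line on tabs, swap only the second field for its sub-tags, and join everything back, so whatever comes after the tag (yes, no, or nothing) passes through untouched. The function name retag_line is hypothetical, and the sketch assumes that fields are separated by a bare tab with no surrounding spaces and that every tag in field 2 is a key of the tags dictionary.

```python
TAB = '\t'

# Same mapping as in the post: composite tag -> list of sub-tags.
tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}

def retag_line(line):
    """Hypothetical helper: expand the composite tag in field 2, keep the rest."""
    # Strip only the line ending, so a trailing tab (empty yes/no field) survives.
    fields = line.rstrip('\r\n').split(TAB)
    # Assumption: field 2 always exists and is a key of `tags`.
    # Slice assignment replaces that one field with the three sub-tags.
    fields[1:2] = tags[fields[1]]
    return TAB.join(fields)

if __name__ == "__main__":
    for sample in ('w1\tcase_def_acc\tyes\n', 'w3\tcase_def_gen\t\n'):
        print(repr(retag_line(sample)))
```

Because only the line ending is stripped, a line that ends in a tab keeps that tab in the output, and a line with just two fields gets nothing extra appended. A tag missing from the dictionary would raise KeyError in this sketch; whether to skip or report such lines is left open.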

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor