Hi,
I'm learning Python so I can take advantage of the really cool stuff in the
Natural Language Toolkit. But I'm having problems with some basic file
manipulation stuff.
My basic question: How do I read data in from a csv, manipulate it, and then
add it back to the csv in new columns (keeping the manipulated data in the
"right row")?
Here's an example of what my data looks like ("test-8-29-10.csv"):
MyWord      Category  Ct     CatCt
!           A         2932   456454
!           B         2109   64451
a           C         7856   90000
a           A         19911  456454
abnormal    C         174    90000
abnormally  D         5      77777
cats        E         1999   886454
cat         B         160    64451
# I want to read the MyWord value from each row, process it, and add the
results as new columns. Specifically, I want to "lemmatize" and "stem" each
word, which basically means turning "abnormally" into "abnormal" and "cats"
into "cat".
import nltk
wnl=nltk.WordNetLemmatizer()
porter=nltk.PorterStemmer()
text=nltk.word_tokenize(TheStuffInMyWordColumn)
textlemmatized=[wnl.lemmatize(t) for t in text]
textPort=[porter.stem(t) for t in text]
# This creates the right info, but I don't really want "textlemmatized" and
"textPort" to be independent lists; I want them inside the csv as new columns.
# If I didn't want to keep the information in the Category and Counts columns,
I would probably do something like this:
for word in text:
    word2 = wnl.lemmatize(word)
    word3 = porter.stem(word)
    # print already ends the line, so no explicit "\r\n" is needed
    print(word + ";" + word2 + ";" + word3)
# Looking through some of the older discussions about the csv module, I found
this code, which helps identify the headers, but I'm still not sure how to use
them, or how to write the for-loop so that it iterates through each row in
the csv file.
import csv
fp=open(r'c:test-8-29-10.csv', 'r')
inputfile=csv.DictReader(fp)
for record in inputfile:
    print(record)
{'Category': 'A', 'CatCt': '456454', 'MyWord': '!', 'Ct': '2932'}
{'Category': 'B', 'CatCt': '64451', 'MyWord': '!', 'Ct': '2109'}
...
fp.close()
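# For reference, here is a minimal sketch of the overall shape I think I'm
after: read each row with csv.DictReader, compute the new values from MyWord,
and write the whole row back out with csv.DictWriter so everything stays in
the right row. It assumes Python 3 (io.StringIO stands in for real files), and
add_columns, crude_lemma, crude_stem and the "Lemma"/"Stem" column names are
made-up placeholders, not anything from NLTK; the real script would presumably
pass wnl.lemmatize and porter.stem instead.

```python
import csv
import io


def add_columns(in_file, out_file, transforms, source_column="MyWord"):
    """Copy each row from in_file to out_file, appending one new column
    per (column_name, function) pair in transforms."""
    reader = csv.DictReader(in_file)
    new_names = [name for name, _ in transforms]
    # Keep the original columns in order and put the new ones at the end.
    writer = csv.DictWriter(out_file,
                            fieldnames=list(reader.fieldnames) + new_names)
    writer.writeheader()
    for row in reader:
        for name, func in transforms:
            # The new value is stored in the same row dict it was computed
            # from, so it cannot drift out of line with Category, Ct, CatCt.
            row[name] = func(row[source_column])
        writer.writerow(row)


# Hypothetical stand-ins for wnl.lemmatize and porter.stem, used only so
# this sketch runs without NLTK installed. They are NOT real lemmatizers.
def crude_lemma(word):
    return word[:-1] if word.endswith("s") else word


def crude_stem(word):
    return word[:-2] if word.endswith("ly") else word


# Two rows taken from the sample data above.
sample = ("MyWord,Category,Ct,CatCt\n"
          "abnormally,D,5,77777\n"
          "cats,E,1999,886454\n")
result = io.StringIO()
add_columns(io.StringIO(sample), result,
            [("Lemma", crude_lemma), ("Stem", crude_stem)])
print(result.getvalue())
```

With real files on Python 3, the csv documentation recommends opening both
the input and the output with newline=''. Writing to a second file, rather
than back over the input, seems the safer choice while testing.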
# So I feel like I have *some* of the pieces, but I'm just missing a bunch of
little connections. Any and all help would be much appreciated!
Tyler