Okay, so I'm trying to use Doc2Vec to simply read in a file that is a list of sentences like this:
The elephant flaps its large ears to cool the blood in them and its body.
A house is a permanent building or structure for people or families to live in.
....

What I want to do is generate two files: one with the unique words from these sentences, and another with one corresponding vector per line (if there's no vector output, I want to output a vector of 0's).

I'm getting the vocab fine with my code, but I can't figure out how to print the individual sentence vectors. I have looked through the documentation and haven't found much help. Here is what my code looks like so far:

    import re
    from gensim.models.doc2vec import Doc2Vec, LabeledSentence

    sentences = []
    for uid, line in enumerate(open(filename)):
        sentences.append(LabeledSentence(words=line.split(), labels=['SENT_%s' % uid]))

    model = Doc2Vec(alpha=0.025, min_alpha=0.025)
    model.build_vocab(sentences)
    for epoch in range(10):
        model.train(sentences)
        model.alpha -= 0.002
        model.min_alpha = model.alpha

    sent_reg = r'[SENT].*'
    for item in model.vocab.keys():
        sent = re.search(sent_reg, item)
        if sent:
            continue
        else:
            print item

    ### I'm not sure how to produce the vectors from here and this doesn't work ##
    sent_id = 0
    for item in model:
        print model["SENT_"+str(sent_id)]
        sent_id += 1

I'm not sure where to go from here, so any help is appreciated.

Thanks!

Joshua Valdez
Computational Linguist : Cognitive Scientist
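
For what it's worth, the two output files described above can be sketched without depending on any particular gensim version. This is only a minimal sketch under stated assumptions: `lookup` is a hypothetical stand-in for the trained model (anything that maps a 'SENT_n' tag to a vector and raises KeyError for an unknown tag), `dim` is the vector size, and the helper names are made up for illustration. A plain dict plays the role of the model here so the example runs on its own.

```python
def unique_words(sentences):
    """Return every distinct word, in the order it is first seen."""
    seen = set()
    ordered = []
    for sentence in sentences:
        for word in sentence.split():
            if word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered


def sentence_vector_lines(num_sentences, lookup, dim):
    """Build one text line per sentence, falling back to zeros.

    `lookup` is an assumption: any mapping from a 'SENT_n' tag to a
    sequence of numbers that raises KeyError when the tag is unknown.
    """
    lines = []
    for uid in range(num_sentences):
        tag = 'SENT_%s' % uid
        try:
            vector = lookup[tag]
        except KeyError:
            vector = [0.0] * dim  # no vector for this sentence
        lines.append(' '.join(str(x) for x in vector))
    return lines


def write_lines(path, lines):
    """Write each string in `lines` to `path`, one per line."""
    with open(path, 'w') as handle:
        for line in lines:
            handle.write(line + '\n')


# Hypothetical data: a dict stands in for the trained model.
sentences = [
    "The elephant flaps its large ears",
    "A house is a permanent building",
]
fake_vectors = {'SENT_0': [0.1, 0.2, 0.3]}  # SENT_1 is deliberately missing

vocab = unique_words(sentences)
vector_lines = sentence_vector_lines(len(sentences), fake_vectors, dim=3)
```

Calling `write_lines('vocab.txt', vocab)` and `write_lines('vectors.txt', vector_lines)` would then produce the two files (the file names are placeholders). Building the word list straight from the sentences also avoids having to filter the SENT_ labels back out of the model's vocabulary.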