On 17/08/15 18:50, Joshua Valdez wrote:
Okay, so I'm trying to use Doc2Vec to simply read in a file that is a
list of sentences like this:
This list is for folks learning the core Python language and the
standard library.
Doc2Vec is not part of that library so you might find you get more
responses asking on the gensim community forums.
A quick Google search suggests there are several to choose from.
You might hit lucky here, but it's not an area we discuss often.
What I want to do is generate two files: one with the unique words from
these sentences and another that has one corresponding vector per line (if
there's no vector output, I want to output a vector of 0's).
Don't assume anyone here will know about your area of specialism.
What is a vector in this context?
I'm getting the vocab fine with my code, but I can't seem to figure out how
to print out the individual sentence vectors. I have looked through the
documentation and haven't found much help. Here is what my code looks like
so far.
It seems to have gotten somewhat messed up.
I suspect you are using rich text or HTML formatting.
Try posting again in plain text.
sentences = []
for uid, line in enumerate(open(filename)):
    sentences.append(LabeledSentence(words=line.split(),
                                     labels=['SENT_%s' % uid]))

model = Doc2Vec(alpha=0.025, min_alpha=0.025)
model.build_vocab(sentences)
for epoch in range(10):
    model.train(sentences)
    model.alpha -= 0.002
    model.min_alpha = model.alpha

sent_reg = r'[SENT].*'
for item in model.vocab.keys():
    sent = re.search(sent_reg, item)
    if sent:
        continue
    else:
        print item

### I'm not sure how to produce the vectors from here and this doesn't work ##
sent_id = 0
for item in model:
    print model["SENT_"+str(sent_id)]
    sent_id += 1
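
On the plain Python side, here is a rough sketch of the output step. It is
not a gensim recipe. It is written for Python 3, it assumes the labels look
like SENT_0, SENT_1 and so on, and it uses a hypothetical dict called
vectors as a stand-in for whatever lookup the model really provides. The
function names are made up for illustration too. One thing worth noting:
the pattern [SENT].* is a character class, so it matches any word that
contains an upper case S, E, N or T, not just the SENT_ labels.

```python
import re

# Assumption: sentence labels look like "SENT_0", "SENT_1", ... as in the post.
LABEL_RE = re.compile(r'^SENT_\d+$')


def is_sentence_label(token):
    """True only for tokens such as 'SENT_12'."""
    return LABEL_RE.match(token) is not None


def unique_words(lines):
    """Unique whitespace-separated words, in order of first appearance."""
    seen = set()
    words = []
    for line in lines:
        for word in line.split():
            if word not in seen:
                seen.add(word)
                words.append(word)
    return words


def vector_lines(num_sentences, vectors, size):
    """One text line per sentence; all zeros when no vector is available.

    `vectors` is a hypothetical dict mapping a label to a list of floats,
    standing in for whatever lookup the model really provides.
    """
    out = []
    for uid in range(num_sentences):
        vec = vectors.get('SENT_%s' % uid, [0.0] * size)
        out.append(' '.join('%.4f' % x for x in vec))
    return out


if __name__ == '__main__':
    lines = ['the cat sat', 'the SENT cat ran']
    fake_vectors = {'SENT_0': [0.25, -0.5]}  # made-up numbers for illustration
    print(unique_words(lines))
    for text in vector_lines(len(lines), fake_vectors, size=2):
        print(text)
```

Once the gensim folks have told you how to fetch a sentence vector, you
would replace the dict with the real lookup and write the two lists to
your files.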
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos