2013/9/26 Olivier Grisel :
> 2013/9/7 Tasos Ventouris :
>> I tried to run my script and then create a string from the list for each
>> text and include those texts into the TfidfVectorizer. I am satisfied with
>> the results, but unfortunately, if I have 1000 or more documents, this isn't
>> the mo[...]
BTW, if you want to do LSI on a large corpus, you should rather use
Gensim, which supports tuned data structures and out-of-core processing
for this specific application domain:
http://radimrehurek.com/gensim/
--
Olivier
2013/9/7 Tasos Ventouris :

Hello, I have two questions where I would like your feedback.

The first one:

Here is my code:

from sklearn.feature_extraction.text import TfidfVectorizer

documents = [doc1, doc2, doc3]
tfidf = TfidfVectorizer().fit_transform(documents)
pairwise_similarity = tfidf * tfidf.T
print pairwise_similarity.A
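For reference, the snippet works because TfidfVectorizer L2-normalizes each row by default, so the product of the matrix with its transpose is already a cosine-similarity matrix. A self-contained Python 3 version with made-up example documents (the originals, doc1 through doc3, are not shown in the thread):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder documents for illustration only.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "completely unrelated text here",
]

# Rows of the TF-IDF matrix are L2-normalized by default (norm='l2').
tfidf = TfidfVectorizer().fit_transform(documents)

# Dot products of unit-length rows = cosine similarities.
pairwise_similarity = (tfidf * tfidf.T).toarray()
```

The diagonal entries are 1.0 (every document is identical to itself), and the first two documents, which share several words, score higher against each other than against the third.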