Re: [Scikit-learn-general] TF-IDF and LSI

2013-09-26 Thread Lars Buitinck
2013/9/26 Olivier Grisel:
> 2013/9/7 Tasos Ventouris:
>> I tried to run my script and then create a string from the list for each
>> text and include those texts into the TfidfVectorizer. I am satisfied with
>> the results, but unfortunately, if I have 1000 or more documents, this isn't
>> the mo…
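Both Tasos's message and Lars's reply are cut off in this archive, so what Lars actually suggested is unknown. Tasos describes joining each token list back into a string before handing it to TfidfVectorizer. One way to skip that join, assuming each document is already a list of tokens, is to pass a callable as the `analyzer` parameter; the vectorizer then counts whatever the callable returns. This is a minimal sketch assuming a recent scikit-learn, not Lars's answer and not a claimed fix for the scaling problem in the truncated message; `identity` and `tokenized_docs` are hypothetical names.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    # Return the pre-tokenized document unchanged, so the vectorizer
    # uses these tokens instead of re-splitting a joined string.
    return tokens


# Hypothetical input: each document is already a list of tokens.
tokenized_docs = [
    ["latent", "semantic", "indexing"],
    ["tf", "idf", "weighting", "indexing"],
    ["semantic", "similarity", "between", "documents"],
]

# With a callable analyzer, TfidfVectorizer skips its own preprocessing,
# tokenization and n-gram steps and counts the callable's output directly.
vectorizer = TfidfVectorizer(analyzer=identity)
tfidf = vectorizer.fit_transform(tokenized_docs)

print(tfidf.shape)
```

The callable is a module-level function rather than a lambda so that the fitted vectorizer can still be pickled.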

Re: [Scikit-learn-general] TF-IDF and LSI

2013-09-26 Thread Olivier Grisel
BTW, if you want to do LSI on a large corpus, you should use Gensim instead, which supports tuned data structures and out-of-core processing for this specific application domain: http://radimrehurek.com/gensim/

-- 
Olivier
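Olivier's point is about scale: Gensim can process a corpus out of core. For a corpus that does fit in memory, a common scikit-learn route to an LSI-style representation is `TruncatedSVD` applied to the TF-IDF matrix (scikit-learn's documentation calls this latent semantic analysis). The sketch below assumes a recent scikit-learn; the toy sentences and `n_components=2` are illustrative choices only. It is not Gensim's API and it holds the whole matrix in memory, so it does not replace the out-of-core approach Olivier recommends for large corpora.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus; a real corpus would be far larger.
documents = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "the eps user interface management system",
    "system and human system engineering testing of eps",
    "the generation of random binary unordered trees",
    "the intersection graph of paths in trees",
]

# TF-IDF followed by a truncated SVD gives an LSI/LSA projection.
# Re-normalizing the reduced vectors lets a plain dot product act
# as cosine similarity in the latent space.
lsi = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
    Normalizer(copy=False),
)
doc_topics = lsi.fit_transform(documents)  # dense array, shape (6, 2)

# Pairwise cosine similarities between documents in the latent space.
similarity = doc_topics @ doc_topics.T
print(np.round(similarity, 2))
```

`random_state` is fixed because the default randomized SVD solver would otherwise give slightly different components on each run.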

Re: [Scikit-learn-general] TF-IDF and LSI

2013-09-26 Thread Olivier Grisel
2013/9/7 Tasos Ventouris:
> Hello, I have two questions on which I would like your feedback.
>
> The first one:
>
> Here is my code:
>
> from sklearn.feature_extraction.text import TfidfVectorizer
>
> documents = [doc1,doc2,doc3]
> tfidf = TfidfVectorizer().fit_transform(documents)
> pairwise_similari…

[Scikit-learn-general] TF-IDF and LSI

2013-09-26 Thread Tasos Ventouris
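Hello, I have two questions on which I would like your feedback.

The first one:

Here is my code:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # doc1, doc2, doc3 hold the raw text of each document (str)
    documents = [doc1, doc2, doc3]
    tfidf = TfidfVectorizer().fit_transform(documents)
    # Rows are L2-normalized by default, so this sparse product
    # holds the pairwise cosine similarities
    pairwise_similarity = tfidf * tfidf.T
    print(pairwise_similarity.toarray())

W…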
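Because `TfidfVectorizer` L2-normalizes each row by default (`norm="l2"`), multiplying the TF-IDF matrix by its transpose yields the cosine similarity between every pair of documents. A minimal check of that equivalence against `sklearn.metrics.pairwise.cosine_similarity` and `linear_kernel`, assuming a recent scikit-learn and SciPy; the three toy strings are hypothetical stand-ins for `doc1`, `doc2`, `doc3`, and this illustration is not taken from any message in the thread.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical stand-ins for doc1, doc2, doc3.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# norm="l2" is already the default; it is spelled out because the
# equivalence below relies on every row having unit length.
tfidf = TfidfVectorizer(norm="l2").fit_transform(documents)

# Sparse product of the TF-IDF matrix with its transpose, densified.
product = (tfidf @ tfidf.T).toarray()

# linear_kernel computes the same dot products; cosine_similarity also
# normalizes the rows first, which changes nothing for unit-length rows.
assert np.allclose(product, linear_kernel(tfidf))
assert np.allclose(product, cosine_similarity(tfidf))

print(np.round(product, 3))
```

`cosine_similarity` and `linear_kernel` return a dense array by default, while the plain product stays a SciPy sparse matrix until `.toarray()` is called.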