Re: [Scikit-learn-general] Which scikit-learn contributors share common interests?

2013-09-29 Thread Lars Buitinck
I wrote a draft coclustering script: https://gist.github.com/larsmans/6753565 Instead of going through only the files currently present, it parses the complete history, so it also finds the developers of files that have since disappeared.
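
For readers who want a feel for the history-parsing step, here is a minimal sketch. It is not the code from the gist and it does no coclustering. It only collects, per author, every path that ever appears in the log (including files deleted later) and compares two authors by Jaccard overlap. The `@` marker, the function names, and the sample log are assumptions made for this illustration.

```python
from collections import defaultdict


def files_by_author(log_text):
    """Map each author name to the set of paths they touched.

    Expects the output of:  git log --name-only --format="@%an"
    The "@" prefix is an arbitrary marker chosen for this sketch; a path
    that itself starts with "@" would be misread as an author line.
    """
    touched = defaultdict(set)
    author = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("@"):
            author = line[1:]          # start of a new commit record
        elif author is not None:
            touched[author].add(line)  # path changed in that commit
    return touched


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical log excerpt; sklearn/old_module.py stands for a deleted file.
sample_log = """@Alice

sklearn/svm/base.py
sklearn/old_module.py

@Bob

sklearn/svm/base.py
sklearn/cluster/k_means_.py
"""

touched = files_by_author(sample_log)
overlap = jaccard(touched["Alice"], touched["Bob"])
```

The resulting author-to-files mapping could then be turned into an author-by-file matrix and handed to whichever clustering method is preferred.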

Re: [Scikit-learn-general] Possible ways of contributing.

2013-09-29 Thread Jacob Vanderplas
Hi, I'd start by going over the contributing tips in the documentation: http://scikit-learn.org/stable/developers/ There are several suggestions in there of where you might get started. In particular, if you have a good understanding of machine learning methods and concepts, improving the document

Re: [Scikit-learn-general] LSA for documents similarity

2013-09-29 Thread Tasos Ventouris
I read in a tutorial that I can download many Wikipedia articles and use them for LSA. If I use 500k random articles from Wikipedia plus the 2 documents whose similarity score I want to find, will I get results like those from TF-IDF? Or do I have to use documents related to my 2 docs? > From: larsm..

Re: [Scikit-learn-general] LSA for documents similarity

2013-09-29 Thread Lars Buitinck
2013/9/29 Tasos Ventouris : > Thank you for your answer. I checked it with many documents, both totally > different and similar ones. You can see an example of the text I used > here https://dl.dropboxusercontent.com/u/37124455/documents.txt > > Another script I wrote with only tf-idf shows me

Re: [Scikit-learn-general] LSA for documents similarity

2013-09-29 Thread Tasos Ventouris
Thank you for your answer. I checked it with many documents, both totally different and similar ones. You can see an example of the text I used here https://dl.dropboxusercontent.com/u/37124455/documents.txt Another script I wrote with only tf-idf shows me 69% similarity on those documents.
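
As a point of reference for the tf-idf-only comparison mentioned above, here is a minimal standard-library sketch of TF-IDF cosine similarity. It is not the poster's script. The whitespace tokenizer, the smoothed IDF formula, and the toy documents are assumptions made for this illustration, so its scores will not match those of any particular library.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also
    # strip punctuation and possibly remove stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed weighting): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # documents sharing several terms
sim_far = cosine(vecs[0], vecs[2])    # documents sharing no terms
```

Because the IDF weights depend on every document in the collection, the score for a fixed pair of documents changes when other documents are added or removed.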

Re: [Scikit-learn-general] LSA for documents similarity

2013-09-29 Thread Lars Buitinck
2013/9/29 Tasos Ventouris : > I am trying to create a script to compute the similarity for only two > documents. I wrote this code, but if I use two docs in the data set, the > result is a 2x2 matrix with [[1,0],[0,1]]. If I use more than 2 documents, > the results are almost correct. Any suggestio

[Scikit-learn-general] LSA for documents similarity

2013-09-29 Thread Tasos Ventouris
I am trying to create a script to compute the similarity for only two documents. I wrote this code, but if I use two docs in the data set, the result is a 2x2 matrix with [[1,0],[0,1]]. If I use more than 2 documents, the results are almost correct. Any suggestion? def lsa(doc1,doc2):datas