Re: [scikit-learn] Recurrent questions about speed for TfidfVectorizer

2018-11-26 Thread Roman Yurchak via scikit-learn
Tries are interesting, but it appears that while they use less memory than dicts/maps, they are generally slower than dicts for a large number of elements. See e.g. https://github.com/pytries/marisa-trie/blob/master/docs/benchmarks.rst. This is also consistent with the results in the below linke
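To illustrate the lookup-cost side of this trade-off, here is a minimal sketch of a nested-dict trie used as a term-to-index vocabulary, timed against a plain dict. The class `TrieVocabulary` and the synthetic `token%d` terms are hypothetical and for illustration only. This pure-Python trie does not reproduce the memory savings of compact C++ implementations such as marisa-trie (nested dicts actually use more memory); it only shows why a trie walks one node per character while a dict hashes the whole string once. No timings are claimed here, since they depend on the machine.

```python
import timeit


class TrieVocabulary:
    """Hypothetical nested-dict trie mapping terms to feature indices.

    Illustrative sketch only: compact trie libraries store nodes far
    more efficiently than Python dicts do.
    """

    _END = object()  # sentinel key marking the end of a stored term

    def __init__(self):
        self.root = {}

    def add(self, term, index):
        node = self.root
        for ch in term:
            # Create the child node for this character if it is missing.
            node = node.setdefault(ch, {})
        node[self._END] = index

    def get(self, term, default=None):
        node = self.root
        for ch in term:
            # One dict lookup per character of the term.
            node = node.get(ch)
            if node is None:
                return default
        # A prefix of a stored term has no end marker, so it is "not found".
        return node.get(self._END, default)


terms = ["token%d" % i for i in range(20000)]
dict_vocab = {t: i for i, t in enumerate(terms)}
trie_vocab = TrieVocabulary()
for i, t in enumerate(terms):
    trie_vocab.add(t, i)

# Both structures return the same feature index for every known term.
assert all(trie_vocab.get(t) == dict_vocab[t] for t in terms)

# A dict hashes each string once; the trie traverses one node per character.
t_dict = timeit.timeit(lambda: [dict_vocab.get(t) for t in terms], number=10)
t_trie = timeit.timeit(lambda: [trie_vocab.get(t) for t in terms], number=10)
print("dict: %.3fs  trie: %.3fs" % (t_dict, t_trie))
```

Whether a trie pays off in practice therefore depends on whether memory or lookup speed is the binding constraint.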

[scikit-learn] Contrib: Artificial Immune Recognition System

2018-11-26 Thread AZZOUG Aghiles
Hello devs, I'm a final-year computer engineering student, currently doing my master's and engineering degree in recommender systems. Last summer, after an optimization course, I came across quite an interesting recognition algorithm called the Artificial Immune Recognition System (described in the paper b

Re: [scikit-learn] Recurrent questions about speed for TfidfVectorizer

2018-11-26 Thread Andreas Mueller
I think tries might be an interesting data structure, but it really depends on where the bottleneck is. I'm really surprised they are not used more, but maybe that's just because implementations are missing? On 11/26/18 8:39 AM, Roman Yurchak via scikit-learn wrote: Hi Matthieu, if you are int
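One way to check where the bottleneck is before swapping data structures is to time tokenization and vocabulary lookup separately. The sketch below assumes a toy bag-of-words pass; the helper `time_stages` is hypothetical. It reuses the regex from scikit-learn's default `token_pattern` (`r"(?u)\b\w\w+\b"`) with lowercasing, but it is not scikit-learn's implementation: for instance, it assigns indices in first-seen order and does not sort the vocabulary afterwards.

```python
import re
import time
from collections import Counter

# Same regex as scikit-learn's default CountVectorizer token_pattern.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def time_stages(docs):
    """Hypothetical helper: time tokenization and vocabulary lookup separately.

    Returns (tokenize_seconds, lookup_seconds, vocabulary, rows), where each
    row is a Counter mapping feature index -> count for one document.
    """
    start = time.perf_counter()
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    t_tokenize = time.perf_counter() - start

    start = time.perf_counter()
    vocabulary = {}
    rows = []
    for tokens in tokenized:
        # len(vocabulary) is evaluated before insertion, so a new term
        # receives the next free index the first time it is seen.
        rows.append(Counter(vocabulary.setdefault(tok, len(vocabulary))
                            for tok in tokens))
    t_lookup = time.perf_counter() - start
    return t_tokenize, t_lookup, vocabulary, rows


docs = ["The quick brown fox", "the lazy dog"] * 5000
t_tok, t_look, vocab, rows = time_stages(docs)
print("tokenize: %.3fs  lookup: %.3fs" % (t_tok, t_look))
```

If tokenization dominates on realistic data, a faster vocabulary structure alone would not change the overall runtime much.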

Re: [scikit-learn] Recurrent questions about speed for TfidfVectorizer

2018-11-26 Thread Roman Yurchak via scikit-learn
Hi Matthieu, if you are interested in general questions regarding improving scikit-learn performance, you might want to have a look at the draft roadmap https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018 -- there are a lot of topics where suggestions / PRs on improving performa