Re: [Scikit-learn-general] Making approximate nearest neighbor search more efficient

2015-08-06 Thread Joel Nothman
It's nice to see some decent speed-up factors, though the accuracy tradeoff is still not so great. Still, I'd like to see the code and where we can go from here. Great work so far! On 7 August 2015 at 07:50, Maheshakya Wijewardena wrote: > I did a rough implementation, setting b = min_hash_match

Re: [Scikit-learn-general] Making approximate nearest neighbor search more efficient

2015-08-06 Thread Maheshakya Wijewardena
I did a rough implementation, setting b = min_hash_match. The result I got from running the benchmark is attached. It roughly tripled the speed-up of the kneighbors function for large index sizes. However, this implementation adds some overhead to the fitting time, as there are 2**b * n_estimator
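
One plausible reading of this speed-up, sketched here as an assumption and not as the actual patch: since min_hash_match is the shortest prefix length the search ever falls back to, each tree can precompute where each of the 2**b possible b-bit prefixes starts and ends in its sorted hash array. A query then jumps straight to its bucket with one table lookup per tree instead of a binary search, at the cost of 2**b table entries per tree, i.e. 2**b * n_estimators in total. The names HASH_BITS, build_prefix_table, fit and query_candidates are hypothetical, the 8-bit hash length is chosen only to keep the example small, and the sketch covers only the fixed b-bit stage, not the descent to longer prefix matches.

```python
from bisect import bisect_left

HASH_BITS = 8  # hypothetical hash length, kept short for the example


def build_prefix_table(sorted_hashes, b, hash_bits=HASH_BITS):
    """Map each of the 2**b possible b-bit prefixes to the half-open range
    [start, end) of sorted_hashes whose leading b bits equal that prefix."""
    shift = hash_bits - b
    table = []
    for prefix in range(2 ** b):
        start = bisect_left(sorted_hashes, prefix << shift)
        end = bisect_left(sorted_hashes, (prefix + 1) << shift)
        table.append((start, end))
    return table


def fit(hashes_per_tree, b):
    """hashes_per_tree[t][i] is the integer hash of sample i in tree t.
    Returns, per tree, (original indices in sorted order, sorted hashes, prefix table)."""
    forest = []
    for hashes in hashes_per_tree:
        order = sorted(range(len(hashes)), key=hashes.__getitem__)
        sorted_hashes = [hashes[i] for i in order]
        forest.append((order, sorted_hashes, build_prefix_table(sorted_hashes, b)))
    return forest


def query_candidates(forest, query_hashes, b, hash_bits=HASH_BITS):
    """Indices of samples sharing the query's leading b bits in at least one
    tree, found with one table lookup per tree instead of a binary search."""
    shift = hash_bits - b
    candidates = set()
    for (order, _sorted_hashes, table), q in zip(forest, query_hashes):
        start, end = table[q >> shift]
        candidates.update(order[start:end])
    return candidates


# Two trees, four samples, 8-bit hashes, b = 3.
forest = fit([[0b10110000, 0b10100000, 0b01110000, 0b10111111],
              [0b00001111, 0b10110001, 0b11110000, 0b00000000]], b=3)
print(sorted(query_candidates(forest, [0b01100000, 0b00000001], b=3)))  # [0, 2, 3]
```

Building each table costs two binary searches per prefix, so the extra fit-time work and memory grow with 2**b, which is why this only pays off while min_hash_match stays small.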

Re: [Scikit-learn-general] Variable Importance Definition

2015-08-06 Thread Efrem Braun
Sebastian, that does indeed help. I now understand that the calculated importance is indeed the average Gini importance. Thank you very much! Efrem Braun > Hi, Efrem, I agree, this can maybe cause confusion. However, to me, 1) > expected fraction of samples they contribute to, (though it is not
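
As far as I understand scikit-learn's forests, a single tree's importance for a feature is the Gini decrease at every split on that feature, weighted by the fraction of samples reaching the node, summed and then normalized to 1; the forest's feature_importances_ is the mean of these per-tree vectors. The pure-Python sketch below illustrates that averaging on two hand-built trees. The dict-based tree layout and the helper names (gini, tree_importances, forest_importances, leaf, split) are hypothetical, and it uses raw sample counts where scikit-learn uses weighted sample counts.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def tree_importances(root, n_features):
    """Mean decrease in impurity per feature for one tree, normalized to sum to 1."""
    importances = [0.0] * n_features
    n_root = sum(root["counts"])

    def visit(node):
        if "feature" not in node:  # leaf: no split, no contribution
            return
        left, right = node["left"], node["right"]
        # Count-weighted impurity of the parent minus that of its two children.
        decrease = (sum(node["counts"]) * gini(node["counts"])
                    - sum(left["counts"]) * gini(left["counts"])
                    - sum(right["counts"]) * gini(right["counts"]))
        importances[node["feature"]] += decrease / n_root
        visit(left)
        visit(right)

    visit(root)
    total = sum(importances)
    return [v / total for v in importances] if total > 0 else importances


def forest_importances(trees, n_features):
    """Average the per-tree normalized importances."""
    per_tree = [tree_importances(t, n_features) for t in trees]
    return [sum(column) / len(per_tree) for column in zip(*per_tree)]


def leaf(*counts):
    return {"counts": list(counts)}


def split(feature, left, right):
    counts = [a + b for a, b in zip(left["counts"], right["counts"])]
    return {"feature": feature, "counts": counts, "left": left, "right": right}


# Tree A: one perfect split on feature 0.
tree_a = split(0, leaf(4, 0), leaf(0, 4))
# Tree B: a weak split on feature 1, then perfect splits on feature 0.
tree_b = split(1, split(0, leaf(3, 0), leaf(0, 1)), split(0, leaf(1, 0), leaf(0, 3)))

print(tree_importances(tree_a, 2))              # [1.0, 0.0]
print(tree_importances(tree_b, 2))              # [0.75, 0.25]
print(forest_importances([tree_a, tree_b], 2))  # [0.875, 0.125]
```

Because each tree's vector is normalized before averaging, every tree gets an equal vote regardless of how many samples it saw, and the averaged vector still sums to 1.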

Re: [Scikit-learn-general] contributing

2015-08-06 Thread Andreas Mueller
Hi Jaret. Please stay on the mailing list so that everybody can answer. For 5091, as I said, it's best to discuss it there, and if you have a question, just post it there. People who follow GitHub will see it. For finding bugs: If you see an issue and it looks fixed, definitely ask "is this fixed in

Re: [Scikit-learn-general] contributing

2015-08-06 Thread Andreas Mueller
Hey Jaret. It is usually easier to discuss these things on the GitHub issue tracker. Which is your pull request? Just ask there. For the doctests, you can run "make test-doc", which will run nosetests with the appropriate options. For the whitespace, there is an option to ignore whitespace changes.