Re: [Rdkit-discuss] Fast similarity search

2017-05-19 Thread Tim Dudgeon
Greg, Nils, Andrew,

Thanks for all that info. Gives me plenty to work on!

Tim

On 19/05/2017 09:27, Andrew Dalke wrote:
> On May 19, 2017, at 08:33, Greg Landrum wrote:
>> The best solution to this is to use chemfp. It's a remarkable piece of
>> software.
> Thanks, Greg.
>
>> If you aren't wil

Re: [Rdkit-discuss] Fast similarity search

2017-05-19 Thread Andrew Dalke
On May 19, 2017, at 08:33, Greg Landrum wrote:
> The best solution to this is to use chemfp. It's a remarkable piece of
> software.

Thanks, Greg.

> If you aren't willing to license that, the RDKit's brute-force
> fingerprint search capabilities aren't too bad for in-memory fingerprints.

Re: [Rdkit-discuss] Fast similarity search

2017-05-18 Thread Greg Landrum
Hi Tim,

First the best answer: The best solution to this is to use chemfp. It's a remarkable piece of software. If you aren't willing to license that, the RDKit's brute-force fingerprint search capabilities aren't too bad for in-memory fingerprints. There's some information in this slide
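For readers unfamiliar with what a brute-force search over in-memory fingerprints involves, here is a minimal sketch. It is a pure-Python stand-in, not RDKit or chemfp code: fingerprints are modelled as plain Python integers used as bit sets, and the names `tanimoto` and `brute_force_search` are hypothetical helpers chosen for illustration. Real toolkits do the same linear scan with much faster bit-counting.

```python
# Pure-Python stand-in for a brute-force fingerprint search.
# Assumption: each fingerprint is a Python int whose set bits are the "on" features.
# This is NOT the RDKit or chemfp API; function names are hypothetical.

def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity: shared bits divided by bits set in either fingerprint."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def brute_force_search(query: int, fingerprints: list, threshold: float = 0.7) -> list:
    """Compare the query against every fingerprint; return (index, score) hits, best first."""
    hits = []
    for index, fp in enumerate(fingerprints):
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((index, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Tiny made-up 4-bit "database" for illustration only.
db = [0b1111, 0b1100, 0b0011, 0b1110]
hits = brute_force_search(0b1110, db, threshold=0.5)
for index, score in hits:
    print(index, round(score, 3))
```

The scan is linear in the number of stored fingerprints, so it only stays practical when the whole set fits in memory, which matches the "in-memory fingerprints" caveat above.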

Re: [Rdkit-discuss] Fast similarity search

2017-05-18 Thread Nils Weskamp
Hi Tim,

according to https://www.knime.org/files/01_greg_landrum.pdf, the PostgreSQL cartridge can compare ~1 million compounds/sec on a single CPU (and this talk is from 2011). ChemFP is much faster if you pre-load all your FPs into main memory.

Hope this helps,
Nils

Am 18.05.2017 um 23:15 schr
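To put the ~1 million compounds/sec figure in context, a rough back-of-envelope estimate may help. The sketch below assumes the rate stays constant and that every query is compared against every target on a single CPU; the function name `estimated_seconds` and the dataset sizes are hypothetical, chosen only for illustration.

```python
# Back-of-envelope timing, assuming a constant comparison rate on one CPU.
# The default rate is the ~1 million compounds/sec figure quoted above;
# the dataset sizes below are made-up examples.

def estimated_seconds(n_queries: int, n_targets: int,
                      comparisons_per_sec: float = 1_000_000) -> float:
    """Total pairwise comparisons divided by the assumed comparison rate."""
    return n_queries * n_targets / comparisons_per_sec


# One query against a 5-million-compound dataset.
single_query = estimated_seconds(1, 5_000_000)

# All-vs-all on a 1-million-compound dataset, converted to days.
all_vs_all_days = estimated_seconds(1_000_000, 1_000_000) / 86_400

print(f"single query: {single_query:.1f} s")
print(f"all vs. all:  {all_vs_all_days:.1f} days")
```

Under these assumptions a single search is a matter of seconds, while an all-vs-all comparison at the same rate runs for days, which is why pre-loading fingerprints into memory and using faster search code matters for that use case.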

[Rdkit-discuss] Fast similarity search

2017-05-18 Thread Tim Dudgeon
I think I recall Greg mentioning that RDKit can be used for very fast similarity search (e.g. all vs. all comparisons or searches against multi-million sized datasets). If so, is this part of the standard distro, or something extra (chemfp?). And can it run inside the cartridge? And any benc