Thanks for the reference. That sort of bounds screening would probably work
well in the C++ layer for the bulk similarity functions. My initial
experiments without bounds screening found that doing individual similarity
calculations in Python was a lot slower than the bulk function because
moving
I would be very surprised if the speed of fingerprint similarity was the
limiting factor on a distance-matrix-based clustering method. Normally
they are constrained by memory requirements. In this case I am using the
MaxMin picker in RDKit to generate the cluster “centroids” and am wanting
to fill
I wonder if there is a way to make use of PyTorch or TensorFlow to do this
on a GPU. That’s where some big speed-ups might be found.
Also, consider using these bounds. They do make a big difference in many
cases.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2527184/
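To illustrate the kind of bound meant here, below is a minimal sketch of one well-known bit-count bound: for fingerprints with a and b bits set, the Tanimoto similarity cannot exceed min(a, b) / max(a, b), so any candidate whose bound falls below the threshold can be skipped without computing the full similarity. This is not RDKit code. Fingerprints are modelled as plain Python integers, and the names popcount, tanimoto_upper_bound and screened_search are made up for the example.

```python
def popcount(fp: int) -> int:
    """Number of bits set in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Exact Tanimoto similarity: |A & B| / |A | B|."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def tanimoto_upper_bound(na: int, nb: int) -> float:
    """Upper bound on Tanimoto from the bit counts alone.

    The intersection has at most min(na, nb) bits and the union has at
    least max(na, nb) bits, so the ratio cannot exceed min / max.
    """
    if na == 0 or nb == 0:
        return 0.0
    return min(na, nb) / max(na, nb)


def screened_search(query: int, database: list, threshold: float):
    """Return (hits, skipped): hits are (index, similarity) pairs with
    similarity >= threshold; skipped counts candidates rejected by the
    bound without an exact calculation."""
    nq = popcount(query)
    hits = []
    skipped = 0
    for idx, fp in enumerate(database):
        if tanimoto_upper_bound(nq, popcount(fp)) < threshold:
            skipped += 1  # cannot reach the threshold, so skip it
            continue
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    return hits, skipped
```

In a real implementation the bit counts would be computed once and the database sorted or binned by them, so that whole ranges of candidates can be rejected together.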
On Tue, Oct 25, 2022 at
On 24/10/2022 19:47, David Cosgrove wrote:
For the record, I have attempted this, but got only a marginal speed-up
(130% of CPU used, with any number of threads above 2). The procedure I
used was to extract the fingerprint pointers into a std::vector, create a
std::vector for the results, unlock the GIL to do the bulk Tanimoto
Hi Greg,
Thanks for the pointer. I’ll take a look. If it could go in the next patch
release that would be really useful.
Dave
On Sat, 22 Oct 2022 at 10:52, Greg Landrum wrote:
Hi Dave,
We have multiple examples of this in the code, here’s one:
https://github.com/rdkit/rdkit/blob/b208da471f8edc88e07c77ed7d7868649ac75100/Code/GraphMol/ForceFieldHelpers/Wrap/rdForceFields.cpp#L40
I’m not sure how this would interact with the call to python::extract
that’s in the bulk
Hi,
I'm doing a lot of Tanimoto similarity calculations on large datasets using
BulkTanimotoSimilarity. It is an obvious candidate for parallelisation, so
I am using concurrent.futures to do so. If I use ProcessPoolExecutor, I
get good speed-up but each process needs a copy of the fingerprint
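For reference, a minimal sketch of the thread-based alternative, in which all workers share one in-memory fingerprint list instead of each holding a copy. It is an illustration under assumptions, not the code discussed in this thread. The function bulk_tanimoto is a pure-Python stand-in for a bulk similarity call, fingerprints are plain Python integers, and parallel_bulk_tanimoto and chunked are hypothetical helper names. Because the stand-in holds the GIL, this version shows only the structure; threads can give a real speed-up only if the underlying bulk call releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def bulk_tanimoto(query: int, fps: list) -> list:
    """Stand-in (assumption) for a bulk similarity call on one chunk."""
    return [tanimoto(query, fp) for fp in fps]


def chunked(seq: list, size: int):
    """Yield consecutive slices of seq holding at most size items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def parallel_bulk_tanimoto(query: int, fps: list,
                           n_workers: int = 4,
                           chunk_size: int = 1000) -> list:
    """Split fps into chunks and score each chunk on a worker thread.

    Threads share the same fps list, so no per-worker copy of the
    fingerprints is made. Executor.map returns results in submission
    order, so the output lines up with the input.
    """
    chunks = list(chunked(fps, chunk_size))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda chunk: bulk_tanimoto(query, chunk), chunks)
        return [sim for part in parts for sim in part]
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor would need a picklable top-level function in place of the lambda, and each chunk would then be copied to its worker process.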