Hi All,

I was looking at a blog post from Greg:

https://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html

about fingerprint screenout. What confused me were the timings in his post,
because the run times in my case were a lot slower.

Greg's numbers:


[07:21:19] INFO: mols from smiles
[07:21:27] INFO: Results1:  7.77 seconds, 50000 mols
[07:21:27] INFO: queries from smiles
[07:21:27] INFO: Results2:  0.16 seconds
[07:21:27] INFO: generating pattern fingerprints for mols
[07:21:43] INFO: Results3:  16.11 seconds
[07:21:43] INFO: generating pattern fingerprints for queries
[07:21:43] INFO: Results4:  0.34 seconds
[07:21:43] INFO: testing frags queries
[07:22:03] INFO: Results5:  19.90 seconds. 6753 tested (0.0003 of total), 3989 found,  0.59 accuracy. 0 errors.
[07:22:03] INFO: testing leads queries
[07:22:23] INFO: Results6:  19.77 seconds. 1586 tested (0.0001 of total), 1067 found,  0.67 accuracy. 0 errors.
[07:22:23] INFO: testing pieces queries
[07:23:19] INFO: Results7:  55.37 seconds. 3333202 tested (0.0810 of total), 1925628 found,  0.58 accuracy. 0 errors.

| 2019.09.1dev1 | 7.8 | 0.2 | 16.1 | 0.3 | 19.9 | 19.8 | 55.4 |
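
As a rough sketch (not part of the original benchmark script), a summary row
like the one above can be reproduced from the log by pulling out the
"ResultsN: ... seconds" values and rounding them to one decimal. The regex and
the names parse_timings and summary_row are assumptions made for illustration.

```python
import re

# Matches entries such as "[07:21:27] INFO: Results1:  7.77 seconds, 50000 mols".
RESULT_RE = re.compile(r"Results(\d+):\s+(\d+(?:\.\d+)?) seconds")


def parse_timings(log_text):
    """Return the timings in seconds, ordered by result number."""
    found = {int(n): float(sec) for n, sec in RESULT_RE.findall(log_text)}
    return [found[n] for n in sorted(found)]


def summary_row(label, timings):
    """Format timings as a one-line, pipe-separated row (one decimal each)."""
    cells = [label] + [f"{t:.1f}" for t in timings]
    return "| " + " | ".join(cells) + " |"


# Result lines copied from Greg's log above (long lines shortened after the timing).
GREG_LOG = """\
[07:21:27] INFO: Results1:  7.77 seconds, 50000 mols
[07:21:27] INFO: Results2:  0.16 seconds
[07:21:43] INFO: Results3:  16.11 seconds
[07:21:43] INFO: Results4:  0.34 seconds
[07:22:03] INFO: Results5:  19.90 seconds. 6753 tested (0.0003 of total)
[07:22:23] INFO: Results6:  19.77 seconds. 1586 tested (0.0001 of total)
[07:23:19] INFO: Results7:  55.37 seconds. 3333202 tested (0.0810 of total)
"""

print(summary_row("2019.09.1dev1", parse_timings(GREG_LOG)))
```

The seven columns therefore correspond to Results1 through Results7 in the
log, in that order.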



Machine 1:
Virtual machine, Windows Server 2012 R2 with an Intel Xeon (4 virtual cores)

Since the test is single-threaded, it makes some sense that it isn't fast
here, but it is not just a bit slower: depending on the test, it is roughly
2x to almost 5x slower (e.g. 76.0 s vs. 16.1 s for the pattern fingerprints).

[09:03:19] INFO: mols from smiles
[09:03:38] INFO: Results1:  19.44 seconds, 50000 mols
[09:03:38] INFO: queries from smiles
[09:03:38] INFO: Results2:  0.36 seconds
[09:03:38] INFO: generating pattern fingerprints for mols
[09:04:54] INFO: Results3:  75.99 seconds
[09:04:54] INFO: generating pattern fingerprints for queries
[09:04:56] INFO: Results4:  1.55 seconds
[09:04:56] INFO: testing frags queries
[09:05:34] INFO: Results5:  37.59 seconds. 6753 tested (0.0003 of total), 3989 found,  0.59 accuracy. 0 errors.
[09:05:34] INFO: testing leads queries
[09:06:11] INFO: Results6:  37.34 seconds. 1586 tested (0.0001 of total), 1067 found,  0.67 accuracy. 0 errors.
[09:06:11] INFO: testing pieces queries
[09:08:39] INFO: Results7:  147.79 seconds. 3333202 tested (0.0810 of total), 1925628 found,  0.58 accuracy. 0 errors.
| 2019.03.3 | 19.4 | 0.4 | 76.0 | 1.5 | 37.6 | 37.3 | 147.8 |

I thought this might be another case of Windows being slow, so I tested on a
Linux VM on my laptop.

Machine 2:
Virtual machine, Lubuntu 16.04 on a laptop i7-8850H 6-core

[09:23:31] INFO: mols from smiles
[09:23:54] INFO: Results1:  23.71 seconds, 50000 mols
[09:23:54] INFO: queries from smiles
[09:23:55] INFO: Results2:  0.48 seconds
[09:23:55] INFO: generating pattern fingerprints for mols
[09:24:53] INFO: Results3:  58.31 seconds
[09:24:53] INFO: generating pattern fingerprints for queries
[09:24:54] INFO: Results4:  1.19 seconds
[09:24:54] INFO: testing frags queries
[09:25:41] INFO: Results5:  46.22 seconds. 6753 tested (0.0003 of total), 3989 found,  0.59 accuracy. 0 errors.
[09:25:41] INFO: testing leads queries
[09:26:26] INFO: Results6:  45.84 seconds. 1586 tested (0.0001 of total), 1067 found,  0.67 accuracy. 0 errors.
[09:26:26] INFO: testing pieces queries
[09:28:33] INFO: Results7:  126.78 seconds. 3333202 tested (0.0810 of total), 1925628 found,  0.58 accuracy. 0 errors.
| 2019.03.3 | 23.7 | 0.5 | 58.3 | 1.2 | 46.2 | 45.8 | 126.8 |

Pretty weird: sometimes slower, sometimes faster than the Windows VM, but
still a lot slower than Greg's numbers. (I repeated this with RDKit 2019.09.2
and got comparable results.)

So I also tested directly on the same laptop:

Machine 3:
Physical install, Windows 10 on a laptop i7-8850H 6-core (same hardware as Machine 2)

[09:51:43] INFO: mols from smiles
[09:51:54] INFO: Results1:  10.59 seconds, 50000 mols
[09:51:54] INFO: queries from smiles
[09:51:54] INFO: Results2:  0.20 seconds
[09:51:54] INFO: generating pattern fingerprints for mols
[09:52:24] INFO: Results3:  29.50 seconds
[09:52:24] INFO: generating pattern fingerprints for queries
[09:52:24] INFO: Results4:  0.61 seconds
[09:52:24] INFO: testing frags queries
[09:52:44] INFO: Results5:  19.71 seconds. 6753 tested (0.0003 of total), 3989 found,  0.59 accuracy. 0 errors.
[09:52:44] INFO: testing leads queries
[09:53:04] INFO: Results6:  19.48 seconds. 1586 tested (0.0001 of total), 1067 found,  0.67 accuracy. 0 errors.
[09:53:04] INFO: testing pieces queries
[09:54:05] INFO: Results7:  61.94 seconds. 3333202 tested (0.0810 of total), 1925628 found,  0.58 accuracy. 0 errors.
| 2019.09.1 | 10.6 | 0.2 | 29.5 | 0.6 | 19.7 | 19.5 | 61.9 |

This is much closer to Greg's results, except for the fingerprinting, which
takes almost twice as long. Also notice that, relative to the other results,
fingerprinting on the Linux VM is much faster than on the Windows VM.
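
To put the comparison in numbers, here is a minimal sketch that divides each
machine's summary row by Greg's row. The values are copied from the summary
rows above; the step labels and the function name slowdowns are made up for
illustration and do not come from the benchmark script.

```python
# Short labels for the seven columns (Results1..Results7); hypothetical names.
STEPS = ["mols", "queries", "mol fps", "query fps", "frags", "leads", "pieces"]

# Timings in seconds, copied from the one-line summary rows in this post.
TIMINGS = {
    "Greg": [7.8, 0.2, 16.1, 0.3, 19.9, 19.8, 55.4],
    "Machine 1 (Windows VM)": [19.4, 0.4, 76.0, 1.5, 37.6, 37.3, 147.8],
    "Machine 2 (Linux VM)": [23.7, 0.5, 58.3, 1.2, 46.2, 45.8, 126.8],
    "Machine 3 (Windows 10, physical)": [10.6, 0.2, 29.5, 0.6, 19.7, 19.5, 61.9],
}


def slowdowns(timings, baseline="Greg"):
    """Return per-step ratios (machine time / baseline time) for each machine."""
    base = timings[baseline]
    return {
        name: [t / b for t, b in zip(row, base)]
        for name, row in timings.items()
        if name != baseline
    }


for name, ratios in slowdowns(TIMINGS).items():
    cells = ", ".join(f"{step} {r:.1f}x" for step, r in zip(STEPS, ratios))
    print(f"{name}: {cells}")
```

The ratios for the sub-second steps (queries, query fps) are dominated by
rounding in the summary rows, so only the longer steps are really meaningful.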

Conclusions:

  1.  From what I see, it seems that the pattern fingerprinter runs a lot
slower on Windows. Is this a known issue?
  2.  In virtual machines, RDKit's performance simply tanks. A certain penalty
is to be expected, but not this much. Or am I missing something?
Machine 1 runs on central infrastructure, so I would assume virtualization is
configured correctly. For the local VM, VT-x is enabled, yet it is much slower
than the physical machine (even though, AFAIK, RDKit runs faster on Linux than
on Windows).

The virtual machine aspect is especially troubling, because I would assume
many real-world applications are deployed on VMs and hence might suffer from
this too.
I don't have a well-defined question; I am mostly interested in other users'
experience, especially regarding virtualization.

Best Regards,

Thomas