Re: [Rdkit-discuss] Using RdKit in Parallel

2019-02-20 Thread Michal Krompiec
Dear Stamatia, If the molecules are processed completely independently by your code, it may be simpler to split the SDF into chunks (e.g. with csplit in a bash script), run a separate instance of your Python code on each chunk, wait until all have finished, and finally collate the output. Thus
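
For anyone who would rather do the splitting step in Python than with csplit, here is a minimal sketch. It relies only on the SDF convention that each record ends with a line containing `$$$$`. The function name `split_sdf`, the `chunk_NNNN.sdf` naming and the default of 1000 records per chunk are my own assumptions for illustration, not something from this thread.

```python
from pathlib import Path


def split_sdf(sdf_path, out_dir, records_per_chunk=1000):
    """Split an SDF file into chunk files holding at most
    records_per_chunk records each. Returns the chunk paths.

    Hypothetical helper: the name, the chunk file naming and the
    default chunk size are illustrative assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None      # currently open chunk file, if any
    written = 0     # records written to the current chunk
    record = []     # lines of the record being read

    def flush(lines):
        nonlocal out, written
        if out is None:
            path = out_dir / f"chunk_{len(chunk_paths):04d}.sdf"
            chunk_paths.append(path)
            out = open(path, "w")
            written = 0
        out.writelines(lines)
        written += 1
        if written >= records_per_chunk:
            out.close()
            out = None

    with open(sdf_path) as src:
        for line in src:
            record.append(line)
            # "$$$$" on its own line terminates an SDF record.
            if line.rstrip() == "$$$$":
                flush(record)
                record = []

    # Keep a final record that lacks its terminator, but ignore
    # trailing blank lines at the end of the file.
    if any(l.strip() for l in record):
        flush(record)
    if out is not None:
        out.close()
    return chunk_paths
```

Each returned path can then be handed to a separate instance of the search script, and the per-chunk outputs concatenated once every instance has finished.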

Re: [Rdkit-discuss] Using RdKit in Parallel

2019-02-20 Thread Christos Kannas
Hi Stamatia, Yes, SDMolSupplier is not thread safe. My guess is that this is due to the nature of the SDF format, where a molecule record spans multiple lines and you do not know a priori the number of lines per molecule, which makes it hard to split the file across threads/processes. Given that your proposed approach is
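
One way around the variable record length is to have a single reader cut the file into records as plain text and let worker processes do the parsing, so that no supplier object is shared. The following is only a sketch under that assumption: `iter_sdf_records`, `has_substructure` and `search_file` are hypothetical names, and the SMARTS query is passed in by the caller. It uses `Chem.MolFromMolBlock`, `Chem.MolFromSmarts` and `HasSubstructMatch` from RDKit.

```python
from multiprocessing import Pool


def iter_sdf_records(handle):
    """Yield each SDF record (including its "$$$$" line) as one string."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == "$$$$":
            yield "".join(lines)
            lines = []
    # A last record without a terminator is still yielded;
    # trailing blank lines are ignored.
    if any(l.strip() for l in lines):
        yield "".join(lines)


def has_substructure(task):
    """Worker: parse one record and test it against a SMARTS query."""
    record, smarts = task
    from rdkit import Chem  # imported inside the worker process

    # Keep only the mol block; SD data fields are not needed here.
    molblock = record.split("M  END")[0] + "M  END\n"
    mol = Chem.MolFromMolBlock(molblock)
    if mol is None:  # unparsable record
        return False
    # The query is rebuilt per record to keep the sketch short.
    return mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))


def search_file(sdf_path, smarts, processes=4):
    """Return one boolean per record, in file order."""
    with open(sdf_path) as handle, Pool(processes) as pool:
        tasks = ((rec, smarts) for rec in iter_sdf_records(handle))
        return pool.map(has_substructure, tasks, chunksize=256)
```

On platforms that start worker processes with spawn (Windows, recent macOS), `search_file` should be called from under an `if __name__ == "__main__":` guard. Note that `pool.map` collects all records before dispatching them, so for very large files `pool.imap` may be the better fit.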

[Rdkit-discuss] Using RdKit in Parallel

2019-02-20 Thread Stamatia Zavitsanou
Hello everyone, We have been writing a script that searches through a large number of molecules within different files for a common substructure. To speed this up we have been attempting to run this script in parallel (see scripts below). However, online the tutorial notes make reference to proble