Re: [Rdkit-discuss] Mongo-RDKit Integration!

2020-07-09 Thread Christopher Zou
Hi Professor Steinbeck and Dr. Sorokina, Great to hear from you! I've seen the blog posts by Suvee, Swain, and ChEMBL. If there are any that I haven't gotten to, I'd love to get a look at them. By all means, let's set up a call. I'd be glad to learn more about COCONUT and I think describing possib

Re: [Rdkit-discuss] Mongo-RDKit Integration!

2020-07-09 Thread Patrick Fuller
Hi Chris, I would treat tautomers as duplicates for my use case, but this would *not* be expected behavior for the majority of RDKit users. I think it'll be impossible to write something that works for everyone, so then the question is what is in scope and how to handle errors gracefully. I think

Re: [Rdkit-discuss] Mongo-RDKit Integration!

2020-07-09 Thread Christopher Zou
Hi Patrick, Thanks for the data and the feedback! I hadn't thought about logging malformed structures, which seems like something good to build into the data registration process. My mentors (Greg Landrum, Peter Gedeck, and Marco Stenta) and I also discussed possible approaches to pre-processing

Re: [Rdkit-discuss] Mongo-RDKit Integration!

2020-07-09 Thread Christoph Steinbeck
Hi Chris, we use MongoDB in our COCONUT database of open natural products (https://coconut.naturalproducts.net). For obvious reasons, we use the CDK under the hood, but we are not religious about this and use RDKit in a lot of our research. We store the chemical structures of natural products

Re: [Rdkit-discuss] Mongo-RDKit Integration!

2020-07-08 Thread Patrick Fuller
Chris, That sounds like a great idea! Optimized similarity and substructure searches are hard to get right, and most libraries leave it as an exercise to the reader to choose the right fingerprinting and db structure. I think the hardest part will be figuring out a robust end-user experience. You'

[Rdkit-discuss] Mongo-RDKit Integration!

2020-07-08 Thread Christopher Zou
Dear RDKit Community, Hope you're all well! I'm a student from UC Berkeley building an integration between RDKit and MongoDB as part of Google Summer of Code. The idea of the project is twofold: 1. Provide tools for building a chemically-intelligent MongoDB database. 2. Provide high-perfor