Aleksejs Kontijevskis. J. Chem. Inf. Model., 2017, 57(4), 680-699. https://doi.org/10.1021/acs.jcim.7b00006
Abstract
The emergence of the DNA-encoded chemical library (DEL) field in the past decade has attracted the attention of the pharmaceutical industry, since DELs offer a powerful mechanism for discovering novel drug-like hits against a wide range of biological targets. Nuevolution's Chemetics technology enables the DNA-encoded synthesis of billions of chemically diverse, drug-like small molecules, as well as their efficient screening and optimization, allowing drug candidates to be identified at unprecedented speed and scale. Although the cheminformatics community has developed many approaches for analyzing and visualizing drug-like chemical space, most of them are limited to at most a few million compounds and cannot handle the collections of 10^8 to 10^12 compounds typical of DELs. To address this big chemical data challenge, we developed the Reduced Complexity Molecular (RCM) frameworks methodology, an abstract and very general way of representing chemical structures. By further introducing RCM framework descriptors, we constructed a global framework map of drug-like chemical space and demonstrated how the chemical space occupied by multi-million-member drug-like Chemetics DNA-encoded libraries, and by virtual combinatorial libraries with more than 10^12 members, can be analyzed and mapped without enumerating the libraries. We further validated the approach by performing RCM framework-based searches in the drug-like chemical universe and by mapping Chemetics library selection outputs for the LSD1 target onto the global framework chemical space map.
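The library sizes quoted above follow from the combinatorial way DELs are built. As a back-of-the-envelope illustration only (it is not part of the RCM frameworks methodology), assume a fully combinatorial library in which every building block of each synthesis cycle is combined with every building block of the other cycles; the member count is then the product of the per-cycle building-block counts. The helper name `combinatorial_library_size`, the 10,000 building blocks per cycle, and the 50 bytes per SMILES string are all hypothetical values chosen for the sketch.

```python
from math import prod


def combinatorial_library_size(building_blocks_per_cycle):
    """Member count of a fully combinatorial library (hypothetical helper).

    Assumes every building block at each cycle is combined with every
    building block at every other cycle, so the size is a simple product.
    """
    if any(n < 0 for n in building_blocks_per_cycle):
        raise ValueError("building block counts must be non-negative")
    return prod(building_blocks_per_cycle)


# Hypothetical three-cycle library with 10,000 building blocks per cycle.
size = combinatorial_library_size([10_000, 10_000, 10_000])
print(f"{size:.1e} members")  # 1.0e+12 members

# Rough storage just to list each member once as text,
# assuming ~50 bytes per SMILES string (an assumption, not a measured value).
bytes_needed = size * 50
print(f"{bytes_needed / 1e12:.0f} TB")  # 50 TB
```

Even this crude estimate shows why per-compound enumeration becomes impractical at the 10^12 scale, which is the motivation for analyzing such libraries through a reduced representation instead.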