PolarDB open-source edition: storing, computing on, and analyzing biological and chemical molecular structure data with rdkit

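PolarDB's cloud-native architecture separates compute from storage, which gives it low-cost storage, elastic scaling, fast multi-node parallel computation, and fast data search and processing. Combining PolarDB with computational algorithms helps businesses extract more value from their data and turn that data into productivity.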

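This article describes how the open-source edition of PolarDB uses the rdkit extension to store, compute on, and analyze biological and chemical molecular structure data.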

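The test environment is macOS + docker. For PolarDB deployment, see "How to use PolarDB to prove Warren Buffett's investment philosophy (includes a simple PolarDB deployment)".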

About RDKit

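Molecules have the characteristics of connectivity, fractals, graphs, and combinations: lower forms of life make up higher forms of life, many higher forms of life make up a society, and many low-dimensional organisms combine fractally into high-dimensional organisms.

RDKit is an open-source cheminformatics toolkit. Its main features: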


  • Business-friendly BSD license

  • Core data structures and algorithms in C++

  • Python 3.x wrappers generated using Boost.Python

  • Java and C# wrappers generated with SWIG

  • 2D and 3D molecular operations

  • Descriptor generation for machine learning

  • Molecular database cartridge for PostgreSQL

  • Cheminformatics nodes for KNIME (distributed from the KNIME community site: https://www.knime.com/rdkit)

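PolarDB uses the rdkit extension to store, compute on, and analyze biological and chemical molecular structure data, including similarity search, substructure and exact-match search, and molecule comparison.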



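The extension provides the following index operator classes. The first column is the OID of the index access method (403 = btree, 405 = hash, 783 = gist, 2742 = gin):

 access method OID | operator class
-------------------+-------------------
               403 | btree_mol_ops
               403 | btree_bfp_ops
               403 | btree_sfp_ops
               405 | hash_mol_ops
               405 | hash_bfp_ops
               405 | hash_sfp_ops
               783 | gist_mol_ops
               783 | gist_qmol_ops
               783 | gist_bfp_ops
               783 | gist_sfp_ops
               783 | gist_sfp_low_ops
               783 | gist_reaction_ops
              2742 | gin_bfp_ops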
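
The operator classes above back the search types the extension offers. As a minimal sketch (not part of the original test setup), the script below writes a small SQL file that exercises substructure, exact-match, and similarity search. The table name `mols`, the index name, the sample SMILES strings, the 0.5 threshold, and the file name `rdkit_demo.sql` are illustrative assumptions; it also assumes `create extension rdkit` has already been run in the target database.

```shell
#!/bin/sh
# Write a small demo script for the rdkit extension.
# Table, index, and file names are illustrative assumptions.
cat > rdkit_demo.sql <<'SQL'
-- Hypothetical demo table: one molecule per row.
CREATE TABLE mols (id int PRIMARY KEY, m mol);

INSERT INTO mols VALUES
  (1, mol_from_smiles('c1ccccc1'::cstring)),   -- benzene
  (2, mol_from_smiles('c1ccccc1O'::cstring)),  -- phenol
  (3, mol_from_smiles('CCO'::cstring));        -- ethanol

-- GiST index on the mol column, naming the operator class explicitly.
CREATE INDEX mols_m_gist ON mols USING gist (m gist_mol_ops);

-- Substructure search: molecules that contain a benzene ring.
SELECT id, m FROM mols WHERE m @> 'c1ccccc1'::mol;

-- Exact-match search: molecules identical to ethanol.
SELECT id FROM mols WHERE m @= 'CCO'::mol;

-- Similarity search on Morgan bit-vector fingerprints.
-- The % operator keeps pairs whose Tanimoto similarity reaches the threshold.
SET rdkit.tanimoto_threshold = 0.5;
SELECT id,
       tanimoto_sml(morganbv_fp(m), morganbv_fp('c1ccccc1O'::mol)) AS similarity
FROM mols
WHERE morganbv_fp(m) % morganbv_fp('c1ccccc1O'::mol)
ORDER BY similarity DESC;
SQL
echo "wrote rdkit_demo.sql"
```

Running `psql -f rdkit_demo.sql` against that database should execute the queries. On a table this small the planner may well ignore the GiST index; the index definition is there only to show how an operator class from the list is used.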

Deploying rdkit on PolarDB

  1. Boost dependency

wget https://boostorg.jfrog.io/artifactory/main/release/1.69.0/source/boost_1_69_0.tar.bz2    
tar -jxvf boost_1_69_0.tar.bz2    
cd boost_1_69_0    
./bootstrap.sh --with-libraries=serialization   
sudo ./b2 --prefix=/usr/local/boost -a install    
  2. cairo dependency

sudo yum install -y cairo-devel cairo  
  3. freetype dependency

wget https://download.savannah.gnu.org/releases/freetype/freetype-2.12.1.tar.gz  
tar -zxvf freetype-2.12.1.tar.gz  
cd freetype-2.12.1  
./configure --prefix=/usr/local/freettype  
make -j 6  
sudo make install  
sudo vi /etc/ld.so.conf  
# add  
sudo ldconfig  
  4. rdkit

wget https://github.com/rdkit/rdkit/archive/refs/tags/Release_2022_09_3.tar.gz  
tar -zxvf Release_2022_09_3.tar.gz   

4.1 Comic_Neue dependency

## in macOS  
cp Comic_Neue.zip /home/postgres/rdkit-Release_2022_09_3/Code/GraphMol/MolDraw2D  
## in docker  
sudo chown postgres:postgres /home/postgres/rdkit-Release_2022_09_3/Code/GraphMol/MolDraw2D/Comic_Neue.zip  

4.2 rdkit

cd rdkit-Release_2022_09_3  
mkdir build  
cd build  
cmake -DBOOST_ROOT=/usr/local/boost -DBoost_INCLUDE_DIR=/usr/local/boost/include -DRDK_BUILD_PYTHON_WRAPPERS=OFF -DRDK_BUILD_PGSQL=ON -DPostgreSQL_ROOT="/home/postgres/tmp_basedir_polardb_pg_1100_bld" -DFREETYPE_LIBRARY=/usr/local/freettype/lib/libfreetype.so.6 -DFREETYPE_INCLUDE_DIRS=/usr/local/freettype/include/freetype2 -DRDK_TEST_MULTITHREADED=OFF -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON -DRDK_INSTALL_INTREE=OFF -DCMAKE_INSTALL_PREFIX=/usr/local/rdkit -Wno-dev ..  
# OR, to also turn off CommonChem molecule interchange support:
# cmake -DBOOST_ROOT=/usr/local/boost -DBoost_INCLUDE_DIR=/usr/local/boost/include -DRDK_BUILD_PYTHON_WRAPPERS=OFF -DRDK_BUILD_PGSQL=ON -DPostgreSQL_ROOT="/home/postgres/tmp_basedir_polardb_pg_1100_bld" -DFREETYPE_LIBRARY=/usr/local/freettype/lib/libfreetype.so.6 -DFREETYPE_INCLUDE_DIRS=/usr/local/freettype/include/freetype2 -DRDK_TEST_MULTITHREADED=OFF -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON -DRDK_INSTALL_INTREE=OFF -DCMAKE_INSTALL_PREFIX=/usr/local/rdkit -DRDK_BUILD_MOLINTERCHANGE_SUPPORT=OFF -Wno-dev ..
# Network access is required: cmake git-clones code and downloads several dependencies along the way. If a download fails, retry a few times.
# ...
make -j 6  
# Network access is also required here: make git-clones code as well.
sudo make install  
  5. Install the rdkit extension in PolarDB.

postgres=# create extension rdkit ;  
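
If `create extension rdkit` reports that the extension is not available, a first thing to check is whether the control file was installed where the server looks for it. The helper below is a small sketch of that check. The function name `rdkit_control_installed` is hypothetical, and it relies on the standard PostgreSQL layout in which an extension named `rdkit` has its control file at `SHAREDIR/extension/rdkit.control`.

```shell
#!/bin/sh
# rdkit_control_installed SHAREDIR
# Hypothetical helper: succeeds if SHAREDIR/extension/rdkit.control exists.
rdkit_control_installed() {
  sharedir=$1
  control="$sharedir/extension/rdkit.control"
  if [ -f "$control" ]; then
    echo "found: $control"
  else
    echo "missing: $control" >&2
    return 1
  fi
}
```

A possible invocation, assuming the `pg_config` from the PolarDB build is first on `PATH`, is `rdkit_control_installed "$(pg_config --sharedir)"`. Once the extension is created, `select rdkit_version();` in psql is a quick way to confirm that the shared library loads.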

rdkit build options:

option(RDK_BUILD_SWIG_WRAPPERS "build the SWIG wrappers" OFF )  
option(RDK_BUILD_PYTHON_WRAPPERS "build the standard python wrappers" ON )  
option(RDK_BUILD_COMPRESSED_SUPPLIERS "build in support for compressed MolSuppliers" OFF )  
option(RDK_BUILD_INCHI_SUPPORT "build the rdkit inchi wrapper" OFF )  
option(RDK_BUILD_AVALON_SUPPORT "install support for the avalon toolkit. Use the variable AVALONTOOLS_DIR to set the location of the source." OFF )  
option(RDK_BUILD_PGSQL "build the PostgreSQL cartridge" OFF )  
option(RDK_BUILD_RPATH_SUPPORT "build shared libraries using rpath" OFF)  
option(RDK_PGSQL_STATIC "statically link rdkit libraries into the PostgreSQL cartridge" ON )  
option(RDK_BUILD_CONTRIB "build the Contrib directory" OFF )  
option(RDK_INSTALL_INTREE "install the rdkit in the source tree (former behavior)" ON )  
option(RDK_INSTALL_DLLS_MSVC "install the rdkit DLLs when using MSVC" OFF)  
option(RDK_INSTALL_STATIC_LIBS "install the rdkit static libraries" ON )  
option(RDK_INSTALL_PYTHON_TESTS "install the rdkit Python tests with the wrappers" OFF )  
option(RDK_BUILD_THREADSAFE_SSS "enable thread-safe substructure searching" ON )  
option(RDK_BUILD_SLN_SUPPORT "include support for the SLN format" ON )  
option(RDK_TEST_MULTITHREADED "run some tests of multithreading" ON )  
option(RDK_BUILD_SWIG_JAVA_WRAPPER "build the SWIG JAVA wrappers (does nothing if RDK_BUILD_SWIG_WRAPPERS is not set)" ON )  
option(RDK_BUILD_SWIG_CSHARP_WRAPPER "build the experimental SWIG C# wrappers (does nothing if RDK_BUILD_SWIG_WRAPPERS is not set)" OFF )  
option(RDK_SWIG_STATIC "statically link rdkit libraries into the SWIG wrappers" ON )  
option(RDK_TEST_MMFF_COMPLIANCE "run MMFF compliance tests (requires tar/gzip)" ON )  
option(RDK_BUILD_CPP_TESTS "build the c++ tests (disabing can speed up builds" ON)  
option(RDK_USE_FLEXBISON "use flex/bison, if available, to build the SMILES/SMARTS/SLN parsers" OFF)  
option(RDK_TEST_COVERAGE "Use G(L)COV to compute test coverage" OFF)  
option(RDK_USE_BOOST_SERIALIZATION "Use the boost serialization library if available" ON)  
option(RDK_USE_BOOST_STACKTRACE "use boost::stacktrace to do more verbose invariant output (linux only)" ON)  
option(RDK_BUILD_TEST_GZIP "Build the gzip'd stream test" OFF)  
option(RDK_OPTIMIZE_POPCNT "Use SSE4.2 popcount instruction while compiling." ON)  
option(RDK_USE_STRICT_ROTOR_DEFINITION "Use the most strict rotatable bond definition" ON)  
option(RDK_BUILD_DESCRIPTORS3D "Build the 3D descriptors calculators, requires Eigen3 to be installed" ON)  
option(RDK_BUILD_FREESASA_SUPPORT "build the rdkit freesasa wrapper" OFF )  
option(RDK_BUILD_COORDGEN_SUPPORT "build the rdkit coordgen wrapper" ON )  
option(RDK_BUILD_MAEPARSER_SUPPORT "build the rdkit MAE parser wrapper" ON )  
option(RDK_BUILD_MOLINTERCHANGE_SUPPORT "build in support for CommonChem molecule interchange" ON )  
option(RDK_BUILD_YAEHMOP_SUPPORT "build support for the YAeHMOP wrapper" OFF)  
option(RDK_BUILD_XYZ2MOL_SUPPORT "build in support for the RDKit's implementation of xyz2mol (in the DetermineBonds library)" OFF )  
option(RDK_BUILD_STRUCTCHECKER_SUPPORT "build in support for the StructChecker alpha (not recommended, use the MolVS integration instead)" OFF )  
option(RDK_USE_URF "Build support for Florian Flachsenberg's URF library" ON)  
option(RDK_INSTALL_DEV_COMPONENT "install libraries and headers" ON)  
option(RDK_USE_BOOST_REGEX "use boost::regex instead of std::regex (needed for systems with g++-4.8)" OFF)  
option(RDK_USE_BOOST_IOSTREAMS "use boost::iostreams" ON)  
option(RDK_BUILD_MINIMAL_LIB "build the minimal RDKit wrapper (for the JS bindings)" OFF)  
option(RDK_BUILD_CFFI_LIB "build the CFFI wrapper (for use in other programming languges)" OFF)  
option(RDK_BUILD_FUZZ_TARGETS "build the fuzz targets" OFF)  
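
To see which defaults a given release ships with before choosing `-D` flags, the `option(...)` lines can be reduced to `NAME=DEFAULT` pairs. This is a rough sketch: the function name `list_rdk_options` is hypothetical, it assumes the options live in the top-level `CMakeLists.txt` of the source tree, and it only recognizes single-line `option(RDK_... "..." ON|OFF)` declarations like the ones listed above.

```shell
#!/bin/sh
# list_rdk_options FILE
# Hypothetical helper: print NAME=DEFAULT for each single-line
# option(RDK_... "description" ON|OFF) declaration in FILE.
list_rdk_options() {
  sed -nE 's/^[[:space:]]*option\((RDK_[A-Z0-9_]+)[[:space:]]+".*"[[:space:]]+(ON|OFF)[[:space:]]*\)[[:space:]]*$/\1=\2/p' "$1"
}
```

For example, `list_rdk_options rdkit-Release_2022_09_3/CMakeLists.txt | grep PGSQL` would show the defaults of the PostgreSQL-related options, which is why the cmake command above has to pass `-DRDK_BUILD_PGSQL=ON` explicitly.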

make installcheck

cd rdkit-Release_2022_09_3/Code/PgSQL/rdkit

[postgres@aa25c5be9681 rdkit]$ USE_PGXS=1 make installcheck
/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=./ --bindir='/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin'      --dbname=contrib_regression rdkit-91 props btree molgist bfpgist-91 bfpgin sfpgist slfpgist fps reaction  
(using postmaster on, default port)
============== dropping database "contrib_regression" ==============
============== creating database "contrib_regression" ==============
============== running regression test queries        ==============
test rdkit-91                     ... ok
test props                        ... ok
test btree                        ... ok
test molgist                      ... ok
test bfpgist-91                   ... ok
test bfpgin                       ... ok
test sfpgist                      ... ok
test slfpgist                     ... ok
test fps                          ... ok
test reaction                     ... ok

 All 10 tests passed. 

 All 10 tests, 0 tests in ignore, 0 tests in polar ignore. 
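
When the run is scripted rather than watched, the per-test lines can be tallied from a saved log. The sketch below assumes the `test NAME ... ok` / `test NAME ... FAILED` line format shown in the output above; the function name `regress_summary` and the log file name in the usage note are hypothetical.

```shell
#!/bin/sh
# regress_summary: read pg_regress output on stdin and print a tally.
# Returns non-zero if any test line ends in FAILED.
regress_summary() {
  log=$(cat)
  # grep -c prints 0 and exits 1 when nothing matches, hence "|| true".
  ok=$(printf '%s\n' "$log" | grep -Ec '^test .*\.\.\. ok[[:space:]]*$' || true)
  failed=$(printf '%s\n' "$log" | grep -Ec '^test .*\.\.\. FAILED' || true)
  echo "ok=$ok failed=$failed"
  [ "$failed" -eq 0 ]
}
```

One way to use it is `USE_PGXS=1 make installcheck 2>&1 | tee installcheck.log; regress_summary < installcheck.log`. If any test fails, pg_regress writes the differences to `regression.diffs` in the directory it was run from, which is the place to look next.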


