QSAR study on toxicities of alcohol and phenol compounds
Abstract: The relationship between the toxicity of a compound and its molecular descriptors is usually non-linear. Molecular descriptors computed by quantum chemistry methods include many irrelevant and redundant features. Minimal redundancy maximal relevance (mRMR) is a widely used feature selection method, but the current mRMR is not applicable to continuous dependent variables, and its relevance and redundancy measures are not comparable with each other. In quantitative structure-activity relationship (QSAR) studies, both the dependent variable (toxicity) and the independent variables (descriptors) are mostly continuous. This work replaces the linear Pearson correlation coefficient (R) with the non-linear distance correlation (dCor), which makes the relevance and redundancy measures comparable under non-linear conditions, and on that basis proposes a new feature selection method, mRMR-dCor. On three QSAR datasets of alcohol and phenol toxicity, support vector regression (SVR) models built on features selected by mRMR-dCor reached independent-prediction Q² values of 0.954, 0.941 and 0.981, respectively, clearly better than the reference models and previously reported results. Most of the molecular descriptors retained by mRMR-dCor are supported by earlier reports. mRMR-dCor has broad application prospects in QSAR, quantitative structure-pharmacokinetics relationship and related studies.
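The abstract does not give the exact mRMR-dCor scoring rule, so the following is only a minimal sketch of the idea under stated assumptions: the sample distance correlation is computed with the standard double-centered distance-matrix formula, and features are picked greedily by "relevance minus mean redundancy", both measured with dCor. The function names `distance_correlation` and `mrmr_dcor`, the difference-form criterion, and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays (value in [0, 1])."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise absolute-distance matrices.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-center each matrix: subtract row and column means, add grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom <= 0.0:
        return 0.0           # a constant input carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def mrmr_dcor(X, y, k):
    """Greedy selection of k column indices (assumed difference-form criterion).

    score(j) = dCor(X_j, y) - mean over selected s of dCor(X_j, X_s)
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    relevance = np.array(
        [distance_correlation(X[:, j], y) for j in range(n_features)]
    )
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    redundancy_sum = np.zeros(n_features)
    while len(selected) < min(k, n_features):
        last = selected[-1]
        # Accumulate each feature's redundancy with the newest selected feature.
        redundancy_sum += np.array(
            [distance_correlation(X[:, j], X[:, last]) for j in range(n_features)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf           # never re-select a feature
        selected.append(int(np.argmax(score)))
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.uniform(-1, 1, 200)
    x2 = rng.uniform(-1, 1, 200)
    X = np.column_stack([x0, x0, x2])  # column 1 duplicates column 0
    y = x0 ** 2 + x2 ** 2              # non-linear, continuous response
    print("selected columns:", mrmr_dcor(X, y, k=2))
```

Because relevance and redundancy are both dCor values on the same 0 to 1 scale, they can be subtracted directly, which is the comparability the abstract refers to. In a QSAR workflow the selected columns would then be passed to a regression model such as SVR.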
Authors: DENG Xiaolong (1,2), CHEN Yuan (1), TAN Siqiao (3), YUAN Zheming (1,2). 1. Hunan Provincial Key Laboratory for Germplasm Innovation and Utilization of Crop, Changsha 410128; 2. Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Changsha 410128; 3. College of Information Science and Technology, Hunan Agricultural University, Changsha 410128
Source: Acta Scientiae Circumstantiae (《环境科学学报》), indexed in CAS, CSCD and the Peking University Core Journals list; 2016, Issue 12, pp. 4490-4499 (10 pages)
Funding: Doctoral Program Foundation of the Ministry of Education (No. 20124320110002); Changsha Science and Technology Program (No. K140601g-21)
Keywords: minimal redundancy maximal relevance; feature selection; quantitative structure-activity relationship; distance correlation; support vector regression
About the authors: DENG Xiaolong (born 1989), male. E-mail: dxl27293473@163.com; zhmyuan@sina.com; 545534721@qq.com