Shuffling cross-validation-bee algorithm as a new descriptor selection method for retention studies of pesticides in biopartitioning micellar chromatography

J Environ Sci Health B. 2017 May 4;52(5):346-352. doi: 10.1080/03601234.2017.1283139. Epub 2017 Feb 22.

Abstract

The bee algorithm (BA) is an optimization algorithm inspired by the natural foraging behaviour of honey bees, and it can be applied to feature selection. In this paper, shuffling cross-validation-BA (CV-BA) was applied to select the descriptors that best describe the retention factor (log k) of 79 heterogeneous pesticides in biopartitioning micellar chromatography (BMC). Six descriptors were selected using BA and then used for model development by multiple linear regression (MLR). Descriptor selection was also performed using stepwise, genetic algorithm and simulated annealing methods, each followed by MLR model development, and the results were compared with those obtained from shuffling CV-BA. The results showed that shuffling CV-BA can be applied as a powerful descriptor selection method. A support vector machine (SVM) model was also developed using the six descriptors selected by BA. The SVM model gave better statistical results than the MLR model: for the whole data set (training and test), the root mean square error (RMSE) and correlation coefficient (R) were 0.1863 and 0.9426, respectively, for shuffling CV-BA-MLR, and 0.0704 and 0.9922, respectively, for shuffling CV-BA-SVM.
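The abstract does not give the details of shuffling CV-BA, so the following is only a minimal sketch of the general idea under stated assumptions: candidate descriptor subsets are scored by the mean RMSE of a least-squares linear model over repeatedly reshuffled K-fold splits, and a simplified bee-style search keeps the best sites, explores their neighbourhoods, and sends the remaining scouts to random subsets. The function names (`shuffled_cv_rmse`, `bee_like_selection`) and all parameter values (`n_splits`, `n_repeats`, `n_scouts`, `n_elite`, `n_recruits`, `n_iter`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def shuffled_cv_rmse(X, y, n_splits=5, n_repeats=3, seed=0):
    """Mean RMSE of a least-squares linear model over reshuffled K-fold splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_repeats):
        order = rng.permutation(n)  # reshuffle the samples before each K-fold pass
        for fold in np.array_split(order, n_splits):
            train = np.setdiff1d(order, fold)
            # Fit intercept + slopes on the training part only.
            A_train = np.column_stack([np.ones(len(train)), X[train]])
            coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
            # Score on the held-out fold.
            A_test = np.column_stack([np.ones(len(fold)), X[fold]])
            resid = y[fold] - A_test @ coef
            errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))


def bee_like_selection(X, y, n_select=6, n_scouts=20, n_elite=5,
                       n_recruits=4, n_iter=15, seed=0):
    """Simplified bee-style search for a descriptor subset (illustrative only).

    Assumes X has more than n_select columns.
    """
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    cache = {}

    def fitness(subset):
        # Lower is better; the same CV seed is used so scores are comparable.
        if subset not in cache:
            cache[subset] = shuffled_cv_rmse(X[:, sorted(subset)], y, seed=seed)
        return cache[subset]

    def random_subset():
        picks = rng.choice(n_desc, size=n_select, replace=False)
        return frozenset(int(j) for j in picks)

    def neighbour(subset):
        # Local move: swap one selected descriptor for one unselected descriptor.
        out = int(rng.choice(sorted(subset)))
        unselected = [j for j in range(n_desc) if j not in subset]
        into = int(rng.choice(unselected))
        return frozenset((set(subset) - {out}) | {into})

    scouts = [random_subset() for _ in range(n_scouts)]
    for _ in range(n_iter):
        scouts.sort(key=fitness)
        next_scouts = []
        for site in scouts[:n_elite]:
            # Recruited bees explore around each elite site; the best stays.
            candidates = [site] + [neighbour(site) for _ in range(n_recruits)]
            next_scouts.append(min(candidates, key=fitness))
        # Remaining bees scout random new subsets.
        next_scouts += [random_subset() for _ in range(n_scouts - n_elite)]
        scouts = next_scouts

    best = min(scouts, key=fitness)
    return sorted(best), fitness(best)
```

With a descriptor matrix `X` (rows are compounds, columns are descriptors) and a vector `y` of log k values, `bee_like_selection(X, y, n_select=6)` would return the indices of the chosen descriptors and their shuffled cross-validated RMSE. The selected columns could then be passed to an MLR or SVM model, as the abstract describes.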

Keywords: Pesticides; bee algorithm; quantitative structure property relationship (QSPR); shuffling cross-validation; variable selection method.

MeSH terms

  • Algorithms*
  • Chromatography / methods*
  • Hydrogen Bonding
  • Hydrophobic and Hydrophilic Interactions
  • Linear Models
  • Micelles
  • Models, Chemical
  • Pesticides / chemistry*
  • Quantitative Structure-Activity Relationship
  • Reproducibility of Results
  • Support Vector Machine

Substances

  • Micelles
  • Pesticides