TY - GEN
T1 - The Fuzzy Gene Filter
T2 - 2nd International Conference on Computational Bioscience, CompBio 2011
AU - Perez, Meir
AU - Marwala, Tshilidzi
PY - 2011
Y1 - 2011
AB - The Fuzzy Gene Filter (FGF) is an optimised Fuzzy Inference System designed to rank genes in order of differential expression, based on expression data generated in a microarray experiment. This paper examines the effectiveness of the FGF for feature selection using various classification architectures. The FGF is compared to three of the most common gene ranking algorithms: the t-test, the Wilcoxon test and ROC curve analysis. Four classification schemes are used to compare the performance of the FGF vis-à-vis the standard approaches: K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naïve Bayesian Classifier (NBC) and Artificial Neural Network (ANN). A nested, stratified Leave-One-Out Cross Validation scheme is used to identify the optimal number of top-ranking genes as well as the optimal classifier parameters. Two microarray data sets are used for the comparison: a prostate cancer data set and a lymphoma data set. Genes ranked by the FGF attained higher accuracies for all of the classifiers tested on both data sets; the difference is significant at the 5% level for the prostate data set (p = 0.0231) but not for the lymphoma data set (p = 0.1888). On the prostate data set, the FGF performed best with the KNN classifier, achieving an accuracy of 96.1% with the top 9 ranking genes. On the lymphoma data set, it performed best with the SVM classifier, achieving an accuracy of 100% with the top 12 ranking genes. The performance of the FGF is attributed to its being optimised to rank genes so as to maximise class separability, and to its incorporation of multiple features of the data when ranking genes.
KW - Classifier
KW - Feature selection
KW - Fuzzy Gene Filter
KW - Microarray
UR - http://www.scopus.com/inward/record.url?scp=82655165395&partnerID=8YFLogxK
U2 - 10.2316/P.2011.742-015
DO - 10.2316/P.2011.742-015
M3 - Conference contribution
AN - SCOPUS:82655165395
SN - 9780889868892
T3 - Proceedings of the 2nd IASTED International Conference on Computational Bioscience, CompBio 2011
SP - 406
EP - 413
BT - Proceedings of the 2nd IASTED International Conference on Computational Bioscience, CompBio 2011
Y2 - 11 July 2011 through 13 July 2011
ER -