TY - GEN
T1 - A population-based incremental learning approach to microarray gene expression feature selection
AU - Perez, Meir
AU - Rubin, David M.
AU - Marwala, Tshilidzi
AU - Scott, Lesley E.
AU - Stevens, Wendy
PY - 2010
Y1 - 2010
AB - The identification of a differentially expressed set of genes is essential in microarray data analysis, both for identifying novel oncogenic pathways and for automated diagnosis. This paper assesses the effectiveness of the Population-Based Incremental Learning (PBIL) algorithm in identifying a class-differentiating gene set for sample classification. PBIL iteratively evolves the genome of a search population by updating a probability vector, guided by the class separability achieved by each combination of features. PBIL is compared both to a standard Genetic Algorithm (GA) and to an Analysis of Variance (ANOVA). The algorithms are tested on a publicly available three-class leukaemia microarray data set (n = 72 samples). Over 30 repeated runs of each of GA and PBIL, PBIL found an average feature-space class separability of 97.04%, while GA achieved 96.39%. PBIL also found smaller feature spaces than GA (326 genes for PBIL versus 2652 for GA), thus excluding a large percentage of redundant features. On average, PBIL also outperformed ANOVA for n = 2652 genes (91.62%) and for q < 0.05 (94.44%), q < 0.01 (93.06%) and q < 0.005 (95.83%). The best PBIL run (98.61%) even outperformed ANOVA for n = 326 genes and q < 0.001 (both 97.22%). PBIL's performance is attributed to its ability to direct the search not only towards the optimal solution but also away from the worst.
UR - http://www.scopus.com/inward/record.url?scp=78651240894&partnerID=8YFLogxK
U2 - 10.1109/EEEI.2010.5661897
DO - 10.1109/EEEI.2010.5661897
M3 - Conference contribution
AN - SCOPUS:78651240894
SN - 9781424486809
T3 - 2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2010
SP - 10
EP - 14
BT - 2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2010
T2 - 2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2010
Y2 - 17 November 2010 through 20 November 2010
ER -
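The PBIL procedure summarised in the abstract above (sample a population from a probability vector, then shift the vector towards the best sample and away from the worst) can be sketched as follows. This is a minimal sketch of the standard PBIL scheme with a negative learning rate, applied to binary gene-selection masks; it is not the authors' implementation. The paper's class-separability fitness, parameter values and stopping rule are not given in this record, so `toy_fitness`, `INFORMATIVE`, the learning rates, the population size and the fixed generation count are all hypothetical.

```python
import random
from typing import Callable, List, Tuple


def pbil(
    n_features: int,
    fitness: Callable[[List[int]], float],
    pop_size: int = 50,
    generations: int = 100,
    lr: float = 0.1,        # hypothetical learning rate towards the best sample
    neg_lr: float = 0.075,  # hypothetical extra rate away from the worst sample
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Sketch of PBIL over binary masks: bit i = 1 means gene i is selected."""
    rng = random.Random(seed)
    p = [0.5] * n_features  # start unbiased: each gene selected with prob. 0.5
    best_mask: List[int] = []
    best_fit = float("-inf")
    for _ in range(generations):  # fixed budget (assumed stopping rule)
        # Sample a population of masks from the probability vector.
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        ranked = sorted(pop, key=fitness)
        worst, best = ranked[0], ranked[-1]
        f_best = fitness(best)
        if f_best > best_fit:  # keep the best mask seen in any generation
            best_mask, best_fit = best, f_best
        for i in range(n_features):
            # Move the probability vector towards the best sample.
            p[i] = p[i] * (1 - lr) + best[i] * lr
            # Where best and worst disagree, push further towards the best,
            # i.e. away from the worst sample.
            if best[i] != worst[i]:
                p[i] = p[i] * (1 - neg_lr) + best[i] * neg_lr
    return best_mask, best_fit


# Hypothetical toy problem: genes 0, 3 and 7 are the "informative" ones.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: List[int]) -> float:
    # Hypothetical stand-in for class separability: reward informative genes
    # and lightly penalise every selected gene to favour small feature sets.
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


mask, score = pbil(n_features=20, fitness=toy_fitness, seed=0)
selected = [i for i, bit in enumerate(mask) if bit]
print(selected, round(score, 2))
```

The `neg_lr` step is what lets the search move away from poor gene combinations as well as towards good ones; setting it to 0 reduces the sketch to plain PBIL with only a positive update.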