Weighted L1-norm Logistic Regression for Gene Selection of Microarray Gene Expression Classification

Aiedh Mrisi Alharthi (1), Muhammad Hisyam Lee (2), Zakariya Yahya Algamal (3)
(1) Department of Mathematical Sciences, Universiti Teknologi Malaysia
(2) Department of Mathematical Sciences, Universiti Teknologi Malaysia
(3) Department of Statistics and Informatics, University of Mosul, Mosul, Iraq
How to cite (IJASEIT) :
Alharthi, Aiedh Mrisi, et al. “Weighted L1-Norm Logistic Regression for Gene Selection of Microarray Gene Expression Classification”. International Journal on Advanced Science, Engineering and Information Technology, vol. 10, no. 4, Aug. 2020, pp. 1483-8, doi:10.18517/ijaseit.10.4.10907.
The classification of cancer is a significant application of DNA microarray data. Gene selection methods are commonly used to handle the high dimensionality of microarray data so that experts can diagnose and classify cancer with high accuracy. Penalized logistic regression (PLR) is widely used for dimensionality reduction of high-dimensional gene expression data sets, removing irrelevant and redundant predictors from the binary logistic regression model. One of the regularization techniques used to achieve this goal is the least absolute shrinkage and selection operator (Lasso). However, this technique has been criticized for being biased in the selection of genes. The adaptive Lasso was proposed to address this selection bias by assigning an initial weight to each gene. This paper adapts PLR to improve its accuracy in classification and gene selection by introducing the one-dimensional weighted Mahalanobis distance (1-DWM) of each gene as an initial weight inside the L1-norm penalty. In experiments, the proposed method, denoted adaptive penalized logistic regression (APLR), gives more accurate results than other well-known methods. The proposed method is applied to several real high-dimensional gene expression data sets to demonstrate its efficiency in terms of classification accuracy and gene selection. The proposed method could therefore be used in other studies that perform gene selection when classifying high-dimensional cancer data sets.
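To make the idea of a weighted L1-norm penalty concrete, the following is a minimal Python sketch, not the authors' implementation. It fits a logistic regression whose penalty is lam * sum_j w_j * |beta_j| using proximal gradient descent. The abstract does not define the 1-DWM weight, so the sketch substitutes the standard adaptive-Lasso weight (the reciprocal of an initial Lasso estimate) as a clearly labeled stand-in. The function names, the synthetic data, the value of lam, and the small constant added to the weights are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_l1_logistic(X, y, weights, lam, n_iter=2000):
    """Proximal-gradient fit of logistic regression with the penalty
    lam * sum_j weights[j] * |beta[j]|. The intercept is not penalized."""
    n, p = X.shape
    beta = np.zeros(p)
    b0 = 0.0
    # Step size 1/L, where L = ||[1, X]||_2^2 / (4n) bounds the curvature
    # of the mean logistic loss.
    X_aug = np.hstack([np.ones((n, 1)), X])
    step = 4.0 * n / np.linalg.norm(X_aug, 2) ** 2
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        # Plain gradient step for the intercept, thresholded step for beta.
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n,
                              step * lam * weights)
    return b0, beta


# Synthetic stand-in for a gene expression matrix: 80 samples, 30 "genes",
# of which only the first three carry signal.
rng = np.random.default_rng(0)
n, p = 80, 30
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -2.0, 1.5]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

lam = 0.05  # illustrative value; in practice chosen by cross-validation

# Stage 1: ordinary Lasso (all weights equal to one).
_, beta_lasso = weighted_l1_logistic(X, y, np.ones(p), lam)

# Stage 2: weighted L1 fit. These weights are the usual adaptive-Lasso
# choice and are NOT the paper's 1-DWM weights, which the abstract
# does not specify.
adaptive_weights = 1.0 / (np.abs(beta_lasso) + 1e-3)
_, beta_adaptive = weighted_l1_logistic(X, y, adaptive_weights, lam)

print("genes selected by Lasso:         ", np.flatnonzero(beta_lasso))
print("genes selected by weighted Lasso:", np.flatnonzero(beta_adaptive))
```

A gene whose initial estimate is small receives a large weight and is penalized more heavily in the second stage, while genes with large initial estimates are shrunk less. Any other per-gene weight vector, such as one derived from a distance measure, could be passed in place of `adaptive_weights`.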
