Title: | Discovery of prognostic biomarkers for predicting lung cancer metastasis using microarray and survival data |
Authors: | Huang, Hui-Ling; Wu, Yu-Chung; Su, Li-Jen; Huang, Yun-Ju; Charoenkwan, Phasit; Chen, Wen-Liang; Lee, Hua-Chin; Chu, William Cheng-Chung; Ho, Shinn-Ying; Department of Biological Science and Technology; Institute of Bioinformatics and Systems Biology; Institute of Molecular Medicine and Bioengineering |
Keywords: | Distant metastasis;Genetic algorithm;Lung cancer;Microarray;Prognostic biomarker;Survival curve |
Issue Date: | 21-Feb-2015 |
Abstract: | Background: Few studies have investigated prognostic biomarkers of distant metastases of lung cancer. One of the central difficulties in identifying biomarkers from microarray data is the availability of only a small number of samples, which results in overtraining. Recently obtained evidence reveals that epithelial-mesenchymal transition (EMT) of tumor cells causes metastasis, which is detrimental to patients' survival.
Results: This work proposes a novel optimization approach to discovering EMT-related prognostic biomarkers to predict the distant metastasis of lung cancer using both microarray and survival data. The approach uses a weighted objective function that maximizes both the accuracy of prediction of distant metastasis and the area between the disease-free survival curves of the non-distant- and distant-metastasis patients. Seventy-eight patients with lung cancer and a follow-up time of 120 months are used to identify a set of gene markers, and an independent cohort of 26 patients is used to evaluate the identified biomarkers. The medical records of the 78 patients show a significant difference between the disease-free survival times of the 37 non-distant- and the 41 distant-metastasis patients. The experimental results are as follows: 1) the use of disease-free survival curves can compensate for the shortcoming of insufficient samples and greatly increase the test accuracy by 11.10%; and 2) the support vector machine with a set of 17 transcripts, such as CCL16 and CDKN2AIP, can yield a leave-one-out cross-validation accuracy of 93.59%, a test accuracy of 76.92%, a large disease-free survival area of 74.81%, and a mean survival prediction error of 3.99 months. The identified putative biomarkers are examined using related studies and signaling pathways to reveal the potential effectiveness of the biomarkers in prospective confirmatory studies.
Conclusions: The proposed new optimization approach to identifying prognostic biomarkers by combining multiple sources of data (microarray and survival) can facilitate the accurate selection of biomarkers that are most relevant to the disease while solving the problem of insufficient samples. |
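The weighted objective described in the abstract can be illustrated with a minimal Python sketch. The record does not give the actual formula, so everything here is an assumption for illustration: the Kaplan-Meier estimator for the disease-free survival curves, the normalization of the area by the follow-up horizon, the linear weighting, and the names `kaplan_meier`, `survival_area_gap`, `weighted_fitness`, and `w`. The paper's genetic algorithm and support vector machine are not reproduced; the accuracy is simply passed in as a number.

```python
from typing import List, Sequence, Tuple

Curve = List[Tuple[float, float]]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Kaplan-Meier step function as (time, survival) pairs.

    events[i] is 1 if relapse was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve: Curve = [(0.0, 1.0)]
    for t in sorted(set(times)):
        relapses = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if relapses:
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def area_under(curve: Curve, horizon: float) -> float:
    """Integrate the step function from 0 to horizon."""
    area = 0.0
    for (t0, s0), (t1, _) in zip(curve, curve[1:] + [(horizon, 0.0)]):
        lo, hi = min(t0, horizon), min(t1, horizon)
        area += s0 * (hi - lo)
    return area


def survival_area_gap(low_risk, high_risk, horizon: float) -> float:
    """Area between two survival curves as a fraction of the horizon.

    low_risk and high_risk are (times, events) pairs for the predicted
    non-distant and distant metastasis groups (assumed grouping).
    """
    low = area_under(kaplan_meier(*low_risk), horizon)
    high = area_under(kaplan_meier(*high_risk), horizon)
    return (low - high) / horizon


def weighted_fitness(accuracy: float, area_gap: float, w: float = 0.5) -> float:
    """Hypothetical linear weighting of the two objectives; w is assumed."""
    return w * accuracy + (1.0 - w) * area_gap


# Toy data (invented for illustration, not taken from the study).
non_distant = ([120, 120, 100, 120], [0, 0, 1, 0])
distant = ([10, 20, 30, 40], [1, 1, 1, 1])
gap = survival_area_gap(non_distant, distant, horizon=120.0)
print(gap, weighted_fitness(0.9, gap))
```

In a feature-selection loop, a score of this kind would be computed for each candidate gene subset, so that subsets separating the two survival curves well are preferred even when cross-validation accuracy alone cannot distinguish them on a small sample.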
URI: | http://dx.doi.org/10.1186/s12859-015-0463-x http://hdl.handle.net/11536/124569 |
ISSN: | 1471-2105 |
DOI: | 10.1186/s12859-015-0463-x |
Journal: | BMC BIOINFORMATICS |
Volume: | 16 |
Start page: | 0 |
End page: | 0 |
Appears in Collections: | Articles |