Title: 利用頻繁項目集概念對基因表現做雙分群分析
Gene Expression Biclustering Based on Frequent Item Set Concept
Author: 胡毓志
HU YUH-JYH
Department of Computer Science, National Chiao Tung University
Keywords: biochips;gene expressions;biclustering
Issue Date: 2009
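Abstract: As biochip technology has matured and chip prices have fallen, more and more research units and laboratories can afford biochip and microarray equipment. Gene expression experiments therefore produce ever more data, and obtaining expression data is no longer difficult. The problem that follows is instead how to mine biologically meaningful information from large volumes of expression data, and finding meaningful gene groups in gene expression data has long been an important topic in microarray analysis. Because conventional clustering algorithms have inherent limitations, many biclustering algorithms have been developed to address their shortcomings. Unlike the biclustering methods proposed so far, we propose a new biclustering method built on a frequent itemset analysis framework; in other words, we transform the problem of biclustering microarray data into the problem of mining frequent itemsets. To verify the feasibility of our algorithm, we plan first to examine the strengths and weaknesses of existing biclustering systems as a reference for building our new system. To evaluate the system's practicality, we will also collect gene expression data from known species as material for testing and analysis. We then plan to compare our system with several biclustering systems published in recent years, running a series of tests on known, publicly available data to understand the new system's performance. This project has two main goals. The first is to propose a new biclustering algorithm and compare it systematically with other published methods; we expect the new system to discover gene groups of greater biological significance. The second is to open up another interdisciplinary topic for data mining research while bringing new research energy to the biological sciences.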
With the advance of biochip technology and the falling price of gene chips, more and more research groups and labs can now afford biochip/microarray systems. Thus, the problem biologists face is no longer data acquisition but data analysis. To find genes with similar expression behavior and build gene families, most researchers adopt clustering strategies to identify functionally related gene groups. However, conventional clustering algorithms generally assume “global similarity,” i.e., similarity is determined over all sampling conditions. Unfortunately, this assumption does not reflect biological reality: in real biological systems, not all genes are involved in all biological activities at all times. To mitigate this limitation, we instead adopt biclustering strategies, which seek “local similarity.” To be precise, we aim to find subsets of genes that show similar expression profiles across subsets of experimental conditions or time points. We plan to perform a thorough survey of current biclustering methods. Based on the pros and cons of those algorithms, we expect to design a new and better biclustering system. Unlike earlier approaches, we will transform the biclustering problem into a frequent pattern/itemset mining problem. By applying data mining techniques, we intend to improve the biological significance of the biclusters found. In addition, to make a consistent and fair comparison with other tools, we plan to first collect real-world gene expression datasets and then test our new system on them against other current approaches using widely accepted cluster quality measures. There are two objectives in this project. First, we plan to develop a new biclustering method and compare it with other current systems on real-world expression datasets. Second, because our new clustering tool will adopt ideas from the data mining community, we hope to open a new research direction for data mining and bring an influx of new effort into the biological sciences.
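The abstract does not say how the biclustering problem is mapped onto frequent itemset mining, so the following is only an illustrative sketch of one common way such a mapping can be done, not the project's actual method. It assumes each gene becomes a "transaction" holding the conditions under which its expression meets a cutoff; a set of conditions shared by enough genes then corresponds to a bicluster. The function names, the `threshold` cutoff, the `min_support` parameter, and the toy matrix are all hypothetical, and brute-force enumeration stands in for a real miner such as Apriori or FP-growth.

```python
from itertools import combinations


def discretize(matrix, threshold):
    """Map each gene (row) to the set of condition indices where it is 'on'.

    Assumption: a single fixed threshold decides whether a gene is expressed.
    """
    return [frozenset(j for j, v in enumerate(row) if v >= threshold)
            for row in matrix]


def frequent_itemsets(transactions, min_support, min_size=2):
    """Return {condition tuple: supporting gene tuple} for frequent condition sets.

    Brute-force enumeration by increasing size; stops at the first size with
    no frequent set, since no superset of an infrequent set can be frequent.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(min_size, len(items) + 1):
        found = False
        for cand in combinations(items, size):
            cset = frozenset(cand)
            genes = tuple(g for g, t in enumerate(transactions) if cset <= t)
            if len(genes) >= min_support:
                result[cand] = genes  # (conditions, genes) form one bicluster
                found = True
        if not found:
            break
    return result


# Toy expression matrix (made-up values): rows are genes, columns are conditions.
expression = [
    [2.1, 0.2, 1.9, 0.1],  # gene 0
    [1.8, 0.3, 2.2, 1.7],  # gene 1
    [0.1, 1.9, 0.2, 2.0],  # gene 2
    [2.0, 0.1, 1.8, 0.3],  # gene 3
]
transactions = discretize(expression, threshold=1.5)
biclusters = frequent_itemsets(transactions, min_support=2)
print(biclusters)  # {(0, 2): (0, 1, 3)}
```

In this toy case, genes 0, 1, and 3 are all expressed under conditions 0 and 2, so that pair of conditions together with those three genes forms the only bicluster found. Gene 1 is also expressed under condition 3, but no other gene shares that pattern, which is the kind of "local similarity" a global clustering over all conditions would blur.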
Official Document #: NSC98-2221-E009-150
URI: http://hdl.handle.net/11536/101836
https://www.grb.gov.tw/search/planDetail?id=1905992&docId=315914
Appears in Collections: Research Plans