Title: Applying Data Mining to Source Code Fault Prediction: An Example of Consumer Electronics Projects
Authors: Lin, Hung-Ru (林鴻儒); Lin, Chiun-Sin (林君信)
College of Management, Management Science Program
Keywords: Data Mining; Source Code Fault Prediction; Source Code Metrics; Software Project Management
Publication date: 2010
Abstract: Two recent trends in the electronics and information industry, the growing demand for value-added software as a means of product differentiation and the sharply shortened life cycle of consumer electronics products, require software project managers to allocate resources more precisely to meet performance goals. Previous studies show that source code fault prediction can give managers information for prioritizing resource allocation. In this thesis, we implement a source code fault prediction system using several data mining algorithms. The experimental results show that using cross-company (external) data as the training set gives practical results for consumer electronics software projects that lack sufficient within-company historical data. Among the mining algorithms, logistic regression performs relatively best: on average, about 84.6% of the defects can be found by inspecting 19.1% of the program modules, a cost-benefit of about 77.4%. The mining models also indicate that LOC metrics are more informative than Halstead and McCabe metrics. We suggest the Naive Bayes algorithm when both computing resources and software failure handling costs must be considered, and neural network or clustering algorithms for industries with higher R&D costs. In addition, project managers of derivative software projects must pay attention to modularization; otherwise, the accuracy of the prediction system will suffer.
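The abstract evaluates fault prediction by ranking modules by predicted fault-proneness and asking what share of defects is found after inspecting a given share of modules. The Python sketch below illustrates that evaluation under stated assumptions; it is not the author's implementation. The record does not give the cost-benefit formula, so `cost_benefit` assumes 1 − (fraction of modules inspected) / (fraction of defects found), a definition that reproduces the reported figure (1 − 0.191 / 0.846 ≈ 0.774). The helper names, the plain gradient-descent logistic regression, and the toy "cross-company" and "within-company" data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Hypothetical helper: fit logistic regression by batch gradient descent.

    X holds per-module metric vectors (assumed already scaled, e.g. LOC,
    Halstead, McCabe values); y is 1 for fault-prone modules, else 0.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the mean log-loss: (p - y) * x for weights, (p - y) for bias.
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Predicted probability that module x is fault-prone.
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def inspection_curve(scores: Sequence[float],
                     defects: Sequence[int]) -> List[Tuple[float, float]]:
    """Inspect modules from highest to lowest score; after each module, record
    (fraction of modules inspected, fraction of all defects found)."""
    if len(scores) != len(defects):
        raise ValueError("scores and defects must have the same length")
    total = sum(defects)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, found = [], 0
    for k, i in enumerate(order, start=1):
        found += defects[i]
        curve.append((k / len(scores), found / total if total else 0.0))
    return curve


def cost_benefit(inspected_fraction: float, defect_fraction: float) -> float:
    # ASSUMED definition: 1 - inspected / found (not stated in the source).
    if defect_fraction <= 0:
        raise ValueError("defect_fraction must be positive")
    return 1.0 - inspected_fraction / defect_fraction


if __name__ == "__main__":
    # Hypothetical toy data with one scaled metric per module.
    external_X = [[0.0], [0.2], [0.8], [1.0]]  # cross-company training set
    external_y = [0, 0, 1, 1]
    w, b = train_logistic(external_X, external_y)

    local_X = [[0.9], [0.1], [0.7], [0.3]]     # within-company modules
    local_defects = [3, 0, 1, 0]               # defects per module
    scores = [predict(w, b, x) for x in local_X]
    for inspected, found in inspection_curve(scores, local_defects):
        print(f"inspect {inspected:.0%} of modules -> find {found:.0%} of defects")

    print(round(cost_benefit(0.191, 0.846), 3))  # 0.774
```

Training on one data set and scoring another mirrors the cross-company setup the abstract describes; any of the other algorithms it mentions (Naive Bayes, neural networks, clustering) could replace `train_logistic` as the scoring step without changing the evaluation helpers.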
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079562522
http://hdl.handle.net/11536/41457
Appears in Collections: Thesis