Title: A Parallel Detection and Prediction Method for Concept Drift in Dynamic Data Driven Application System (original title: 一個針對動態資料驅動應用系統概念飄移的平行偵測與預測方法)
Authors: Chiu, Yao-Ching (邱耀慶)
Lo, Chi-Chun (羅濟群)
Hwang, Hsin-Ginn (黃興進)
Institute of Information Management
Keywords: Big Data; Dynamic-Data-Driven-Application System; Concept Drift; Machine Learning; Map-Reduce
Date issued: 2015
Abstract: Traditional data analysis and prediction methods assume that the data distribution is stable, so a model that learns the relationships in historical data can accurately predict (classify) the labels of unlabeled data. In today's rapidly changing big-data environment, however, a prediction model that relies too heavily on historical data cannot correctly capture relationships in the data that change with context, a phenomenon known as concept drift. This thesis proposes a parallel detection and prediction method for concept drift in a Dynamic Data Driven Application System (DDDAS). The method quickly detects changes in the data concept and feeds the detected drift back to the system in real time, so that the prediction model can be adjusted to improve the accuracy of subsequent real-time predictions. It also uses parallel computation to derive a global prediction from local predictions, which raises prediction accuracy and reduces overall computation time. The method is implemented with classification algorithms on a Map-Reduce distributed platform. In two experimental cases, average prediction accuracy improved by 14% and 35%, respectively, over previous prediction methods, and computation time was reduced by nearly 45% and 29%, respectively, compared with the traditional computation approach.
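The abstract's idea of detecting drift locally, feeding it back to the model, and aggregating local predictions into a global one can be illustrated with a minimal single-process sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the names `LocalLearner`, `global_predict`, `run_stream`, and `make_stream` are hypothetical, the drift rule (windowed error rate above a threshold) and the majority-vote aggregation are stand-ins for whatever detector and classifier the thesis uses, and round-robin partitioning only imitates the map and reduce phases of a real Map-Reduce job.

```python
from collections import Counter, deque


class LocalLearner:
    """Hypothetical per-partition learner (not the thesis's classifier)."""

    def __init__(self, window=10, drift_threshold=0.5):
        self.memory = deque(maxlen=window)   # recent (x, label) pairs
        self.errors = deque(maxlen=window)   # recent 0/1 prediction errors
        self.drift_threshold = drift_threshold

    def predict(self, x):
        # Majority label among remembered records that share this x.
        votes = Counter(label for seen_x, label in self.memory if seen_x == x)
        return votes.most_common(1)[0][0] if votes else None

    def update(self, x, label):
        """Learn one labeled record; return True if drift was flagged."""
        guess = self.predict(x)
        if guess is not None:
            self.errors.append(int(guess != label))
        full = len(self.errors) == self.errors.maxlen
        # Assumed drift rule: error rate over a full window exceeds the threshold.
        drifted = full and sum(self.errors) / len(self.errors) > self.drift_threshold
        if drifted:
            # Feedback step: discard the outdated concept before relearning.
            self.memory.clear()
            self.errors.clear()
        self.memory.append((x, label))
        return drifted


def global_predict(learners, x):
    """Reduce-like step: majority vote over the local predictions."""
    local = (learner.predict(x) for learner in learners)
    votes = Counter(p for p in local if p is not None)
    return votes.most_common(1)[0][0] if votes else None


def run_stream(stream, n_partitions=3, **learner_kwargs):
    """Map-like step: route records round-robin to independent learners."""
    learners = [LocalLearner(**learner_kwargs) for _ in range(n_partitions)]
    drift_events = 0
    for i, (x, label) in enumerate(stream):
        drift_events += learners[i % n_partitions].update(x, label)
    return learners, drift_events


def make_stream(n_old, n_new):
    """Synthetic stream whose x-to-label mapping flips after n_old records."""
    for i in range(n_old + n_new):
        x = i % 2
        flipped = i >= n_old
        yield x, ("a" if (x == 0) != flipped else "b")
```

Calling `run_stream(make_stream(60, 90))` and then `global_predict(learners, 0)` shows how the global vote follows the flipped concept once each partition has flagged drift and relearned. In a distributed setting the per-partition updates would run as separate map tasks and the vote as a reduce task.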
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070253433
http://hdl.handle.net/11536/126045
Appears in collections: Theses