Title: Bond Price Prediction Using a Distributed Architecture and Data Mining
Data Mining for Bond Price Prediction Using Apache Spark
Authors: 林慧瑜
劉敦仁
蔡銘箴
Lin, Hui-Yu
Liu, Duen-Ren
Tsai, Min-Jen
Institute of Information Management
Keywords: Predictive Analysis; Data Mining; Distributed Architecture; Apache Spark
Publication Date: 2016
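Abstract: Price prediction has long been a closely watched topic in financial markets. For any financial instrument, however, the factors that drive price fluctuations are complex and noisy, which makes accurate price prediction difficult. Unlike stocks, bonds trade with less transparency and lower liquidity, and the factors that move their prices are also quite different. Although many past studies have applied data mining techniques to predict stock trends or future stock prices, few have addressed the prediction of bond trade prices. According to statistics published by the Securities Industry and Financial Markets Association (SIFMA) in May 2016, the daily trading value of corporate bonds exceeds 30 billion US dollars, which shows how large the US corporate bond market is. The dataset used in this study comes from a Kaggle competition and contains 762,678 bond trades, with the basic information of each trade and the historical information of its previous ten trades. This study applies data mining methods to build bond price prediction models, deploys a distributed system with Hadoop, and uses the Apache Spark computing framework to process the large volume of data efficiently. The price prediction model is built in three stages. The first stage is preprocessing: technical indicators are used to expand the attribute set, and dimensionality reduction is then used to select attributes and extract the more influential variables. In the second stage, machine learning algorithms such as linear regression, random forest, and gradient boosted decision trees are used to build a separate price prediction model for each category of trade. In the third stage, the models trained in the previous stage predict the bond trade prices of the test set, and the mean absolute error and root mean squared error are used to evaluate and compare the performance of the prediction models.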
Price prediction has long been one of the most intriguing and widely studied topics in financial markets. However, the factors that affect price fluctuations for any financial instrument are complicated and usually involve plenty of noise, making the prediction of future prices difficult. Unlike stocks, most bonds do not trade on exchanges. Consequently, the bond market usually lacks transparency and liquidity. Yet according to SIFMA, the average daily trading volume of corporate bonds in 2016 exceeds 30 billion dollars. Although the bond market is enormous, hardly any study of bond price prediction has been made in the past. Using data mining techniques, this paper proposes an approach to building bond price prediction models and improves computing efficiency by applying the Spark framework on top of a Hadoop cluster. The data used in this research is a competition dataset from Kaggle containing 762,678 corporate bond transactions. Our predictive model is constructed in three phases. First, we expand the feature set by transforming the original price time series into a set of technical indicators, and then reduce the number of features by applying dimensionality reduction methods. Second, machine learning algorithms are employed to build predictive models. Finally, the prediction results from the different models are compared by evaluating their MAE and RMSE.
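The two evaluation metrics named in the abstract, MAE and RMSE, can be illustrated with a minimal sketch. This is not the thesis's Spark implementation. The function names and the sample prices are hypothetical and chosen only to show how the two metrics are computed and why RMSE penalizes large errors more heavily than MAE.

```python
import math


def mean_absolute_error(actual, predicted):
    """MAE: the average of |actual - predicted| over all trades."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """RMSE: the square root of the average squared error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical trade prices for illustration only (not from the Kaggle dataset).
actual = [100.0, 102.0, 98.0, 101.0]
predicted = [101.0, 100.0, 98.0, 104.0]

mae = mean_absolute_error(actual, predicted)       # absolute errors: 1, 2, 0, 3
rmse = root_mean_squared_error(actual, predicted)  # squared errors: 1, 4, 0, 9
print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

Because the errors are squared before averaging, RMSE is never smaller than MAE on the same predictions, and a model with a few large misses will look worse under RMSE than under MAE.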
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070353411
http://hdl.handle.net/11536/138622
Appears in Collections: Thesis