Title: Comparative Analysis of the Performance of Item Recommendations based on Hadoop Platforms (基於Hadoop平台之物品推薦效能比較分析)
Authors: 張勝雄
Keywords: Distributed; Performance comparison
Issue Date: 2013
Abstract: As online information grows rapidly and the devices used to access it diversify, the volume of browsing records that consumers leave online has grown substantially. With so much data accumulating, finding a solution with suitable execution performance, and choosing the analysis tool to go with it, has become an important issue. Two well-known open-source analysis tools, R and Mahout, can both run on the Hadoop platform. This study uses experiments to examine how the two tools differ in the execution performance of recommendation algorithms. In the wordcount experiment, a program written in R and run through Hadoop-streaming performed about the same as the Hadoop native program. In the experiments involving matrix operations, however, the memory usage of Mahout and R differed markedly, with R requiring more memory than Mahout. The performance test results and analysis for Mahout and R can serve as a reference for users who intend to run computations on Hadoop with either tool.
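The wordcount experiment relies on the Hadoop-streaming contract: a mapper writes tab-separated key/value lines, the framework sorts them by key, and a reducer aggregates consecutive lines that share a key. The thesis's own R programs are not reproduced in this record, so the following is only a minimal sketch of that contract in Python. The function names `mapper`, `reducer`, and `run_local` are hypothetical, and `run_local` merely imitates the shuffle-and-sort step in memory rather than running on a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per token."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts of consecutive records that share a key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)


def run_local(lines):
    """Stand-in for Hadoop's shuffle: sort the mapper output, then reduce."""
    return dict(reducer(sorted(mapper(lines))))
```

In an actual streaming job the mapper and reducer would typically be separate scripts that read standard input and write standard output, passed to the hadoop-streaming jar through its `-mapper` and `-reducer` options. That is why a script in R, or any other language, can take part without being written against the native Java API.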
Appears in Collections: Thesis