Title: Weblog Analysis System Based on the Hadoop Architecture (基於Hadoop架構之網站日誌分析系統)
Design and Implementation of Weblog Analysis System with Hadoop
Authors: 王辰豪
Wang, Chen-Hao
袁賢銘
Yuan, Shyan-Ming
College of Computer Science, Computer Science Program
Keywords: Hadoop; Distributed Computing; Cloud Computing; Big Data
Issue Date: 2013
Abstract: In recent years, cloud computing has become a popular topic in both research and applications. Cloud computing uses distributed storage and distributed computing technologies to store large volumes of data and to analyze and process them quickly. With the rapid development of Internet technology, digital data has grown explosively. Faced with processing massive data, traditional text-processing software and relational databases have hit technical bottlenecks, and their results are not very satisfactory. For this problem, cloud computing is a more suitable choice. In this thesis, we design and implement an internal enterprise Weblog analysis system based on the Hadoop architecture, using HDFS (Hadoop Distributed File System), the Hadoop MapReduce distributed processing framework, and the Pig Latin language. By analyzing daily Weblog records, the implementation produces Application Server traffic trend charts, program performance statistics reports, and, on request, performance reports for different time intervals and program functions. The main purpose of the system is to help system administrators quickly extract and analyze the potential value hidden in massive data, thereby providing an important basis for business decisions.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079979520
http://hdl.handle.net/11536/73032
Appears in Collections: Thesis