Title: Analysis of Hadoop Distributed Environment for Data Storage and Data Computing
Author: 王耀駿
張文鐘
Institute of Communications Engineering
Keywords: Distributed file system; Distributed processing; Cloud; HDFS; MapReduce; Hadoop
Issue Date: 2011
Abstract: This thesis examines the implementation and design strategies of HDFS (Hadoop Distributed File System) and the Hadoop MapReduce software framework for distributed processing. A distributed file system lets multiple machines share files and storage space over a network; a distributed processing framework pools distributed computing resources to solve large computational problems in parallel. HDFS and MapReduce run on the same server cluster: HDFS stores data across the cluster, with the servers cooperating to provide the file system service, while MapReduce performs distributed, parallel computation on the data stored in HDFS. Both follow a master/slave architecture in which a cluster consists of a single master server and many slave servers. The slaves provide storage and computing resources; the master centrally manages those resources and responds to user requests. Machines in different roles (master, slave, and client) exchange messages through an RPC mechanism built on TCP/IP, while file data is transferred between them by a streaming mechanism. The operation of HDFS and MapReduce is studied by reading the relevant open-source code; in addition, a server cluster is set up and self-written applications are run on it to further understand and apply this distributed storage and processing architecture.
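As an illustration of the map, shuffle, and reduce steps that the abstract attributes to MapReduce, the following is a minimal single-process sketch in Python. It is not taken from the thesis and does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration. On a real cluster each input split would be processed by a different slave, and the input would be read from HDFS.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(splits):
    """Run map over every split, then shuffle, then reduce (all in one process)."""
    intermediate = []
    for split in splits:  # hypothetical stand-in for splits handled by separate slaves
        intermediate.extend(map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["HDFS stores data", "MapReduce processes data stored in HDFS"]
    print(run_job(splits))
```

The sketch keeps the three phases as separate functions so that the data flow between them stays visible; scheduling, fault tolerance, and data locality, which the master handles in the architecture described above, are deliberately left out.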
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079813524
http://hdl.handle.net/11536/47010
Appears in Collections: Thesis