Title: 高效能雲端儲存管理策略之研究 (A Study of High-Performance Cloud Storage Management Strategies)
A Management Strategy of Replica Compression for Hadoop
Authors: 郭文俊
Guo, Wen-Jun
蔡文能
Tsai, Wen-Nung
Institute of Computer Science and Engineering
Keywords: Cloud; Storage; HDFS
Issue Date: 2010
Abstract: We propose a replica management strategy for cloud storage that maintains data reliability while consuming little storage space. To store massive amounts of data, cloud storage systems usually adopt a distributed file system as their backend for scalability. After Google proposed GFS (Google File System), the Apache Software Foundation developed HDFS (Hadoop Distributed File System), an open-source project that serves as the primary storage system for computation with the Map/Reduce framework. To improve system availability, the file system architecture adopts data replication and rack awareness, so that at least one copy of the data is preserved when a node crashes. However, two important issues must be considered: how to place the replicas, and how to reduce the large space consumption caused by the replicas kept for data availability. In this thesis, we present a replica management strategy that ensures data availability and reduces space consumption. Our method determines a proper location for each replica according to the network topology, which is detected automatically. To address space consumption, we compress the additional copies of the data in the background, which frees up more available space in the system. We implemented the strategy on HDFS. The results show that the proposed strategy effectively reduces storage space without affecting system performance.
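The idea of keeping one readable copy and compressing the additional replicas can be illustrated with a minimal sketch. This is not the thesis implementation and does not use any HDFS API: the names `make_replicas`, `read_replica`, and `space_used` are hypothetical, zlib stands in for whatever codec the authors used, and compression runs inline here rather than in the background.

```python
import zlib
from typing import List, Tuple

# A replica is modelled as (is_compressed, payload). This layout is an
# assumption made for illustration, not the on-disk format used by HDFS.
Replica = Tuple[bool, bytes]


def make_replicas(block: bytes, replication: int = 3) -> List[Replica]:
    """Keep the first replica uncompressed and compress the additional ones."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    primary: Replica = (False, block)
    packed = zlib.compress(block)
    # Only store the compressed form when it actually saves space.
    extra: Replica = (True, packed) if len(packed) < len(block) else (False, block)
    return [primary] + [extra] * (replication - 1)


def read_replica(replica: Replica) -> bytes:
    """Return the original block contents from any replica."""
    is_compressed, payload = replica
    return zlib.decompress(payload) if is_compressed else payload


def space_used(replicas: List[Replica]) -> int:
    """Total number of bytes occupied by all replicas of one block."""
    return sum(len(payload) for _, payload in replicas)
```

Every replica still yields the original data through `read_replica`, so availability is unchanged in this model, while `space_used` falls below the cost of storing every copy uncompressed whenever the block is compressible.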
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079855565
http://hdl.handle.net/11536/48299
Appears in Collections: Thesis


Files in This Item:

  1. 556501.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.