Title: | A Study of a High-Performance Cloud Storage Management Strategy / A Management Strategy of Replica Compression for Hadoop |
Author: | 郭文俊 (Guo, Wen-Jun); 蔡文能 (Tsai, Wen-Nung); Institute of Computer Science and Engineering |
Keywords: | Cloud; Storage; HDFS |
Issue Date: | 2010 |
Abstract: | We propose a replica management strategy for cloud storage that maintains data reliability while using only a small amount of storage space. To store massive amounts of data, cloud storage systems usually adopt a distributed file system as their backend for scalability. After Google proposed GFS (Google File System), the Apache Software Foundation developed the Hadoop Distributed File System (HDFS), an open-source project that serves as the primary storage system for computing with the Map/Reduce framework. To improve system availability, the file system architecture adopts data replication and rack awareness so that at least one copy of the data is preserved when a node crashes. However, two important issues must be considered: how to place the replicas, and how to reduce the large space consumption caused by the replication used to maintain availability. In this thesis, we present a replica management strategy that ensures data availability and reduces space consumption. Our method determines a proper location for each replica according to the network topology, which is detected automatically. To address space consumption, we compress the additional copies of data in the background, which frees more available space in the system. We implemented the strategy in HDFS. The results show that the proposed strategy effectively reduces storage space without affecting system performance. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079855565 http://hdl.handle.net/11536/48299 |
Appears in Collections: | Thesis |