Title: File Deduplication with Cloud Storage File System
Authors: Ku, Chan-I (古展易)
Yuan, Shyan-Ming (袁賢銘)
Program: Degree Program of Computer Science, College of Computer Science
Keywords: Deduplication; Hadoop; Cloud Computing; HDFS; Cloud Storage; Single instance storage
Issue Date: 2013
Abstract: The Hadoop Distributed File System (HDFS) is used to store very large volumes of data, but it provides no mechanism for handling duplicate files. This study uses HBase to build a virtual middle-layer file system that adds file deduplication to HDFS. Two architectures are proposed for applications with different reliability requirements: RFD-HDFS (Reliable File Deduplicated HDFS), which permits no errors at all, and FD-HDFS (File Deduplicated HDFS), which tolerates a very small number of errors. Beyond the gain in space complexity, the study also examines and compares the marginal benefits that deduplication brings. Suppose a popular video with identical content is uploaded to HDFS by one million users: with Hadoop replication (three copies of each file), it is stored as three million files, which wastes a great deal of disk space. Only deduplication in the cloud can store such content efficiently; after deduplication it occupies the space of just three files, i.e., 100% of the duplicate files are removed. The experimental platform is a cloud-based reference management system, similar to a cloud edition of EndNote, used to simulate the clustering effect on a massive database when graduate students synchronize their data with cloud storage.
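The record above does not describe how the middle layer is implemented, so the following is only a minimal Python sketch of the general idea under stated assumptions: logical user paths are mapped to one stored copy per distinct content, keyed by a SHA-256 digest, with an in-memory dict standing in for the HBase table. Reading RFD-HDFS as "confirm byte-for-byte equality before sharing a copy" and FD-HDFS as "trust the digest alone" is an assumption that fits the "no errors" versus "very few errors" distinction, not the thesis's confirmed design. The names `DedupIndex`, `put`, `get`, `physical_copies`, and the constant `REPLICATION = 3` (chosen to match the abstract's one million uploads becoming three million copies) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

REPLICATION = 3  # assumed HDFS replication factor (1M uploads -> 3M copies)


@dataclass
class DedupIndex:
    """Hypothetical in-memory stand-in for an HBase-backed middle layer.

    Maps each logical user path to a single stored copy of its content,
    keyed by the content's SHA-256 digest.
    """
    reliable: bool = True  # True: RFD-HDFS-like, False: FD-HDFS-like (assumed)
    blobs: dict = field(default_factory=dict)  # digest -> list of distinct contents
    paths: dict = field(default_factory=dict)  # logical path -> (digest, slot)

    def put(self, path: str, data: bytes) -> bool:
        """Record `path`; return True only if a new physical copy was stored."""
        digest = hashlib.sha256(data).hexdigest()
        bucket = self.blobs.setdefault(digest, [])
        if bucket and not self.reliable:
            # FD-HDFS-like: trust the digest alone. A hash collision would make
            # two different files share one copy (the rare, tolerated error).
            self.paths[path] = (digest, 0)
            return False
        for slot, existing in enumerate(bucket):
            # RFD-HDFS-like: share a copy only after a byte-for-byte match.
            if existing == data:
                self.paths[path] = (digest, slot)
                return False
        bucket.append(data)  # genuinely new content (or a detected collision)
        self.paths[path] = (digest, len(bucket) - 1)
        return True

    def get(self, path: str) -> bytes:
        """Return the content stored for a logical path."""
        digest, slot = self.paths[path]
        return self.blobs[digest][slot]

    def physical_copies(self) -> int:
        """Stored file copies after HDFS replication."""
        return REPLICATION * sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    video = b"identical popular video bytes"
    index = DedupIndex(reliable=True)
    users = 1_000  # scaled down from the abstract's one million
    new = sum(index.put(f"/user/{u}/video.mp4", video) for u in range(users))
    print(new, REPLICATION * users, index.physical_copies())  # 1 3000 3
```

With one million uploads instead of 1,000, the same run would report 3,000,000 copies without deduplication versus 3 with it. Keeping a list per digest rather than a single slot is what lets the reliable variant store colliding contents separately; the price is reading back stored bytes to compare, which the FD-HDFS-like path skips.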
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079979539
http://hdl.handle.net/11536/50970
Appears in Collections: Theses


Files in This Item:

  1. 953901.pdf
