Full metadata record
DC Field | Value | Language
dc.contributor.author | 古展易 | en_US
dc.contributor.author | Ku, Chan-I | en_US
dc.contributor.author | 袁賢銘 | en_US
dc.contributor.author | Yuan, Shyan-Ming | en_US
dc.date.accessioned | 2015-11-26T01:04:48Z | -
dc.date.available | 2015-11-26T01:04:48Z | -
dc.date.issued | 2013 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT079979539 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/50970 | -
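dc.description.abstract | The Hadoop Distributed File System (HDFS) is used to solve the problem of storing large volumes of data, but it provides no mechanism for handling duplicate files. This study uses HBase to build a virtual middle-layer file system that achieves file deduplication in HDFS. Two architectures are proposed for applications with different reliability requirements: RFD-HDFS (Reliable File Deduplicated HDFS), which permits no errors at all, and FD-HDFS (File Deduplicated HDFS), which tolerates a very small number of errors. Besides the advantage in space complexity, the study also examines and compares the marginal benefits these solutions bring. Suppose a popular video with identical content is uploaded to HDFS by one million users; after Hadoop replication it is stored as three million files, which wastes a great deal of disk space. Only deduplication in the cloud can store it efficiently: the content then occupies the space of just three files, so duplicate files are eliminated completely. The experimental platform is a cloud-based reference-management system, similar to a cloud version of EndNote, that simulates the clustering effect on a massive database when graduate students synchronize their data with the cloud. | zh_TW (English translation)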
dc.description.abstract | The Hadoop Distributed File System (HDFS) is used to store very large volumes of data, but it provides no mechanism for handling duplicate files. In this study, a virtual middle-layer file system built on HBase provides file deduplication in HDFS. Two architectures are proposed for applications with different reliability requirements: RFD-HDFS (Reliable File Deduplicated HDFS), which permits no errors, and FD-HDFS (File Deduplicated HDFS), which tolerates a very small number of errors. Beyond the advantage in space complexity, the study also explores the marginal benefits that deduplication brings. Suppose a popular video is uploaded to HDFS by one million users: with Hadoop replication (three copies per file), it is stored as three million files, which wastes a great deal of disk space. When duplicates are removed in the cloud, the content occupies the space of only three files, that is, 100% of the duplicate files are eliminated. The experimental platform is a cloud-based reference-management system, similar to a cloud version of EndNote, which simulates the clustering effect on a massive database when researchers synchronize their data with cloud storage. | en_US
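dc.language.iso | en_US | en_US
dc.subject | 去重複 (Deduplication) | zh_TW
dc.subject | Hadoop | zh_TW
dc.subject | 雲端運算 (Cloud computing) | zh_TW
dc.subject | HDFS | zh_TW
dc.subject | 雲端儲存 (Cloud storage) | zh_TW
dc.subject | Single instance storage | zh_TW
dc.subject | Deduplication | en_US
dc.subject | Hadoop | en_US
dc.subject | Cloud Computing | en_US
dc.subject | HDFS | en_US
dc.subject | Cloud Storage | en_US
dc.subject | Single instance storage | en_US
dc.title | 雲端儲存的檔案去重複 (File Deduplication in Cloud Storage) | zh_TW
dc.title | File Deduplication with Cloud Storage File System | en_US
dc.type | Thesis | en_US
dc.contributor.department | 資訊學院資訊學程 (College of Computer Science, Computer Science Program) | zh_TW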
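The abstract does not say how RFD-HDFS and FD-HDFS decide that two uploads are duplicates. A common way to build whole-file deduplication, and one plausible reading of the "no errors" versus "very few errors" split, is a content-hash index: a reliable variant confirms a hash match byte by byte, while a faster variant trusts the hash alone and accepts a tiny collision risk. The sketch below assumes SHA-256 digests and uses an in-memory dict as a stand-in for the HBase table; `DedupIndex`, `verify_bytes`, and `physical_files` are hypothetical names, and none of this is taken from the thesis itself.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupIndex:
    """Hypothetical whole-file deduplication index (a sketch, not the thesis code).

    Each SHA-256 digest maps to a small bucket of distinct stored contents, so a
    hash collision can be kept apart when byte-by-byte verification is enabled.
    """

    verify_bytes: bool = True  # assumed RFD-style exact check; False = assumed FD-style hash-only match
    buckets: dict = field(default_factory=dict)    # digest -> list of stored contents (None = freed slot)
    refcount: dict = field(default_factory=dict)   # (digest, slot) -> number of paths sharing that copy
    namespace: dict = field(default_factory=dict)  # user-visible path -> (digest, slot)

    def put(self, path: str, data: bytes) -> bool:
        """Record `data` under `path`; return True only if a new physical copy was stored."""
        if path in self.namespace:
            self.delete(path)  # overwriting a path releases its old reference first
        digest = hashlib.sha256(data).hexdigest()
        bucket = self.buckets.setdefault(digest, [])
        for slot, stored in enumerate(bucket):
            # Hash-only mode reuses the first live copy with the same digest;
            # verifying mode reuses it only if the bytes are identical.
            if stored is not None and (not self.verify_bytes or stored == data):
                key = (digest, slot)
                self.refcount[key] += 1
                self.namespace[path] = key
                return False
        bucket.append(data)
        key = (digest, len(bucket) - 1)
        self.refcount[key] = 1
        self.namespace[path] = key
        return True

    def delete(self, path: str) -> None:
        """Remove `path`; free the stored copy once no path references it."""
        key = self.namespace.pop(path)
        digest, slot = key
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.refcount[key]
            self.buckets[digest][slot] = None
            if all(s is None for s in self.buckets[digest]):
                del self.buckets[digest]

    def physical_files(self, replication: int = 3) -> int:
        """Copies on disk if every unique stored content is replicated `replication` times."""
        return replication * len(self.refcount)


if __name__ == "__main__":
    users = 10_000  # scaled down from the abstract's one million for a quick run
    video = b"identical popular video content"
    index = DedupIndex(verify_bytes=True)
    for u in range(users):
        index.put(f"/user/{u}/video.mp4", video)
    print("without dedup:", users * 3)            # every upload replicated 3 times
    print("with dedup:", index.physical_files())  # one unique content x 3 replicas
```

Running the script prints 30000 copies without deduplication and 3 with it, which mirrors the abstract's arithmetic (one million uploads times a replication factor of 3 gives three million files, versus three after deduplication). The per-digest bucket is a design choice for this sketch: it lets the verifying mode store two different contents that happen to share a digest, which the hash-only mode would wrongly merge.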
Appears in Collections: Theses


Files in This Item:

  1. 953901.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.