File Deduplication with Cloud Storage File System

Full metadata record

DC Field	Value	Language
dc.contributor.author	古展易	en_US
dc.contributor.author	Ku, Chan-I	en_US
dc.contributor.author	袁賢銘	en_US
dc.contributor.author	Yuan, Shyan-Ming	en_US
dc.date.accessioned	2015-11-26T01:04:48Z	-
dc.date.available	2015-11-26T01:04:48Z	-
dc.date.issued	2013	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT079979539	en_US
dc.identifier.uri	http://hdl.handle.net/11536/50970	-
dc.description.abstract	Hadoop Distributed File System (HDFS)被運用在解決大量的資料儲存問題，但是並未提供對重複檔案的處理機制，此研究以HBASE架構虛擬中介層檔案系統(Middle layer file system)，在HDFS中達到File Deduplication的功能，依照應用需求的可靠度要求不同提出兩種架構，一者為不許可有任何錯誤的 RFD-HDFS(Reliable File Deduplicated HDFS)，另一者為可容忍極少錯誤的FD-HDFS (File Deduplicated HDFS)兩種解決方案，除了空間複雜度上的優勢，也探討比較其帶來之邊際效益。假設一個內容完全相同的熱門影片被一百萬個用戶上傳到HDFS，經過Hadoop replication成三百萬個檔案來儲存，這是非常浪費磁碟空間的做法，唯有雲端除去重複才能有效裝載，經此將只占用3個檔案空間，也就是達成百分百去除重複檔案的效用。實驗架構為一個雲端文獻系統，類似EndNote Cloud版，模擬研究生將資料與雲端同步時，與海量數據庫的群聚效應。	zh_TW
dc.description.abstract	The Hadoop Distributed File System (HDFS) is used to store very large volumes of data, but it provides no mechanism for handling duplicate files. This study builds a virtual middle-layer file system on HBase to provide file deduplication in HDFS. Two architectures are proposed for different application reliability requirements: RFD-HDFS (Reliable File Deduplicated HDFS), which permits no errors, and FD-HDFS (File Deduplicated HDFS), which tolerates a very small number of errors. Beyond the advantage in space complexity, the marginal benefits of the two approaches are explored and compared. Suppose a popular video with identical content is uploaded to HDFS by one million users: after Hadoop replication it is stored as three million files, which wastes a great deal of disk space, and only deduplication in the cloud can hold such data efficiently. With deduplication, the video occupies the space of only three files, that is, 100% of the duplicate files are removed. The experimental setup is a cloud-based literature management system, similar to a cloud version of EndNote, that simulates the clustering effect against a massive database when graduate students synchronize their data with the cloud.	en_US
dc.language.iso	en_US	en_US
dc.subject	去重複	zh_TW
dc.subject	Hadoop	zh_TW
dc.subject	雲端運算	zh_TW
dc.subject	HDFS	zh_TW
dc.subject	雲端儲存	zh_TW
dc.subject	Single instance storage	zh_TW
dc.subject	Deduplication	en_US
dc.subject	Hadoop	en_US
dc.subject	Cloud Computing	en_US
dc.subject	HDFS	en_US
dc.subject	Cloud Storage	en_US
dc.subject	Single instance storage	en_US
dc.title	雲端儲存的檔案去重複	zh_TW
dc.title	File Deduplication with Cloud Storage File System	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊學院資訊學程	zh_TW
Appears in Collections:	Thesis
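
The abstract's storage arithmetic (many identical uploads collapsing to one stored file, times the replication factor) can be illustrated with a minimal sketch. The abstract does not say how duplicates are detected, so everything here is an assumption made for illustration: the SHA-256 fingerprint, the optional byte-for-byte check (one possible reading of the "reliable" versus "tolerates very few errors" distinction), and the hypothetical names `DedupStore` and `physical_copies`. In-memory dictionaries stand in for the HBase index and the HDFS files; this is not the thesis implementation.

```python
import hashlib


class DedupStore:
    """Toy in-memory stand-in for a deduplicating layer over a file store."""

    def __init__(self, verify_bytes: bool):
        # Assumption: verify_bytes=True confirms a fingerprint match by
        # comparing full contents; False trusts the fingerprint alone.
        self.verify_bytes = verify_bytes
        self.blobs = {}   # blob id -> content, each distinct content stored once
        self.index = {}   # fingerprint -> list of blob ids
        self.paths = {}   # logical path -> blob id

    def put(self, path: str, data: bytes) -> bool:
        """Map path to data; return True only if a new blob was written."""
        fingerprint = hashlib.sha256(data).hexdigest()
        for blob_id in self.index.get(fingerprint, []):
            if not self.verify_bytes or self.blobs[blob_id] == data:
                self.paths[path] = blob_id  # duplicate: add a reference only
                return False
        blob_id = len(self.blobs)
        self.blobs[blob_id] = data
        self.index.setdefault(fingerprint, []).append(blob_id)
        self.paths[path] = blob_id
        return True

    def get(self, path: str) -> bytes:
        """Return the content that a logical path refers to."""
        return self.blobs[self.paths[path]]


def physical_copies(store: DedupStore, replication: int = 3) -> int:
    """Stored copies once each unique blob is replicated `replication` times."""
    return len(store.blobs) * replication


if __name__ == "__main__":
    store = DedupStore(verify_bytes=True)
    video = b"identical popular video content"
    for user in range(1000):  # scaled down from the abstract's one million
        store.put(f"/user{user}/video.mp4", video)
    print(len(store.paths), "logical files,", physical_copies(store), "stored copies")
```

With a replication factor of 3, any number of identical uploads ends up as three stored copies, which matches the abstract's example of three million files reduced to three.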

Files in This Item:

953901.pdf
