File Deduplication with Cloud Storage File System

doi:10.1109/CSE.2013.52

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ku, Chan-I	en_US
dc.contributor.author	Luo, Guo-Heng	en_US
dc.contributor.author	Chang, Che-Pin	en_US
dc.contributor.author	Yuan, Shyan-Ming	en_US
dc.date.accessioned	2015-07-21T08:31:00Z	-
dc.date.available	2015-07-21T08:31:00Z	-
dc.date.issued	2013-01-01	en_US
dc.identifier.issn	1949-0828	en_US
dc.identifier.uri	http://dx.doi.org/10.1109/CSE.2013.52	en_US
dc.identifier.uri	http://hdl.handle.net/11536/125062	-
dc.description.abstract	The Hadoop Distributed File System (HDFS) addresses the problem of storing very large data sets, but it provides no mechanism for handling duplicate files. In this study, a middle-layer file system built on the HBase virtual architecture performs file deduplication in HDFS. Two architectures are proposed for different application reliability requirements: RFD-HDFS (Reliable File Deduplicated HDFS), which permits no errors, and FD-HDFS (File Deduplicated HDFS), which tolerates a very small number of errors. Beyond the savings in storage space, the marginal benefits of deduplication are also explored. Suppose a popular video is uploaded to HDFS by one million users: with Hadoop replication, it is stored as three million files, which wastes a great deal of disk space, and the duplicates can only be removed on the cloud side. With deduplication, only three file copies are stored, so duplicate removal reaches 100% effectiveness. The experimental platform is a cloud-based documentation system, similar to a cloud version of EndNote, which simulates the cluster effect of a massive database when researchers synchronize their data with cloud storage.	en_US
dc.language.iso	en_US	en_US
dc.subject	HDFS	en_US
dc.subject	Data Deduplication	en_US
dc.subject	Cloud Computing	en_US
dc.subject	Single instance storage	en_US
dc.title	File Deduplication with Cloud Storage File System	en_US
dc.type	Proceedings Paper	en_US
dc.identifier.doi	10.1109/CSE.2013.52	en_US
dc.identifier.journal	2013 IEEE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2013)	en_US
dc.citation.spage	280	en_US
dc.citation.epage	287	en_US
dc.contributor.department	資訊工程學系	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.identifier.wosnumber	WOS:000351950300042	en_US
dc.citation.woscount	0	en_US
Appears in Collections:	Conferences Paper
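
The storage arithmetic in the abstract (one million uploads of the same video, three replicas each) can be illustrated with a minimal Python sketch. This is not the authors' RFD-HDFS or FD-HDFS implementation. The names `stored_copies` and `DedupIndex` are hypothetical, the SHA-256 content fingerprint is an assumption (the record does not say how duplicates are detected), and the replication factor of 3 is taken from the abstract's example.

```python
import hashlib

# Assumption: three replicas per stored file, as in the abstract's example.
REPLICATION_FACTOR = 3


def stored_copies(uploads: int, distinct_files: int, deduplicate: bool) -> int:
    """Return the number of physical file copies kept on disk."""
    logical_files = distinct_files if deduplicate else uploads
    return logical_files * REPLICATION_FACTOR


class DedupIndex:
    """Hypothetical middle-layer index mapping a content fingerprint to one stored path."""

    def __init__(self) -> None:
        self._by_digest: dict[str, str] = {}

    def put(self, path: str, content: bytes) -> str:
        """Record an upload and return the path that actually holds the bytes."""
        digest = hashlib.sha256(content).hexdigest()
        # setdefault keeps the first path seen for this content and
        # returns it for every later upload of identical bytes.
        return self._by_digest.setdefault(digest, path)

    def __len__(self) -> int:
        # Number of distinct contents currently indexed.
        return len(self._by_digest)


if __name__ == "__main__":
    print(stored_copies(1_000_000, 1, deduplicate=False))  # 3000000
    print(stored_copies(1_000_000, 1, deduplicate=True))   # 3

    index = DedupIndex()
    print(index.put("/user/a/video.mp4", b"same bytes"))   # /user/a/video.mp4
    print(index.put("/user/b/video.mp4", b"same bytes"))   # /user/a/video.mp4
    print(len(index))                                      # 1
```

A fingerprint-only lookup like this one accepts a small risk of hash collisions, whereas a design that permits no errors would also need to compare file contents before treating two uploads as the same file.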