Title: An implementation of using remote memory to checkpoint processes
Authors: Hsu, ST
Chang, RC
Department of Computer Science
Keywords: fault tolerance;remote memory;checkpoint
Issue Date: 1-Sep-1999
Abstract: Process checkpointing is a procedure that periodically saves the state of a process to stable storage. Most checkpointing facilities use hard disks for archiving. However, disk seek time is limited by the speed of the read-write heads, so checkpointing a process to a local disk requires extensive disk bandwidth. In this paper, we propose an approach that exploits the memory of idle workstations as faster storage for checkpointing. In our scheme, autonomous machines that submit jobs to the computation server offer their physical memory to the server for job checkpointing. Eight applications are used to measure remote memory performance under four checkpointing policies. Experimental results show that remote memory reduces the overhead by at least 34.5 per cent for sequential checkpointing and 32.1 per cent for incremental checkpointing. Additionally, checkpointing a running process into remote memory requires only 60 per cent of the latency of a local disk checkpoint. Copyright (C) 1999 John Wiley & Sons, Ltd.
URI: http://hdl.handle.net/11536/31107
http://dx.doi.org/10.1002/(SICI)1097-024X(199909)29:11<985
ISSN: 0038-0644
DOI: 10.1002/(SICI)1097-024X(199909)29:11<985
Journal: SOFTWARE-PRACTICE & EXPERIENCE
Volume: 29
Issue: 11
Begin Page: 985
End Page: 1004
Appears in Collections: Articles
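
The abstract mentions incremental checkpointing into remote memory without showing the mechanism. As a rough illustration only, the sketch below uses the usual sense of the term: after a first full save, only pages modified since the previous checkpoint are sent. Everything here is an assumption made for illustration and is not taken from the paper: the names `RemoteMemoryStore`, `Process` and `incremental_checkpoint`, the 4096-byte page size, and the use of an in-process dictionary in place of memory on an idle workstation reached over a network.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


class RemoteMemoryStore:
    """Hypothetical stand-in for memory offered by an idle workstation.

    A real system would send pages over the network; here they are
    simply kept in a dictionary keyed by page number.
    """

    def __init__(self):
        self.pages = {}
        self.bytes_received = 0

    def put(self, page_no, data):
        self.pages[page_no] = bytes(data)  # store an immutable copy
        self.bytes_received += len(data)


class Process:
    """Toy process image: a list of pages plus a set of dirty page numbers."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        # Nothing has been saved yet, so every page counts as dirty.
        self.dirty = set(range(num_pages))

    def write(self, page_no, offset, value):
        self.pages[page_no][offset] = value
        self.dirty.add(page_no)  # remember that this page changed


def incremental_checkpoint(proc, store):
    """Send only the pages modified since the last checkpoint.

    Returns the number of pages sent.
    """
    sent = 0
    for page_no in sorted(proc.dirty):
        store.put(page_no, proc.pages[page_no])
        sent += 1
    proc.dirty.clear()
    return sent


if __name__ == "__main__":
    proc = Process(num_pages=8)
    store = RemoteMemoryStore()
    first = incremental_checkpoint(proc, store)   # first save covers every page
    proc.write(page_no=3, offset=0, value=0xFF)
    second = incremental_checkpoint(proc, store)  # only the modified page
    print(first, second, store.bytes_received)
```

In this toy model the first checkpoint transfers the whole image and later ones transfer only what changed, which is the general reason incremental schemes send less data per checkpoint. How the paper tracks modified pages, and how its four policies differ, cannot be determined from this record.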


Files in This Item:

  1. 000082691100005.pdf
