Title: 作業系統支援之檢查點建立技術
Operating System Supports for Process Checkpointing
Authors: 徐尚德
Shang-te Hsu
張瑞川
Ruei-Chuan Chang
Institute of Computer Science and Engineering
Keywords: Fault-Tolerant System; Checkpoint; Continuous Checkpointing; Remote Memory Checkpointing
Publication date: 1998
Abstract: The amount of data saved at a checkpoint determines both the checkpoint latency and the checkpoint overhead. Previous studies show that incremental checkpointing, main-memory checkpointing, and copy-on-write checkpointing each reduce this overhead. In modern operating systems, however, some of a process's memory pages may be swapped out before a checkpoint is taken. A checkpointer that ignores paging incurs extra disk accesses, because every swapped-out page must first be brought back into memory before it can be saved. In the first part of this thesis, we propose continuous checkpointing, which combines the checkpoint facility with the virtual memory paging operations. Checkpointing is then no longer a periodic task but a continuous activity during the execution of the process. Experimental results show that about 80% of the disk accesses can be avoided when checkpointing a program that solves the All-Pairs Shortest Paths problem.
In the second part, we propose using remote memory as a virtual disk for checkpoint archives. Remote memory is the collection of memory resources located on other machines. In remote memory checkpointing, many autonomous machines submit jobs to a compute server and, while idle, offer their physical memory to the server for checkpointing. Storing checkpoints in remote memory avoids real disk accesses at checkpoint time and also improves the utilization of resources on the network. In experiments with eight checkpointed applications, remote memory reduces the overhead by 34.5% for sequential checkpointing and by 32.1% for incremental checkpointing. The checkpoint latency with remote memory is about 60% of that of a local disk checkpoint.
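The disk-access argument in the abstract can be illustrated with a toy count. This is only a sketch under stated assumptions, not the thesis's implementation: the function name `disk_accesses`, its parameters, and the cost model (one disk access per page-out, per page-in, and per checkpoint write) are all hypothetical. It compares a checkpointer that ignores paging with one that treats the page-out write as the checkpoint copy of that page.

```python
def disk_accesses(dirty_resident: int, dirty_swapped: int, paging_aware: bool) -> int:
    """Count disk accesses for one checkpoint interval in a toy model.

    dirty_resident: modified pages still in memory at checkpoint time.
    dirty_swapped:  modified pages the OS already paged out to disk.
    paging_aware:   True if the page-out write also serves as the
                    checkpoint copy (assumed model of a paging-integrated
                    checkpointer, not the thesis's actual design).
    """
    # Every swapped-out dirty page was written to disk once by the pager.
    page_out_writes = dirty_swapped

    if paging_aware:
        # Swapped-out pages are already saved; only resident dirty pages
        # still need to be written at checkpoint time.
        return page_out_writes + dirty_resident

    # A paging-oblivious checkpointer reads each swapped-out page back in,
    # then writes every dirty page to the checkpoint file.
    page_ins = dirty_swapped
    checkpoint_writes = dirty_resident + dirty_swapped
    return page_out_writes + page_ins + checkpoint_writes


if __name__ == "__main__":
    for aware in (False, True):
        print(aware, disk_accesses(dirty_resident=10, dirty_swapped=40, paging_aware=aware))
```

In this model the two approaches cost the same when nothing has been swapped out, and the gap grows with the number of swapped-out dirty pages. The figures it produces are illustrative and are unrelated to the measurements reported in the abstract.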
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870394078
http://hdl.handle.net/11536/64221
Appears in collections: Theses