Title: An Efficient Forward Recovery Checkpointing Scheme for Fault-Tolerant Duplex Systems
Authors: 王建春
Wang Chien-ch'un
王國禎
Kuochen Wang
Institute of Computer Science and Engineering
Keywords: Forward recovery; checkpointing scheme; duplex system
Publication date: 1994
Abstract: In this thesis, we present a cost-effective forward recovery checkpointing scheme (FRCS) that recovers from transient faults in a duplex system without a spare module. Two other fault recovery schemes exist for duplex systems: (1) retrying with a spare module during forward recovery and (2) rolling back and re-executing on both processing modules. Scheme (1) requires a spare module to perform the retry, and scheme (2) has a long average completion time. FRCS is proposed to overcome both drawbacks. Its basic concept is as follows. When a fault is detected, the two non-identical checkpoints are saved. One module is then selected to retry the current checkpoint interval, while the other module continues execution toward the next checkpoint. At the end of the retry, the retry checkpoint is compared with the two previously saved checkpoints: the two checkpoints that are identical give the correct state, and the one that differs is assumed to be faulty. Meanwhile, the other module has completed a forward checkpoint. If a fault is also detected in the next interval, this forward checkpoint is used immediately for comparison to locate that fault, so two faults are recovered at the cost of a single retry interval. The scheme handles both transient and permanent faults.
Mathematical models have been developed to demonstrate that the scheme is efficient at recovering from a single transient fault. In all cases, it is at least as good as the RB scheme (scheme 2). Compared with the RFCS scheme (scheme 1), it requires no spare module and, under high failure rates, achieves a shorter average completion time and total execution time.
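The comparison step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `locate_fault`, the module labels "A" and "B", and the use of byte strings to stand in for checkpoint states are all assumptions made for illustration. The fallback when no two checkpoints agree is likewise an assumption, since the abstract does not describe that case.

```python
from typing import Optional, Tuple


def locate_fault(
    saved_a: bytes, saved_b: bytes, retry: bytes
) -> Tuple[Optional[bytes], Optional[str]]:
    """Compare the retry checkpoint with the two saved, non-identical checkpoints.

    Returns (correct_state, faulty_module). The two checkpoints that are
    identical are taken as correct; the one that differs is assumed faulty.
    Labels "A" and "B" are hypothetical names for the two modules.
    """
    if retry == saved_a and retry != saved_b:
        # Retry agrees with module A, so module B's checkpoint is assumed faulty.
        return saved_a, "B"
    if retry == saved_b and retry != saved_a:
        # Retry agrees with module B, so module A's checkpoint is assumed faulty.
        return saved_b, "A"
    # No unique agreeing pair (for example, the retry itself was hit by a fault).
    # How the scheme proceeds here is not stated in the abstract (assumption).
    return None, None


if __name__ == "__main__":
    state, faulty = locate_fault(b"state-1", b"state-corrupt", b"state-1")
    print(state, faulty)
```

In this sketch a mismatch between the two modules triggers one retry, and a single three-way comparison identifies both the correct state and the module whose checkpoint is assumed faulty.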
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT830394036
http://hdl.handle.net/11536/59058
Appears in Collections: Thesis