Title: Design and Analysis of a Fault-Tolerant Architecture through Loosely Coupled Multicomputers (分散式容錯架構的設計與分析)
Authors: Yuan-Rung Yang (楊元榮)
Yaw-Chung Chen (陳耀宗)
Institute of Computer Science and Engineering
Keywords: Fault-Tolerant Technology; Reliability; Availability; Distributed Architecture; Markov Model; Intelligent Network
Issue Date: 1993
Abstract: As people increasingly rely on computers, computer reliability becomes more and more important. Using high-quality components alone, without fault-tolerant techniques, cannot effectively increase system reliability; fault tolerance has to be designed into the system. Both hardware and software fault-tolerant techniques most commonly use redundant components to meet the basic goal of fault tolerance, that is, to keep performing the intended function in the presence of faults or unexpected conditions. Hardware fault-tolerant techniques are now fairly mature, but there is no standard, and fault-tolerant systems built for specific applications are expensive. For applications that demand high availability, such as the Intelligent Network, how to reduce system cost while still meeting practical needs is the main motivation of this research. In this thesis, we propose a fault-tolerant architecture built from loosely coupled multicomputers for high-availability systems, which achieves software and hardware fault tolerance simultaneously. The architecture can be built from an existing network and general-purpose computers, so it is both low-cost and practical to implement. The configuration consists of two computers linked by an FDDI network: one provides the service and the other stands by, and the software running on them also has backup copies. If the active computer or the application fails, the corresponding standby takes over the task and continues to provide the service without interruption. We use a Markov model to evaluate the reliability and availability of the system, and we discuss how the improvement relates to the number of software alternates in the application. The system can be applied to the Intelligent Network, and we use the 080 freephone service as a scenario to demonstrate it.
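To illustrate the kind of Markov availability analysis the abstract refers to, the following is a minimal sketch and not the model used in the thesis. It assumes a three-state birth-death chain (both computers up, one up, both down), a constant failure rate for the active computer only, a single repair facility with a constant repair rate, and perfect failover to the standby. The function names and the example rates are hypothetical.

```python
def simplex_availability(failure_rate, repair_rate):
    """Steady-state availability of a single repairable computer."""
    return repair_rate / (failure_rate + repair_rate)


def duplex_availability(failure_rate, repair_rate):
    """Steady-state availability of an active/standby pair.

    Assumed chain (illustrative only):
      state 0: both computers up   --failure--> state 1
      state 1: one computer up     --failure--> state 2, --repair--> state 0
      state 2: both computers down --repair-->  state 1
    Only the active computer fails, and only one repair runs at a time.
    """
    rho = failure_rate / repair_rate
    # Birth-death balance equations give p1 = rho * p0 and p2 = rho**2 * p0,
    # and the three probabilities must sum to 1.
    p_both_up = 1.0 / (1.0 + rho + rho ** 2)
    p_one_up = rho * p_both_up
    # The service is available whenever at least one computer is up.
    return p_both_up + p_one_up


if __name__ == "__main__":
    # Hypothetical rates per hour: one failure per 1000 h, repair in 10 h.
    lam, mu = 0.001, 0.1
    print("single computer:", simplex_availability(lam, mu))
    print("active/standby: ", duplex_availability(lam, mu))
```

Under these assumptions the standby pair is unavailable only when both computers are down, so its unavailability scales roughly with the square of the failure-to-repair ratio rather than with the ratio itself.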
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT820392012
http://hdl.handle.net/11536/57815
Appears in Collections: Thesis