標題: 在一般性網路中一致性問題之研究
A Study on the Agreement Problems in a General Network
作者: 蕭顯勝
Siu, Hin Sing
金陽和, 楊維邦
Yeo-Hao Chin, Wei-Pang Yang
資訊科學與工程研究所
關鍵字: 協議問題;拜占庭協議;一致問題;一般性網路;混合錯誤模式;Agreement problem;Byzantine agreement;consensus problem;general network;hybrid fault model
公開日期: 1997
摘要: 在計設容錯分散式系統時,如何讓功能良好之處理機獲得一個共同協 議是一個十分重要之問題。協議問題主要之精神為是讓一群處理機在容錯 之作業環境下分亨共同資訊;也就是說,功能良好之處理機所獲得之協議 是不受系統中錯誤單元(處理機和通訊線)之行為影嚮的。協議獲得的主要 目的是為了維護系統的效能與整合性。協議問題之重要性可以從它在分散 式系統之地位觀察到□悃M協議問題之通訊協定是其他分散式問題的核心 ;例如,同步問題、可靠通訊問題、資源分配問題、系統重組問題、重複 檔案系統問題、和探測器資料讀取問題等。 在實際的分散式系統中, 處理機和通訊線路是會有多種錯誤模式同時發生之情形(一般稱為混合錯 誤模式)、網路之拓樸可能是非完全連結之架構、而且功能正常之處理機 在一般情況下是不知道那一個系統單元是有錯誤情況發生的。我們稱這種 網路模式為一般性網路。然而,現有的通訊協定沒有一個能在一般性網路 解決協議問題。 本論文之研究目的是在一般性網路提出一些通訊協定 來解決協議問題。首先,我們設計了一個名叫GPBA之通訊協定在一般性網 路中解決拜占庭協議問題(Byzantine agreement problem)。而且能利用 這個通訊協定來解決相互作用的一貫性問題(interactive consistency problem)和一致性問題(consensus problem)。其次,一個通訊協定GPSC 被提出用來在一般性網路中解決強一致性問題(strong consensus problem)。這個問題是傳統一致性問題之延伸,在這問題中要求處理機所 獲得之最後協議必定是某一功能正常處理機之初值。 最後,錯誤診斷 協議問題(fault diagnosis agreement problem)在一般性網路中被考慮 ,這個問題之目的是要使每一個功能正常之處理機能檢查及找出一個錯誤 處理機之共同集合。一個以證據為基底之錯誤診斷通訊協定被提出來解決 這個問題。被提出之通訊協定首先收集一個拜占庭協議問題中之交換訊息 作為證據,然後從收集之證據來檢查及找出那些處理機發生了錯誤。接著 把被檢查及找出的錯誤處理機和它們連接之通訊線從網路中移除,使得網 路能被重組。 我們亦証明了被提出之通訊協定是最佳的。在一般性網 路中,它們使用了最少之訊息交換量及能容許最大量之錯誤單元數目來解 決上述之問題。 Reaching agreement among the fault-free processors in the presence of faultsis one of the most important problems in designing a fault-tolerant distributed system. The simple idea of agreement problem is to share information amonga group of processors, preferably in a fault-tolerant manner. That means eachfault-free processor should be able to consistently agree on and produce correct results despite the actions of the faulty components (communication links,processors, or both). The goal of agreement is to maintain the performance and integrity of the system. The important of the problem stems from its roles in a distributed system. This problem is at the core of protocols handling the other problems in a distributed system. The examplesare synchronization, reliable communication, resource allocation, system reconfiguration, replicated file systems, sensor reading. In a real-life distributed system, either processors, communication links,or both can be subjected to different types of failures simultaneously (also called hybrid fault model, or mixed failure types), the network topology may not be fully connected, and a fault-free processor does not know which component in the network is faulty. We call such a network model a general network. However, none of the existing protocols is designed for solving theagreement problems in a general network. The goal of this research is to propose some protocols for solving the agreement problems in a general network. First, we designed a protocol, called GPBA (the generalized protocol for the BA problem) to solve the Byzantine agreement (BA) problem in a general network. GPBA can be used to solve the other two agreement problems: the interactive consistency and consensus problems. Second, a protocol, called GPSC (the generalized protocol for the strong consensus problem), is proposedto solve the Strong Consensus (SC) problem in ageneral network. The SC problem is a variant of the BA problem. It requiresthat the agreed value among fault-free processors be one of the fault-freeprocessor's initial values. Finally, the Fault Diagnosis Agreement (FDA) problem is considered in a general network. The goal of the FDA problem is to make each fault-free processor detect/locate a common set of faulty processors. An evidence-based fault diagnosis protocol is proposed to solve the FDA problem. The proposed protocol first collects the messages which have accumulated in the BA protocolas the evidence. By examining the collected evidence, a fault-free processor can detect/locate which processor is faulty. Then, the network can be reconfigured by removing the detected faulty processors and the links connected to these processors from the network. Using minimum number of message exchanges, the proposed protocols for the BA, SC, and FDA problems can tolerate the maximum number of faulty components (processors and links) in a general network.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT860394002
http://hdl.handle.net/11536/62827
顯示於類別:畢業論文