標題: 高速度和低複雜度公匙密碼系統之律動架構設計
Designs of Systolic Architectures for High-speed and Low-complexity Public-key Cryptosystems
作者: 蔡維昌
Wei-Chang Tsai
王聖智
項春申
Sheng-Jyh Wang
C. Bernard Shung
電子研究所
關鍵字: 密碼系統;公匙;律動架構;Montgomery's 演算法;有限場;cryptosystem;public key;systolic architecture;Montgomery's algorithm;finite field
公開日期: 2000
摘要: 本論文探究公匙密碼系統的優點和目前需改進的問題,雖然採用公匙密碼系統可以避免一些傳統密碼系統的特有問題,但是,公匙密碼系統的運算需要較長的位元來保持整個系統的安全性,進而使得公匙密碼系統的加解密速度較慢。因為律動陣列(systolic array)的管線性(pipelining)、同質性(homogeneity)、局限性(localization)等特性,而使得律動陣列十分適合於實現此類系統的運算。管線性加快硬體的速度,同質性加速硬體實現及偵錯的速度,而局限性讓重要的信號,傳遞到附近的區域,減少繞線的所導致的時間延遲與加速運算速度。 本論文中發展了兩個律動陣列的架構實現方法: 分割法(Splitting method)與合併法(merging method)。分割法是將律動陣列分割加上預先計算的方法,再平行化所分割的部份運算,來使工作頻率提升而加速原來的律動陣列。合併法改良了原有於律動陣列所實現的公匙密碼系統之主要運算中,隱含的運算間斷問題(one-clock-cycle-gap problem),利用合併法,將已切割的架構以一種特別的方式合併,加上最佳化而改善了使用度,並因合併之故,減少了很多硬體,降低了複雜度。這兩個架構被驗證並應用於RSA 公匙密碼系統(RSA Public-key Cryptosystem)和橢圓曲線公匙密碼系統(Elliptic Curve Cryptosystem),當實現RSA 公匙密碼系統時,利用 Montgomery 的演算法,可以實現出較快速的模乘法,應用到我們改良的律動陣列上,可得到較好的設計。當實現橢圓曲線公匙密碼系統時,主要的運算有限場的乘法,我們兩個新的律動架構也被應用到特性為2的有限場中(GF( ))的乘法器,也得到相當好的結果。此兩個律動架構於這兩個公匙密碼系統中,速度上,第一個架構表現出色,而第二個架構,有較低的面積和較低的時間面積複雜度。另外,綜合這兩個應用,和根據這兩個的律動架構的概念,發展出一般化的方法,可以運用到一個具反覆運算的演算法,將之實現成快速、低面積和低複雜度的律動架構,此兩個一般化方法,最後以一個高基數的Montgomery演算法來做為這兩個方法的例子,以證明此兩個方法的可行性。此外,本論文也另外提出一個架構於 Verilog 上的電路閘層次的功率估算軟體,利用行徑功率(path power) 和突波功率的估算,快速且較準確的估算出整個電路所消耗的功率,由於在公匙密碼系統中的位元數都很長,此方法適合於公匙密碼系統設計的功率估算。
This dissertation explores the benefits and problems in public-key cryptosystems. Although the adoption of public-key cryptosystem can avoid some typical problems in conventional cryptosystems, the major problem of current public key cryptosystems is their significant computation complexity, which often limits the throughput rate of cryptographic computation. To implement the main operations of cryptosystems, a systolic array could be very suitable due to its inherent properties of pipelining, homogeneity, and localization. The pipelined architecture can be utilized to improve the throughput rate by shortening the clock period and the total processing time; The homogeneity property of systolic array may help the shortening of the design and testing period in VLSI implementation; and the property of spatial and temporal localization may be used to localize data transactions and control flow for smaller routing area and higher computation speed. In this thesis, We proposed two approaches to design the systolic architectures: splitting method and merging method. In the partitioning method, the number of pipeline stages is increased and the clock cycle period is shortened. By adopting pre-computation, together with the parallelism of these pipelines, the computation speed of the partitioned structure can be improved. The merging method, on the other hand, may avoid the one-clock-cycle-gap problem and decrease the area size, power consumption, and complexity. This merged architecture can be further optimized for computation speed. These two proposed methods are adopted in the implementation of a Montgomery-based modular multiplier for RSA public-key cryptosystems and a finite-field multiplier for elliptic curve cryptosystems. The comparisons show that our partitioning method offers higher throughput rate while our merging method offers lower area size, lower power consumption, and lower complexity. Furthermore, for a farther stage of development, we generalize our two systolic architectures to handle more general operations. Among these designs which could be suitable for our generalized architectures, we mention a high-radix RSA public-key cryptosystem as an example. Moreover, due to the fact that the key length of a public key cryptosystem is usually long for security reason, it becomes fairly difficult to quickly estimate the actual power consumption of these long-bit systolic arrays. Hence, in this thesis, we also propose a new power estimator to handle the power estimation of long-bit systolic architectures.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT890428001
http://hdl.handle.net/11536/67070
顯示於類別:畢業論文