Title: | Research on Emerging Electronic Design Automation Technologies in the Post-Submicron Era, Subproject 4: Coping with Reliability Challenges to Circuit Designs beyond Deep Sub-Micron Era by Computational Intelligence Reasoning (I) |
Author: | Wen Hung-Pin (溫宏斌), Department of Communication Engineering, National Chiao Tung University (國立交通大學電信工程學系(所)) |
Keywords: | soft error; transient fault; intermittent fault; software-based testing; computational intelligence; knowledge discovery and data mining |
Issue Date: | 2008 |
Abstract: | In the era beyond deep sub-micron, building robust systems from unreliable components poses new challenges at every stage of the design cycle. Subtle physical phenomena such as process variation and environmental radiation intertwine to produce a more complex and harder-to-overcome impact on semiconductor devices. Operational errors no longer manifest only as obvious permanent faults; intermittent and transient faults occur increasingly often under certain marginal, periodic, or temporary conditions. Faults of this latter kind, which cause no permanent damage to the device, are generally termed soft errors. Because soft errors are being observed more and more often in leading-edge circuit designs, the reliability of commercial electronics has once again become an important research topic.
Previous research on soft errors has focused mainly on the device and logic levels. Most studies [1-10] address the modeling and simulation of logical, electrical, and latching-window masking during transient-pulse propagation, while many radiation-hardening techniques [11-15] have been proposed for memory cells and latches to resist or mitigate the single event effect (SEE) induced by soft errors. To better understand the real-world impact of soft errors, we will investigate the vulnerability of circuits with different features at both the logic and architectural levels. Some design features are described explicitly in the specification, whereas others are only implicit in the instruction set architecture (ISA) and are hard to uncover. For modern VLSI designs, however, formal methods are limited and can only be applied to small designs.
Software-based testing is therefore a good candidate: combined with computational-intelligence learning, it can reflect constraints from both the logic and architectural levels and supports analysis of soft-error impact from a probabilistic viewpoint.
Unlike formal methods, advances in computational intelligence such as the support vector machine and the random forest keep the search process from being trapped in local regions and make full use of the statistical information hidden in the dataset. This makes it easier to understand how an embedded signal, and a soft error on it, is influenced from the architectural or application level. Once the weak spots in the circuitry or the susceptible design features in the architecture are known, design-for-fault-tolerance techniques such as encoding or hardware duplication will be explored to enhance the robustness of the original design.
The first step of this project is to develop a software package that combines computational intelligence techniques with a software-based testing methodology to automate the analysis of logic soft-error susceptibility from testbench simulation data. Weak spots in the circuitry and susceptible design features in the architecture can then be identified. The ultimate goal is to offer designers design-for-fault-tolerance suggestions, so as to harden weak spots in the circuitry against soft errors or, alternatively, to modify architectural features while preserving compatible functionality. Power and performance will also be taken into account at this final stage when optimizing system robustness. |
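The notion of logic masking and of "weak spots" in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the project's software package: the four-input netlist, the node names, and the helper functions are all hypothetical. Exhaustive input enumeration stands in for testbench simulation, and only logic masking is modeled (no electrical or latching-window masking, and no learning step). It flips one internal node at a time and reports the fraction of input vectors for which the flip reaches the output.

```python
from itertools import product

# Hypothetical netlist in topological order: (node, gate type, fan-in names).
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "OR", ("n1", "c")),
    ("out", "AND", ("n2", "d")),
]
INPUTS = ("a", "b", "c", "d")
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def simulate(vector, flipped=None):
    """Evaluate the netlist; optionally invert one node to model a transient fault."""
    values = dict(zip(INPUTS, vector))
    for node, gate, fanin in NETLIST:
        value = GATES[gate](*(values[name] for name in fanin))
        if node == flipped:
            value ^= 1  # single bit-flip injected at this node
        values[node] = value
    return values["out"]


def susceptibility(node):
    """Fraction of input vectors for which a flip at `node` changes the output."""
    vectors = list(product((0, 1), repeat=len(INPUTS)))
    propagated = sum(simulate(v) != simulate(v, flipped=node) for v in vectors)
    return propagated / len(vectors)


def rank_weak_spots():
    """Nodes sorted from most to least susceptible (logic masking only)."""
    scores = {node: susceptibility(node) for node, _, _ in NETLIST}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for node, score in rank_weak_spots():
        print(f"{node}: {score:.2f}")
```

In this toy circuit a flip on `n1` is masked whenever `c` is 1 or `d` is 0, so it is far less likely to be observed than a flip on `out`. For realistic designs, exhaustive enumeration is infeasible, which is where sampled testbench data and a learned model would take over.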
Official Document #: | NSC97-2220-E009-035 |
URI: | http://hdl.handle.net/11536/102843 https://www.grb.gov.tw/search/planDetail?id=1687564&docId=290973 |
Appears in Collections: | Research Plans |