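Title: Research on Emerging Electronic Design Automation Technologies for the Post-Sub-Micron Era, Subproject 4: Applying Computational Intelligence Reasoning to Reliability Challenges in Circuit Design in the Post-Deep-Sub-Micron Era (I)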
Coping with Reliability Challenges to Circuit Designs beyond Deep Sub-Micron Era by Computational Intelligence Reasoning(I)
Author: Wen Hung-Pin (温宏斌)
Department (Institute) of Communication Engineering, National Chiao Tung University (国立交通大学电信工程学系(所))
Keywords: soft error; transient fault; intermittent fault; software-based testing; computational intelligence; knowledge discovery and data mining
Issue date: 2008
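Abstract: In the era beyond deep sub-micron, building robust systems from
unreliable components poses new challenges at every stage of design. Subtle
physical phenomena such as process variation and environmental radiation combine
to produce more complex and harder-to-overcome effects on semiconductor devices.
Operational errors are no longer only the obvious permanent faults. Under
particular marginal conditions, or in periodic or temporary situations,
intermittent and transient faults occur more and more often. In general, the
latter kind of error, which does not permanently damage the design itself, is
called a soft error. Because soft errors have been observed with increasing
frequency in advanced circuit designs in recent years, the reliability of
commercial electronics has again become an important research topic.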
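Past academic research on soft errors has focused mainly on the device and logic
levels. Most studies emphasize modeling and simulating logic masking, electrical
masking, and latching-window masking during transient pulse propagation. Another
main line of work proposes radiation-hardening techniques to resist or mitigate
the single event effects associated with soft errors. To better understand the
real-world impact of soft errors, we will analyze the vulnerability of circuits
with different features at the logic and architectural levels. In general, some
design features are described explicitly in the specification, while others are
hidden in the instruction set architecture (ISA) and are hard to discover. For
today's VLSI designs, however, formal methods are limited and apply only to
small designs. By contrast, software-based testing is a good approach to soft
error analysis: combined with computational intelligence learning, it can
reflect the constraints at the logic and architectural levels and provide a
probabilistic view of the impact of soft errors.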
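Unlike formal methods, advances in computational intelligence, such as support
vector machines and random forests, keep the search process from being trapped
in local regions and make full use of the statistical information hidden in the
dataset. This makes it easier to understand whether an embedded signal, and a
soft error on it, can be affected from the architectural or application level.
Having identified the weak spots in a circuit design or the susceptible design
features in an architecture, we further explore design for fault tolerance, such
as encoding techniques or hardware duplication, to strengthen robustness.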
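The first step of this project is to develop a software package that combines
computational intelligence techniques with software-based testing. The aim is to
analyze logic soft error susceptibility automatically from the data obtained
from simulation testbenches. Weak spots in the circuit and susceptible design
features in the architecture are then extracted. The long-term goal of the
project is to give design engineers suggestions for fault-tolerant design, so as
to harden the weak spots in the original circuit design or revise architectural
features while preserving compatible functionality. Power and performance
factors will also be considered at this stage in order to optimize system
robustness.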
Beyond the deep sub-micron era, manufacturing a robust system from unreliable
components is becoming a new challenge at every level of the design cycle. Subtle
physical phenomena, including process variation and environmental radiation,
intertwine to create a more complex and substantial impact on semiconductor
devices. Operational errors no longer manifest only as permanent faults, which
are irreversible, but also occur more and more frequently as intermittent and
transient faults under certain marginal, periodic, or temporary conditions.
Faults of the latter type, which are not associated with permanent damage to the
device, are typically termed soft errors. Because soft errors are being observed
at an increasing rate in more and more leading-edge circuit designs, the
reliability of commercial electronics has re-emerged as an important research
topic in recent years.
Previous research on soft errors has focused mainly on the device or logic level.
The greater part of these studies [1-10] addresses the modeling and simulation of
logical, electrical, and latching-window masking during the propagation of
transient pulses, whereas many radiation hardening techniques [11-15] have been
proposed for memory cells and latches to resist or mitigate the single event
effects (SEEs) associated with soft errors. To better understand the real-world
impact of soft errors, we will further investigate the vulnerability of circuits
with different features at both the logic and architectural levels. In general,
some design features are explicitly described in the specification, while others
are only implicitly incorporated in the ISA and are difficult to discover. For
modern VLSI designs, however, formal methods are limited and can only be applied
to small designs. A software-based testing approach is therefore a good candidate
to reflect constraints from both the logic and architectural levels and to
facilitate the analysis of soft error impact from a probabilistic viewpoint.
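To make the probabilistic viewpoint concrete, the following is a minimal Python
sketch of software-based fault injection on a toy circuit. The circuit
y = (a AND b) OR (c AND d), the node names n1 and n2, and all function names are
assumptions chosen for illustration and are not taken from the project. The
sketch covers logical masking only; electrical and latching-window masking are
not modeled.

```python
import itertools
import random

def evaluate(inputs, flip_node=None):
    """Evaluate y = (a AND b) OR (c AND d).

    If flip_node is "n1" or "n2", that internal node is inverted to mimic a
    transient upset (hypothetical fault model: a single bit flip).
    """
    a, b, c, d = inputs
    n1 = a & b
    if flip_node == "n1":
        n1 ^= 1
    n2 = c & d
    if flip_node == "n2":
        n2 ^= 1
    return n1 | n2

def exact_propagation_rate(node):
    """Fraction of all 16 input vectors for which a flip at node reaches y."""
    vectors = list(itertools.product((0, 1), repeat=4))
    hits = sum(evaluate(v) != evaluate(v, node) for v in vectors)
    return hits / len(vectors)

def sampled_propagation_rate(node, trials=10000, seed=0):
    """Monte Carlo estimate of the same rate from random input vectors."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        v = tuple(rng.randint(0, 1) for _ in range(4))
        hits += evaluate(v) != evaluate(v, node)
    return hits / trials
```

Under uniformly random inputs, a flip at n1 reaches the output only when n2 is 0,
so `exact_propagation_rate("n1")` is 12/16 and the flip is logically masked for
the remaining input vectors. For circuits too large to enumerate, the sampled
estimate plays the same role.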
Unlike formal methods, advances in computational intelligence such as support
vector machines and random forests prevent the search process from being trapped
locally and fully utilize the statistical information hidden in the dataset. It
therefore becomes easier to understand how an embedded signal, and a soft error
on it, can be influenced from the architectural or application level. Once the
weak spots in circuitry or susceptible design features in architectures are
known, designs for fault tolerance such as encoding techniques or hardware
duplication will be further explored to enhance the robustness of the original
designs.
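As one illustration of hardware duplication, the sketch below models triple
modular redundancy (TMR) with a bitwise majority voter in Python. TMR is a
standard technique named here as an example; the abstract does not say which
duplication scheme the project adopts, and the function names and the
`upset_copy` and `upset_mask` parameters are hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority vote: each output bit takes the value held by at
    least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)

def tmr_output(func, x, upset_copy=None, upset_mask=0):
    """Run three copies of func on x and vote on the results.

    upset_copy (0, 1, or 2) selects one copy whose result is XORed with
    upset_mask, mimicking a soft error in that copy only.
    """
    outputs = [func(x) for _ in range(3)]
    if upset_copy is not None:
        outputs[upset_copy] ^= upset_mask
    return majority(*outputs)
```

With any deterministic `func`, an upset confined to one copy is outvoted by the
other two, so the voted result matches the fault-free result. The scheme does
not help when two copies are upset in the same bit position, and it costs
roughly three times the hardware plus the voter, which is why power and
performance have to be weighed against robustness.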
The first step of this project is to develop a software package that employs a
set of computational intelligence techniques together with a software-based
testing methodology to automate the analysis of logic soft error susceptibility
based on simulation data from testbenches. Weak spots in circuitry or
susceptible design features in the architecture can then be identified. The
ultimate goal of our project is to provide suggestions for fault-tolerant design,
and accordingly to harden weak spots in circuitry against soft errors or to
modify architectural features while preserving compatible functionality. Power
and performance factors will also be taken into consideration at the final stage
when optimizing the robustness of systems.
Official document no.: NSC97-2220-E009-035
URI: http://hdl.handle.net/11536/102843
https://www.grb.gov.tw/search/planDetail?id=1687564&docId=290973
Appears in collections: Research Plans