Title: Prolog OR平行執行模式在多引線超純量處理機下的研究
Prolog OR Parallel Execution Model on Multi-Threaded Superscalar Processor
Authors: 許傳政
Chuan-cheng Hsu
鍾崇斌
Chung-ping Chung
Institute of Computer Science and Engineering
Keywords: Prolog; Multi-Threaded Superscalar Processor
Issue Date: 1994
Abstract: This thesis proposes a multi-threaded superscalar processor architecture for OR-parallel execution of Prolog. Multi-threading exploits the coarse-grained OR parallelism of a Prolog program: each OR task is executed by a different thread. Within each thread, superscalar processing exploits the fine-grained parallelism of the task's code. In our OR-parallel execution model, a Prolog program is first partitioned into subprograms according to their run-time behavior. The subprograms fall into four classes: non-parallelizable subprograms, parallelizable recursive subprograms, balanced subprograms, and irregular unbalanced subprograms. The behavior of the program changes depending on the class of subprogram currently being executed. The goal of the processor architecture is to speed up program execution by properly utilizing the OR parallelism inherent in each subprogram. In addition, a dynamic load balancing method is proposed to reduce the performance limitation caused by irregular unbalanced subprograms. Finally, a multi-threaded superscalar processor architecture that exploits the OR parallelism in Prolog is proposed to implement this model; specifically, the designs of its memory and register file are presented. Benchmarks from the PLM and BAM systems are used in the performance simulation. For the benchmarks with OR parallelism, this architecture achieves 2.64 times the performance of a superscalar architecture, i.e. it is 164% better.
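The OR-parallel idea summarized in the abstract, in which the alternative clauses of a goal become independent tasks handed to different threads, can be illustrated with a minimal software sketch. This is not the thesis's hardware mechanism or its load balancing method: the goal tree `OR_TREE`, the function `solve_or_parallel`, and the shared work queue are all hypothetical names and choices made only for illustration. The shared queue stands in for dynamic load balancing, since an idle worker simply takes the next pending OR branch regardless of which worker produced it.

```python
import queue
import threading

# Hypothetical OR-tree (an assumption for illustration): each key is a goal and
# each value lists its alternative clauses (OR branches). A name that is not a
# key is treated as a leaf, i.e. an answer.
OR_TREE = {
    "path(a,X)": ["via_b", "via_c"],
    "via_b": ["X=d", "X=e"],
    "via_c": ["X=f"],
}


def solve_or_parallel(root, tree, num_threads=4):
    """Explore every OR branch under `root` and return the sorted leaf answers."""
    tasks = queue.Queue()      # shared pool of pending OR tasks
    answers = []
    lock = threading.Lock()    # protects `answers`
    tasks.put(root)

    def worker():
        while True:
            goal = tasks.get()
            if goal is None:               # sentinel: no more work
                tasks.task_done()
                return
            alternatives = tree.get(goal)
            if alternatives is None:       # leaf: record the answer
                with lock:
                    answers.append(goal)
            else:                          # publish each OR branch as a new task
                for alt in alternatives:
                    tasks.put(alt)
            tasks.task_done()              # children were queued before this call

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    tasks.join()                           # returns once all OR tasks are processed
    for _ in threads:
        tasks.put(None)                    # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(answers)


if __name__ == "__main__":
    print(solve_or_parallel("path(a,X)", OR_TREE))
```

Because a worker queues a goal's alternatives before marking the goal itself done, `tasks.join()` cannot return while unexplored branches remain. Python threads here only model the scheduling of OR tasks; they do not reproduce the fine-grained superscalar parallelism within a thread that the thesis targets.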
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT830392041
http://hdl.handle.net/11536/58963
Appears in Collections: Thesis