Title: | Prolog語言AND平行執行模式在多引線超純量系統下之研究 A Study of Prolog AND Parallel Execution Model on Multi-Threaded Superscalar System |
Authors: | 馮宗錚 Chung-cheng Feng; 鍾崇斌 Chung-ping Chung; Institute of Computer Science and Engineering |
Keywords: | Prolog; Multi-Threaded Superscalar Processor |
Issue Date: | 1994 |
Abstract: | This thesis proposes an architecture for a multi-threaded superscalar Prolog processor. Its distinguishing feature is that it incorporates the AND-parallel execution model of Prolog: multi-threading exploits the coarse-grained parallelism in Prolog's AND parallelism by assigning each task to a different thread, while each thread exploits the fine-grained parallelism within the program through superscalar execution. Checking for AND parallelism dynamically incurs such large time and space overhead that system performance falls below that obtained with static checking, and the hardware cost of dynamic checking is also too high; the architecture therefore analyzes AND parallelism at compile time. We also design a run-time mechanism for executing AND threads. After a parent thread executes a fork instruction, this mechanism lets the hardware thread go on to execute a useful thread, which improves system performance. The actions of parent and child threads are fully defined in the mechanism, and its correctness is verified by the simulation results. Finally, we propose an ideal multi-threaded superscalar Prolog processor and design the memory and register organization that supports the run-time execution of AND threads. For the simulation, some of the test programs are Berkeley PLM benchmarks that contain AND parallelism and others are programs we wrote ourselves. The simulation results show a performance gain of 1.27 times over a superscalar system (the Chinese-language abstract gives 1.37). Because we study independent AND parallelism, which is strongly limited by data dependences, the speedup is not high. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT830392040 http://hdl.handle.net/11536/58962 |
Appears in Collections: | Thesis |
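
The abstract describes independent AND parallelism: goals in a clause body that share no variables can be forked to separate threads and joined afterwards, with the independence test done ahead of time rather than at run time. The following is a minimal software sketch of that idea only, not the thesis's hardware mechanism. The names `independent` and `run_conjunction` are hypothetical, goals are modelled as zero-argument Python callables that return a dict of bindings (or `None` on failure), and the static check assumes every listed variable is unbound when the conjunction is called.

```python
from concurrent.futures import ThreadPoolExecutor


def independent(goal_vars):
    """Static check (assumed simplification): goals are independent
    if no variable name appears in more than one goal."""
    seen = set()
    for vars_ in goal_vars:
        vars_ = set(vars_)
        if seen & vars_:
            return False
        seen |= vars_
    return True


def run_conjunction(goals, goal_vars):
    """Run a conjunction of goals.

    Independent goals are forked to worker threads and joined;
    otherwise the goals run left to right and stop at the first failure.
    Returns the merged bindings, or None if any goal fails.
    """
    if independent(goal_vars):
        # Fork: one worker per goal; map() joins and preserves goal order.
        with ThreadPoolExecutor(max_workers=max(1, len(goals))) as pool:
            results = list(pool.map(lambda g: g(), goals))
    else:
        results = []
        for g in goals:
            r = g()
            if r is None:
                return None
            results.append(r)

    if any(r is None for r in results):
        return None
    merged = {}
    for r in results:
        merged.update(r)
    return merged


if __name__ == "__main__":
    # Models a body such as p(X), q(Y): the two goals share no variables.
    goals = [lambda: {"X": 3}, lambda: {"Y": 4}]
    print(run_conjunction(goals, [{"X"}, {"Y"}]))  # {'X': 3, 'Y': 4}
```

When the variable sets overlap, the sketch falls back to sequential execution, which mirrors the abstract's point that data dependences limit how much independent AND parallelism is available.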