Title: A Generalized Unification-based LR Parsing System and its Application to the Analysis of Chinese Sentences
(original Chinese title: 概化聯併基底LR剖析系統及其在中文分析上的應用)
Authors: Jian-Cheng Dai (戴建誠)
Hsi-Jian Lee (李錫堅)
Institute of Computer Science and Engineering
Keywords: Parsing System (剖析系統); Grammar Formalism (語法規範); Chinese Analysis (中文分析)
Issue Date: 1993
Abstract: This thesis develops an efficient unification-based parsing system and applies it to the analysis of Chinese sentences. The grammar formalism combines the rule-oriented Generalized Phrase Structure Grammar (GPSG) with the lexicon-oriented Head-driven Phrase Structure Grammar (HPSG); under this formalism, syntactic and semantic constraints can be added easily. The parsing system is a generalized unification-based LR parser. In constructing the LR parsing table, the closure and goto functions are modified to handle phrase structure rules composed of feature structures. The efficiency of the parser is improved by several techniques, including state merging, local ambiguity packing, and agreement checking.
Using this grammar formalism and parsing system, we analyze several Chinese constructions: long-distance dependency constructions, nominalizations, and coordinate constructions. The theoretical basis for processing long-distance dependency constructions comes from GPSG. In addition to the linguistic principles of GPSG, we use a set of island constraints to control the generation of slash features. On this basis we propose a slash instantiation operator, which is applied during parsing-table construction to control where gaps are introduced in phrase structure rules and to predict the structure of constituents that may contain a gap. The feature values of nominalizations are obtained through a structure-sharing mechanism. To process coordinate constructions, the Head Feature Convention is extended to define the head features of constituents that contain two head daughters.
Finally, a new corpus-based probabilistic generalized LR parser is designed that uses tagging information to select the most promising syntactic structure from a large number of ambiguous structures. A word in a Chinese sentence may belong to several parts of speech, so part-of-speech tagging information lets the parser avoid parsing less reliable tagged sequences. To avoid repeatedly parsing the parts shared among different tagged sequences, the N-best tagged sequences are first assembled into a multi-stage input graph. During parsing, all parse paths form a sentential transition graph, on which a sentential merging technique and a rank-ordering search strategy are applied to find good parses quickly and to truncate unpromising partial parses. Experimental results show that the rank-ordering strategy (described in the Chinese abstract as a hill-climbing search) is more efficient than the depth-first strategy, and that the sentential merging technique also improves system efficiency.
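As an illustration of the multi-stage input graph mentioned in the abstract, the following is a minimal Python sketch, not the thesis's implementation. It assumes that all N tagged sequences share the same word segmentation (so position i is stage i), and the function name `build_input_graph`, the toy sentence, and the tag labels `N` and `V` are hypothetical. Identical (word, tag) pairs at the same position are stored once, which is the sense in which shared parts need not be processed again.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]            # (word, part-of-speech tag)
Edge = Tuple[int, Token, Token]    # (source stage, source token, target token)


def build_input_graph(sequences: List[List[Token]]) -> Tuple[List[List[Token]], Set[Edge]]:
    """Merge equally long tagged sequences into stages (nodes) and edges.

    Assumption of this sketch: every sequence has the same segmentation,
    so the i-th token of each sequence belongs to stage i.
    """
    if not sequences:
        return [], set()
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("this sketch assumes identical word segmentation")

    stages: List[List[Token]] = [[] for _ in range(length)]
    edges: Set[Edge] = set()
    for seq in sequences:
        for i, token in enumerate(seq):
            # Store each distinct (word, tag) pair only once per stage.
            if token not in stages[i]:
                stages[i].append(token)
            # Link consecutive tokens that co-occur in some sequence.
            if i > 0:
                edges.add((i - 1, seq[i - 1], token))
    return stages, edges


# Hypothetical 3-best tagging of a three-word sentence.
nbest = [
    [("他", "N"), ("計畫", "V"), ("研究", "N")],
    [("他", "N"), ("計畫", "V"), ("研究", "V")],
    [("他", "N"), ("計畫", "N"), ("研究", "V")],
]
stages, edges = build_input_graph(nbest)
for i, stage in enumerate(stages):
    print(i, stage)
```

In this toy input the three sequences contain nine tokens in total, but the graph holds only five nodes, because the shared first word and the repeated tags are merged. A merged graph of this kind can in general admit paths that were not among the original N sequences; how the thesis handles that case is not stated in the abstract.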
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT820392078
http://hdl.handle.net/11536/57886
Appears in Collections: Thesis