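Title: | 歧義現象的多層次分析架構-由中文動詞出發 A Multi-layered Resolution for Disambiguation: Insight from Mandarin Verbs |
Authors: | 徐雅苓 (Yaling Hsu); 劉美君 (Me-Chun Liu); Department of Foreign Languages and Literatures, Master's Program in Foreign Literature and Linguistics |
Keywords: | polysemous words (多義詞); polysemy (多義現象); frame theory (框架理論); disambiguation module (區辨模組); disambiguation; multiple senses; polysemy; frame semantics; construction; context |
Date of Publication: | 2005 |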
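Abstract (translated from Chinese): | This thesis proposes a corpus-based, multi-layered framework for investigating the multiple senses of polysemous words, with the goal of building an automatic tagging system for sense discrimination. Drawing on several linguistic theories, namely Frame Semantics (see Fillmore and Atkins 1992), Construction Grammar (see Goldberg 1996), and discourse analysis (see Hopper and Thompson 1980), it aims to provide a linguistically motivated mechanism for sense retrieval. As an essential property of the lexicon, polysemy is a key to understanding how syntax, semantics, and pragmatics interact. Earlier work has approached polysemy from many directions, including feature analysis, prototype theory, frame theory, and relational theory, but a systematic and workable method is still lacking. More recently, Liu and Wu (2004) examined polysemy from the standpoint of semantic frames, holding that the senses of a word are defined under different frames, as in Fillmore and Atkins (1992). Relying on distinct frame elements and their distinct grammatical realizations, and following the "one sense, one frame" hypothesis, Liu and Wu (2004) show that differences in meaning can be attributed to the different frames a verb belongs to. This way of delimiting senses, however, appears unable to separate the senses of a polysemous verb when they belong to different frames yet share the same frame elements and the same grammatical realization. Take the Mandarin action verb 拿 (na): two of its senses carry identical frame elements and grammatical realization, as in example (1),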
(1) Agent>V>Theme:
a. …病人[Agent] 拿 著健保卡[Theme]上門… (sense 1: 持 'hold')
b. …我[Agent]可不可以順道 拿 個研究學位[Theme]?... (sense 2: 得/取 'obtain')
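Example (1) therefore suggests that frame theory alone is not enough to distinguish the senses of a polysemous word. When frame elements do not supply enough information to determine the sense, what has been left out of consideration? The framework proposed here brings in two further variables: collocational combinations (colloconstructions) and contextual dependencies. The main aim of this thesis is to propose a multi-layered analytical framework that identifies the appropriate sense of a polysemous word across its grammatical realizations. The approach works as a sense-discrimination module in three steps: (1) frame-based discrimination, (2) collocation-based discrimination, and (3) context-dependency-based discrimination.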
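The three steps can be read as a fall-through cascade: a later step is consulted only when the earlier one leaves more than one candidate sense. The following is a minimal sketch of that control flow, not the thesis's tagger. The `Instance` class, the three lookup tables, and every cue in them (including the context cue 畢業 and the `Agent>V>Theme>Recipient` pattern) are hypothetical and invented for illustration; only the shared `Agent>V>Theme` pattern of 拿 and the presence of 著 in example (1a) come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Instance:
    """One corpus occurrence of a polysemous verb (illustrative fields only)."""
    verb: str
    frame_pattern: str                            # e.g. "Agent>V>Theme"
    collocates: set = field(default_factory=set)  # non-core items near the verb
    context: set = field(default_factory=set)     # words from surrounding clauses


# Hypothetical toy tables; none of these rules is taken from the thesis.
FRAME_SENSES = {
    ("拿", "Agent>V>Theme"): ["hold", "obtain"],          # ambiguous pattern
    ("拿", "Agent>V>Theme>Recipient"): ["hand over"],      # invented pattern
}
COLLOCATE_CUES = {("拿", "著"): "hold"}
CONTEXT_CUES = {("拿", "畢業"): "obtain"}


def _unique_cue(verb: str, words: set, cues: dict) -> Optional[str]:
    """Return a sense only if all matching cues agree on exactly one sense."""
    senses = {cues[(verb, w)] for w in words if (verb, w) in cues}
    return next(iter(senses)) if len(senses) == 1 else None


def by_frame(inst: Instance) -> Optional[str]:
    """Step 1: succeed only when the frame pattern maps to a single sense."""
    candidates = FRAME_SENSES.get((inst.verb, inst.frame_pattern), [])
    return candidates[0] if len(candidates) == 1 else None


def by_collocation(inst: Instance) -> Optional[str]:
    """Step 2: look for a sense-specific non-core collocate."""
    return _unique_cue(inst.verb, inst.collocates, COLLOCATE_CUES)


def by_context(inst: Instance) -> Optional[str]:
    """Step 3: look for a related word in the wider context."""
    return _unique_cue(inst.verb, inst.context, CONTEXT_CUES)


STEPS = [("frame", by_frame), ("collocation", by_collocation), ("context", by_context)]


def disambiguate(inst: Instance) -> Tuple[Optional[str], Optional[str]]:
    """Try each step in order; report the sense and the step that decided it."""
    for name, step in STEPS:
        sense = step(inst)
        if sense is not None:
            return sense, name
    return None, None
```

Returning the name of the deciding step alongside the sense makes it easy to count how often each layer is actually needed on a given sample.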
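The data come mainly from naturally occurring language in the Academia Sinica Balanced Corpus of Modern Chinese. All case studies involve high-frequency words, but only 200 entries per case were annotated in detail. Corpus data are used chiefly because they reveal important syntactic and semantic distributional tendencies that native speakers' intuition cannot detect.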
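First, following FrameNet, step one of the discrimination module assigns the corpus occurrences of a polysemous word to different senses according to their semantic frames; the grouping relies on differing frame elements and their main grammatical realizations. When step one fails, that is, when different senses carry the same frame elements and the same grammatical realization, we move to the second step: collocational combination. Here attention goes to the collocates that serve as non-core arguments, which can be further classified by part of speech, such as adverbs, adjectives, and aspect markers. We then find that the different senses of a polysemous word enter into different fixed collocational relations with these non-core arguments. When collocation also yields no further information, we proceed to the third step: contextual dependency. In this step we search the context across clause boundaries for words related to the different senses of the polysemous word. The link between the polysemous word and its senses rests mainly on semantic or pragmatic association, and such links can indeed be found in SUMO. Four single-word Mandarin verbs, 走 (zou), 拿 (na), 聽 (ting), and 看 (kan), are used to demonstrate the proposed module.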
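One way to picture the second step is as a per-sense collocate profile built from hand-annotated entries. The sketch below assumes each annotated entry is a `(sense, collocates)` pair and treats a collocate as distinctive when it recurs with one sense and never appears with another. The function names, the `min_count` threshold, and the toy data are all hypothetical; the thesis does not state how its collocates were counted.

```python
from collections import Counter, defaultdict


def collocate_profile(tagged):
    """Count, per sense, how many annotated entries contain each collocate.

    `tagged` is an iterable of (sense, collocates) pairs; this input format
    is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for sense, collocates in tagged:
        counts[sense].update(set(collocates))  # count each word once per entry
    return counts


def distinctive(counts, sense, min_count=2):
    """Collocates seen at least `min_count` times with `sense` and never elsewhere."""
    seen_elsewhere = set()
    for other, counter in counts.items():
        if other != sense:
            seen_elsewhere.update(counter)
    return sorted(
        word for word, n in counts[sense].items()
        if n >= min_count and word not in seen_elsewhere
    )


# Invented toy annotations for 拿; not data from the Sinica Corpus.
toy = [
    ("hold", ["著", "手上"]),
    ("hold", ["著"]),
    ("hold", ["了"]),
    ("obtain", ["順道"]),
    ("obtain", ["順道", "了"]),
]
profile = collocate_profile(toy)
hold_cues = distinctive(profile, "hold")
obtain_cues = distinctive(profile, "obtain")
```

With these toy annotations, 了 is excluded for both senses because it occurs with each of them, which is the kind of non-decisive collocate that would push an instance on to the third step.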
Through the proposed mechanism, this thesis not only redefines polysemy but also supplies computational disambiguation systems with an effective, linguistically grounded module for sense discrimination.
Abstract
This study explores how the multiple senses of polysemous words can be distinguished. It proposes a hybrid, corpus-based linguistic model and specifies the procedures for building an automatic tagger for sense disambiguation based on Mandarin verbs. It seeks to provide a linguistically motivated solution for detecting meaning with the aid of linguistic theories such as Frame Semantics (Fillmore and Atkins 1992), Construction Grammar (Goldberg 1996), and discourse analysis (Hopper and Thompson 1980). As an essential property of the lexicon, polysemy is the key to understanding the interplay among syntax, semantics, and pragmatics. Although polysemy has been investigated through a number of approaches, including classical feature analysis, prototype theory, the frame-based approach, and the relational approach, a systematic and applicable solution is still lacking. Recently, working on Mandarin lexical semantics, Liu and Wu (2004) proposed a frame-based perspective that views the senses of a polysemous word as belonging to different 'frames', as defined by Fillmore and Atkins (1992). Making use of the distinctions in frame elements and their grammatical realizations, and following the 'one sense, one frame' hypothesis, Liu and Wu (2004) are able to show that semantic differences may be attributed to the different semantic frames a verb belongs to. However, there are cases where two separate meanings of the same verb show exactly the same surface patterns with the same sets of frame elements. For example, in the case of the action verb NA 拿, two separate senses may end up with the same number and pattern of frame elements, as shown in (1):
(1) Agent > V > Theme:
a.
…病人[Agent] 拿 著 健保卡[Theme] 上門… (sense 1 'carrying')
bing ren na zhe jian bao ka shang men
patient take ZHE health insurance card up door
'The patient carried the health insurance card to the counter.'
b. …我[Agent] 可不可以順道 拿 個 研究學位[Theme]? (sense 2 'getting')
wo ke bu ke yi shun dao na ge yan jiu xue wei
I can not can by the way take CL research academic degree
'By the way, can I get an academic research degree?'
Therefore, it is apparent that a purely frame-based approach may be insufficient for dealing with polysemes. When frame elements fail to provide determining clues, what else should be taken into consideration? The model proposed in this study calls for two other variables: colloconstructions and contextual dependencies. This study aims to propose a hybrid multi-module solution that identifies the most appropriate lexical sense of a polyseme in its various expressions. The hybrid approach can be viewed as a sense-disambiguation model with three steps: 1) frame-based distinction, 2) colloconstruction distinction, and 3) contextual dependence distinction.
The study is based on naturally occurring data extracted from the Sinica Balanced Corpus, which was established by the CKIP (Chinese Knowledge and Information Processing) group at Academia Sinica and is open to the public at http://www.sinica.edu.tw/ftms-bin/kiwi.sh/. Given the high frequency of the target words, only 200 entries are examined closely for the discussion. Corpus data provide explicit and implicit distributional tendencies that may go beyond native speakers' intuition.
Using corpus data as the input, the first step of the proposed model is to identify the senses of a polysemous word that correspond to distinctions in semantic frames, following FrameNet. The data extracted from the Sinica Corpus can be roughly classified into several frames by their basic patterns of expressing the core frame elements (arguments).
When distinctions of frame elements and their basic patterns fail, senses are further identified by the second module: colloconstruction. In this step, attention is paid to the collocational patterns of non-core arguments. These non-core arguments can be classified into various syntactic categories, such as adverbials, adjectives, and aspectual markers. Frequent collocates, be they grammatical or lexical, are identified with each individual sense. However, when colloconstruction fails to indicate any decisive cues, the third module, contextual information, is called upon. In this module, the relevant contextual elements are thoroughly searched to establish a relational link within or across clausal boundaries. The relational link may be established by any semantic or pragmatic association between the polyseme and the contextual element that can be found in a larger semantic taxonomy, such as SUMO synsets (translated in BOW). To demonstrate the model, four sets of verbs (zou 走, na 拿, ting 聽, kan 看) are used as illustrations. By redefining polysemy with operational mechanisms, this study provides a theoretically grounded linguistic model for developing a computational system for sense disambiguation. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009245518 http://hdl.handle.net/11536/77410 |
Appears in Collections: | Thesis |