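Title: Next-Generation Automatic Speech Recognition Technology, Phase 2: Syllable-Level Event Detection for Mandarin and Dialects and Related Research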
Syllable-Level Landmark Detection and Its Applications in Mandarin, Min-nan and Hakka
Author: 王逸如
WANG YIH-RU
Department (Institute) of Communication Engineering, National Chiao Tung University
Keywords: next-generation automatic speech recognition; speech landmark; speech attribute
Publication date: 2010
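Abstract: Next-generation automatic speech recognition combines phonetic and linguistic knowledge, using banks of speech attribute and speech event detectors to extract as much acoustic information as possible from the speech signal. This information is passed to the downstream “speech event and knowledge integration” and “speech evidence verification” units for speech recognition and even semantic understanding, with the aim of overcoming the limitations of the conventional hidden Markov model (HMM) approach. Next-generation automatic speech recognition, also called the detection-based architecture, no longer recognizes the whole utterance directly; instead it detects only the parts of the speech signal that are of interest, such as words, phrases, or concepts. The detector bank therefore does more than the feature extraction stage of a conventional recognizer: it locates both timing information and speech features in the signal, which makes the detection of landmarks (points where articulatory features change) very important. Starting from the accurate detection of landmarks in the speech signal, this project will carry out the following research: (1) High-resolution syllable-level boundary detector - The project will first make full use of linguists' knowledge to build landmark detectors accurate to the individual speech sample, and then combine them with the temporal structure constraints of the speech signal to build a reliable hierarchical detector for syllables and their related boundaries. (2) Integrated syllable boundary and speech attribute detectors for Mandarin - The project will propose an integrated architecture that tightly couples the landmark detectors and the speech attribute detectors, so that each uses the other's information to improve its own detection performance. (3) Cross-dialect integrated landmark and speech attribute detectors - The project will further build integrated landmark and speech attribute detectors for Taiwanese and Hakka, the dialects commonly used in Taiwan, to show that the hierarchical syllable boundary detectors and the speech attribute detectors carry over across dialects. (4) Applications of the cross-dialect integrated syllable boundary and speech attribute detectors - The detectors will be used to revisit some open problems in speech recognition, including (a) using syllable-level boundary results to improve conventional HMM-based recognizers, and (b) using the integrated detectors to study pronunciation assimilation and to detect stuttering, hesitation, and repair in spontaneous speech. This project will supply the speech attribute and event information needed by the other sub-projects, with the aim of establishing a next-generation automatic speech recognition architecture; the integrated detectors will also let us examine certain linguistic phenomena from an engineering point of view.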
In the next-generation automatic speech recognition paradigm, two types of speech detectors, i.e., landmark detectors (which find the points in time where articulation changes) and attribute detectors (which find the manner and place of articulation), are the fundamental building blocks for reliable phone, word, or phrase detection. In particular, landmark detectors are the most important front end for the subsequent “event merge” and “evidence verification” stages. In this project, we will focus on developing accurate and reliable landmark detectors and on studying the best way to integrate them with our well-established attribute detectors (developed in previous projects). The following items will be studied and implemented: (1) Syllable-level boundary detector using temporal structure information - High-resolution, sample-based landmark detectors will be developed using articulation parameters. Moreover, hierarchical syllable-level boundary detectors will be implemented to verify the results of the landmark detectors using the temporal structure constraints of the speech signal. (2) Integrated boundary and attribute detectors for Mandarin - Integrated detector architectures will be developed to tightly couple the landmark and attribute detectors and further improve the performance of both. (3) Integrated boundary and attribute detectors for Taiwanese and Hakka - Taiwanese and Hakka are the most frequently used dialects in Taiwan. In this project, the cross-dialect capabilities of the integrated syllable-level boundary and attribute detectors will be examined using these two dialects. (4) Applications of integrated boundary and attribute detectors - Several topics will be studied using the integrated boundary and attribute detectors, including (a) applying syllable boundary information to improve the performance of the traditional HMM recognition scheme, (b) pronunciation assimilation phenomena and their relationship with speaking rate, and (c) detection of stuttering, hesitation, and repair in spontaneous speech. In summary, the cross-dialect integrated syllable-level boundary and attribute detectors proposed in this sub-project will provide the other sub-projects with the components needed to build the next-generation automatic speech recognition paradigm. Moreover, the proposed integrated boundary and attribute detectors will be cross-examined against linguistic knowledge.
Government document #: NSC97-2221-E009-080-MY3
URI: http://hdl.handle.net/11536/100454
https://www.grb.gov.tw/search/planDetail?id=1991832&docId=325756
Appears in Collections: Research Plans


Files in This Item:

  1. 972221E009080MY3(第1年).PDF
  2. 972221E009080MY3(第2年).PDF
  3. 972221E009080MY3(第3年).PDF

If a file is a zip archive, download and unzip it, then open index.html in the extracted folder with a browser to view the full text.