中文文句自動斷詞標詞類之研究與應用

Full metadata record

DC Field	Value	Language
dc.contributor.author	蘇育新	en_US
dc.contributor.author	Yuh-Shin Su	en_US
dc.contributor.author	陳信宏	en_US
dc.contributor.author	Sin-Horng Chen	en_US
dc.date.accessioned	2014-12-12T02:12:21Z	-
dc.date.available	2014-12-12T02:12:21Z	-
dc.date.issued	1993	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT820436031	en_US
dc.identifier.uri	http://hdl.handle.net/11536/58160	-
dc.description.abstract	在本論文中，我們主要研究一套可對中文文句做自動斷詞標詞類的語言模型，及基本的音轉字語言模型。我們首先以統計法及幾種類神經網路法訓練出不同的語言模型參數，並設計自動標詞類系統，以對各模型參數進行評估；之後我們選擇統計法及較好的類神經網路法的模型參數，並結合幾種簡單的構詞法則，完成自動斷詞標詞類系統。此外，我們也以這些模型參數設計了初步的音轉字語言模型。在我們的實驗中，訓練語料庫有19303個詞，測試語料庫有4836個詞。在外部測試(Outside Test)方面，以統計法所做的實驗可達97.1﹪的斷詞率及94.4﹪的詞類標示率，而在類神經網路法方面，斷詞率為97.3﹪，詞類標示率則為94.2﹪。另外，音轉字的正確率以統計法可達91.0﹪，而類神經網路法則為90.9﹪。 Two approaches of automatic segmentation and tagging for Chinese sentences are studied in this thesis. One is a statistical approach which uses an explicit bigram language model and the other is a neural net approach which uses MLP to predict POS's of words. Performance of these two methods was examined by simulations using a database with 19303 training words and 4836 testing words. Segmentation rates and tagging rates of 97.1% and 94.4% for the statistical method and of 97.3% and 94.2% for the neural net method were achieved. Extension of these two methods to the application of phoneme-to-text conversion is also studied using the same database. Character accuracy rates of 91.0% and 90.9% were respectively obtained by these two methods.	zh_TW
dc.language.iso	zh_TW	en_US
dc.subject	斷詞; 詞類標示	zh_TW
dc.subject	Word Segmentation; POS Tagging	en_US
dc.title	中文文句自動斷詞標詞類之研究與應用	zh_TW
dc.title	A Study on Automatic Segmentation and Tagging of Chinese Sentence	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis