Title: | Bilingual sentence alignment based on punctuation statistics and lexicon |
Authors: | Chuang, TC; Wu, JC; Lin, T; Shei, WC; Chang, JS; Institute of Communications Engineering |
Issue Date: | 2005 |
Abstract: | This paper presents a new method of aligning bilingual parallel texts based on punctuation statistics and lexical information. Sentence alignment of bilingual texts written in disparate language pairs such as English and Chinese is reportedly more difficult. We examine the feasibility of using punctuation marks for high-accuracy sentence alignment and show that punctuation statistics are an effective means of achieving good results. An encouraging precision rate is demonstrated in aligning sentences in bilingual parallel corpora based solely on punctuation statistics. Improved results are obtained when both punctuation statistics and lexical information are employed. We have experimented with an implementation of the proposed method on the parallel corpora of Sinorama Magazine and the Records of the Hong Kong Legislative Council, with satisfactory results. |
URI: | http://hdl.handle.net/11536/25099 |
ISBN: | 3-540-24475-1 |
ISSN: | 0302-9743 |
Journal: | NATURAL LANGUAGE PROCESSING - IJCNLP 2004 |
Volume: | 3248 |
Start Page: | 224 |
End Page: | 232 |
Appears in Collections: | Conference Papers |