Extraction of name and transliteration in monolingual and parallel corpora

Title:	Extraction of name and transliteration in monolingual and parallel corpora
Authors:	Lin, T; Wu, JC; Chang, JS (Institute of Communications Engineering)
Issue Date:	2004
Abstract:	Named entities in free text pose a challenge to text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system, and they are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations will certainly ensure a high precision rate. We use a seed list of proper names and transliterations to train a Machine Transliteration Model. With this model, it is possible to extract proper names and their transliterations from monolingual or parallel corpora with high precision and recall rates.
URI:	http://hdl.handle.net/11536/27197
ISBN:	3-540-23300-8
ISSN:	0302-9743
Journal:	MACHINE TRANSLATION: FROM REAL USERS TO RESEARCH, PROCEEDINGS
Volume:	3265
Start page:	177
End page:	186
Appears in Collections:	Conferences Paper