Automatic Chinese unknown word extraction using small-corpus-based method


DC Field	Value	Language
dc.contributor.author	Chang, TH	en_US
dc.contributor.author	Lee, CH	en_US
dc.date.accessioned	2014-12-08T15:26:20Z	-
dc.date.available	2014-12-08T15:26:20Z	-
dc.date.issued	2003	en_US
dc.identifier.isbn	0-7803-7902-0	en_US
dc.identifier.uri	http://hdl.handle.net/11536/18707	-
dc.description.abstract	Chinese unknown word extraction is an important problem in Chinese language processing. The problem poses two main difficulties. First, almost any Chinese character can either stand alone as a word or be part of another word. Second, Chinese text has no whitespace between words to mark word boundaries. Although several approaches have been proposed, they have drawbacks. In this paper, we present a method that extracts Chinese unknown words more efficiently and precisely. It retains its efficiency and accuracy even when the document set available for training is small, and it can also extract unknown words that occur rarely. These advantages make it practical for real applications.	en_US
dc.language.iso	en_US	en_US
dc.subject	Chinese unknown word	en_US
dc.subject	Corpus-based method	en_US
dc.title	Automatic Chinese unknown word extraction using small-corpus-based method	en_US
dc.type	Proceedings Paper	en_US
dc.identifier.journal	2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS	en_US
dc.citation.spage	459	en_US
dc.citation.epage	464	en_US
dc.contributor.department	資訊工程學系 (Department of Computer Science and Information Engineering)	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.identifier.wosnumber	WOS:000189300200077	-
Appears in Collections:	Conferences Paper