Title: Using Machine Learning Approach to Identify Synonyms for Document Mining
Authors: Trappey, Amy J. C.
Trappey, Charles V.
Wu, Jheng-Long
Tsai, Kevin T.-C.
Department of Management Science
Keywords: Synonym Extraction; Machine Learning; Pattern-based Extraction; Self-supervised Learning
Issue Date: 1-Jan-2019
Abstract: Technical and knowledge documents, such as research papers, patents, and requests for quotation (RFQs), are important knowledge references for multiple purposes. For example, enterprises and R&D institutions often need to conduct literature and patent searches and analyses before, during, and after R&D and commercialization. These knowledge discovery processes help them identify prior art related to their current R&D efforts, so that they avoid duplicating research or infringing upon existing intellectual property rights (IPRs). Documents commonly contain many synonyms (i.e., words and phrases with near-identical meanings), which can hinder search results if queries do not account for them. For instance, a "freedom-to-operate" (FTO) patent search may not find all related patents if synonyms are not taken into consideration. This research develops methodologies for generating domain-specific "word" and "phrase" synonym dictionaries using machine learning. Both dictionaries are generated and validated using more than 2000 solar power related patents as the test document set. The test results show that, in the solar power domain, both the word-level and phrase-level dictionaries identify synonyms effectively and thus significantly improve patent search results.
URI: http://dx.doi.org/10.3233/ATDE190158
http://hdl.handle.net/11536/155069
ISBN: 978-1-64368-021-7; 978-1-64368-020-0
ISSN: 2352-7528
DOI: 10.3233/ATDE190158
Journal: TRANSDISCIPLINARY ENGINEERING FOR COMPLEX SOCIO-TECHNICAL SYSTEMS
Volume: 10
Start Page: 509
End Page: 518
Appears in Collections: Conferences Paper