Title: | Using Machine Learning Approach to Identify Synonyms for Document Mining |
Authors: | Trappey, Amy J. C.; Trappey, Charles V.; Wu, Jheng-Long; Tsai, Kevin T.-C.; Department of Management Science |
Keywords: | Synonym Extraction; Machine Learning; Pattern-based Extraction; Self-supervised Learning |
Publication date: | 1-Jan-2019 |
Abstract: | Technical and knowledge documents, such as research papers, patents, and technical documents (e.g., requests for quotations, RFQs), are important knowledge references for multiple purposes. For example, enterprises and R&D institutions often need to conduct literature and patent searches and analyses before, during, and after R&D and commercialization. These knowledge discovery processes help them identify prior art related to their current R&D efforts, so that they avoid duplicating research or infringing upon existing intellectual property rights (IPRs). Documents commonly contain many synonyms (i.e., words and phrases with near-identical meanings), which may hinder search results if queries do not take these synonyms into account. For instance, a "freedom-to-operate" (FTO) patent search may not find all related patents if synonyms are not taken into consideration. This research develops methodologies for generating domain-specific "word" and "phrase" synonym dictionaries using machine learning. The generation and validation of both the domain-specific "word" and "phrase" synonym dictionaries are conducted using more than 2,000 solar-power-related patents as the test document set. The test results show that, in the solar power domain, both the word-level and phrase-level dictionaries identify synonyms effectively and thus significantly improve patent search results. |
URI: | http://dx.doi.org/10.3233/ATDE190158 http://hdl.handle.net/11536/155069 |
ISBN: | 978-1-64368-021-7; 978-1-64368-020-0 |
ISSN: | 2352-7528 |
DOI: | 10.3233/ATDE190158 |
Journal: | TRANSDISCIPLINARY ENGINEERING FOR COMPLEX SOCIO-TECHNICAL SYSTEMS |
Volume: | 10 |
Start page: | 509 |
End page: | 518 |
Appears in collections: | Conference papers |