Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | 吳智瑋 | en_US |
| dc.contributor.author | Wu, Ji-Wei | en_US |
| dc.contributor.author | 蔡文能 | en_US |
| dc.contributor.author | 曾秋蓉 | en_US |
| dc.contributor.author | Tsai, Wen-Nung | en_US |
| dc.contributor.author | Tseng, Judy C.-R. | en_US |
| dc.date.accessioned | 2014-12-12T02:37:12Z | |
| dc.date.available | 2014-12-12T02:37:12Z | |
| dc.date.issued | 2013 | en_US |
| dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT079755841 | en_US |
| dc.identifier.uri | http://hdl.handle.net/11536/73189 | |
| dc.description.abstract | 在一般的長文件中,其內容常由數個不同的主題或子主題所構成,文件切割的目的在於將文件內容切割成若干個主題連貫的主題段落,每個主題段落由數個連續的句子或段落所構成。文件切割已被證實能夠改善資訊檢索及文件摘要等自然語言處理(natural language processing)之效能。在過去的研究中,已有數種文件切割演算法被提出,雖然這些演算法已被證實可以改善文件切割的效能,但依然存在著一些可改進之處:有些演算法具有較低的運算複雜度,但其切割正確性卻不盡理想;有些演算法雖然具有相當高的切割正確性,但其運算複雜度卻相當高。此外,有些演算法需進行參數最佳化或依賴人工的方式定義所需的參數,除了造成使用者的負擔外,由人工定義之參數可能無法反應真實的文件結構。 為解決上述問題,本論文提出三個文件切割演算法。首先,以離散粒子群最佳化演算法(Discrete Particle Swarm Optimization, DPSO)為基礎之文件切割演算法利用全域資訊、全域評估及全域最佳化之離散粒子群最佳化演算法找尋主題段落切割點,同時兼顧切割正確性及運算複雜度。接下來,一個以聚合式階層分群法(Hierarchical Agglomerative Clustering, HAC)為基礎的文件切割演算法,低運算複雜度且不需人工定義參數及其它輔助資料即可將文件切割成數個主題段落。隨後,一個結合上述兩個演算法優點之混合式文件切割演算法,除不需進行參數設定外,又可同時兼顧切割正確性及運算複雜度。最後,本論文也將文件切割技術分別應用於知識管理及數位學習,從效能評估的結果中證實,文件切割技術可成功提升知識管理及數位學習之應用效能。 | zh_TW |
| dc.description.abstract | The task of text segmentation is to divide a long text into several shorter segments, each of which shares a common topic. Text segmentation has been shown to benefit several natural language processing tasks, such as information retrieval and text summarization. Many algorithms have been proposed and shown to improve the performance of text segmentation. However, previous approaches often suffer from either low segmentation accuracy or high computational complexity. Moreover, parameter setting is a critical problem in some algorithms. Although manual assignment is one way to address this problem, it increases the user’s burden, and the parameters provided may not reflect the real structure of a text. To tackle these problems, three novel text segmentation algorithms are proposed in this dissertation. First, a text segmentation algorithm based on Discrete Particle Swarm Optimization (DPSO), called DPSOTS, is proposed. DPSOTS finds topical segments by using global information, global measurement, and a global optimization algorithm, DPSO, which addresses both segmentation accuracy and computational complexity. Subsequently, an efficient text segmentation algorithm based on Hierarchical Agglomerative Clustering (HAC), called TSHAC, is proposed. TSHAC requires neither parameter setting nor user involvement. Finally, a hybrid algorithm, TSHAC-DPSO, is proposed. Like TSHAC, TSHAC-DPSO requires no parameter setting. Moreover, TSHAC-DPSO combines the merits of both algorithms, which not only improves the accuracy of text segmentation but also makes execution more efficient and flexible. As examples, two applications of text segmentation, in knowledge management and e-learning, are also introduced in this dissertation. It has been demonstrated that text segmentation can be successfully applied in both applications. | en_US |
| dc.language.iso | en_US | en_US |
| dc.subject | 文件切割 | zh_TW |
| dc.subject | 離散粒子群最佳化 | zh_TW |
| dc.subject | 聚合式階層分群法 | zh_TW |
| dc.subject | 知識管理 | zh_TW |
| dc.subject | 數位學習 | zh_TW |
| dc.subject | Text segmentation | en_US |
| dc.subject | Discrete particle swarm optimization | en_US |
| dc.subject | Hierarchical agglomerative clustering | en_US |
| dc.subject | Knowledge management | en_US |
| dc.subject | E-learning | en_US |
| dc.title | 文件切割方法及應用 | zh_TW |
| dc.title | Text Segmentation: Methodology and Application | en_US |
| dc.type | Thesis | en_US |
| dc.contributor.department | 資訊科學與工程研究所 | zh_TW |
Appears in Collections: Thesis