Full metadata record
DC field | Value | Language |
---|---|---|
dc.contributor.author | 吳智瑋 | en_US |
dc.contributor.author | Wu, Ji-Wei | en_US |
dc.contributor.author | 蔡文能 | en_US |
dc.contributor.author | 曾秋蓉 | en_US |
dc.contributor.author | Tsai, Wen-Nung | en_US |
dc.contributor.author | Tseng, Judy C.-R. | en_US |
dc.date.accessioned | 2014-12-12T02:37:12Z | - |
dc.date.available | 2014-12-12T02:37:12Z | - |
dc.date.issued | 2013 | en_US |
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT079755841 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/73189 | - |
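dc.description.abstract | A typical long document often covers several different topics or subtopics. Text segmentation divides a document into topically coherent segments, each made up of several consecutive sentences or paragraphs. Text segmentation has been shown to improve the performance of natural language processing tasks such as information retrieval and text summarization. Several text segmentation algorithms have been proposed in previous research, and although they have been shown to improve segmentation performance, they leave room for improvement: some have low computational complexity but unsatisfactory segmentation accuracy, while others achieve high accuracy at a high computational cost. In addition, some algorithms require parameter optimization or rely on manually defined parameters, which burdens users, and manually defined parameters may not reflect the actual structure of a document. To address these problems, this dissertation proposes three text segmentation algorithms. First, an algorithm based on Discrete Particle Swarm Optimization (DPSO) locates topic segment boundaries using global information, global evaluation, and the globally optimizing DPSO algorithm, balancing segmentation accuracy and computational complexity. Next, an algorithm based on Hierarchical Agglomerative Clustering (HAC) divides a document into topical segments with low computational complexity, without manually defined parameters or other auxiliary data. Then, a hybrid algorithm that combines the strengths of the two requires no parameter setting while still balancing segmentation accuracy and computational complexity. Finally, this dissertation applies text segmentation to knowledge management and e-learning; performance evaluation results confirm that text segmentation successfully improves the performance of both applications. | zh_TW (translated) |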
dc.description.abstract | The task of text segmentation is to divide a long text into several shorter segments, each of which shares a common topic. Text segmentation has been shown to benefit several natural language processing tasks, such as information retrieval and text summarization. Many algorithms have been proposed and shown to improve segmentation performance; however, previous approaches often suffer from either low segmentation accuracy or high computational complexity. Parameter setting is another critical problem in some algorithms. Assigning parameters manually is one way to solve it, but it increases the user's burden, and the chosen parameters may not reflect the actual structure of a text. To tackle these problems, this dissertation proposes three novel text segmentation algorithms. First, DPSOTS, a text segmentation algorithm based on Discrete Particle Swarm Optimization (DPSO), finds topical segments using global information, a global measurement, and the global optimization algorithm DPSO, improving both segmentation accuracy and computational efficiency. Second, TSHAC, an efficient text segmentation algorithm based on Hierarchical Agglomerative Clustering (HAC), requires neither parameter setting nor user involvement. Finally, a hybrid algorithm, TSHAC-DPSO, is proposed. Like TSHAC, TSHAC-DPSO requires no parameter setting; it combines the merits of both algorithms, which not only improves segmentation accuracy but also makes execution more efficient and flexible. As examples, two applications of text segmentation, in knowledge management and e-learning, are also presented, and the results demonstrate that text segmentation can be successfully applied in both. | en_US |
dc.language.iso | en_US | en_US |
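dc.subject | 文件切割 (Text segmentation) | zh_TW |
dc.subject | 離散粒子群最佳化 (Discrete particle swarm optimization) | zh_TW |
dc.subject | 聚合式階層分群法 (Hierarchical agglomerative clustering) | zh_TW |
dc.subject | 知識管理 (Knowledge management) | zh_TW |
dc.subject | 數位學習 (E-learning) | zh_TW |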
dc.subject | Text segmentation | en_US |
dc.subject | Discrete particle swarm optimization | en_US |
dc.subject | Hierarchical agglomerative clustering | en_US |
dc.subject | Knowledge management | en_US |
dc.subject | E-learning | en_US |
dc.title | 文件切割方法及應用 (Text Segmentation: Methodology and Application) | zh_TW |
dc.title | Text Segmentation: Methodology and Application | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | 資訊科學與工程研究所 (Institute of Computer Science and Engineering) | zh_TW |
Appears in Collections: | Theses |
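
The abstracts describe the HAC-based algorithm only at a high level. As an illustration of the general idea of HAC-based text segmentation (not a reconstruction of TSHAC itself), the following minimal Python sketch treats each sentence as its own cluster and repeatedly merges the most similar pair of adjacent clusters. Several assumptions are not taken from this record: bag-of-words vectors built by whitespace tokenization, cosine similarity between clusters, and a caller-supplied target segment count `k`. The abstract states that TSHAC needs no parameter setting, but its stopping rule is not described here, so `k` is a simplification. The names `cosine` and `hac_segment` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hac_segment(sentences: list[str], k: int) -> list[int]:
    """Merge adjacent blocks until k remain; return each segment's start index.

    Assumption: k is supplied by the caller (TSHAC itself is described as
    parameter-free, with a stopping rule not given in the record).
    Only neighbouring blocks may merge, so every cluster stays a contiguous
    run of sentences, i.e. a candidate topical segment.
    """
    if not 1 <= k <= len(sentences):
        raise ValueError("k must be between 1 and the number of sentences")
    # Each block is (start index, bag of words of all its sentences).
    # Assumption: naive lowercase whitespace tokenization, no stop-word removal.
    blocks = [(i, Counter(s.lower().split())) for i, s in enumerate(sentences)]
    while len(blocks) > k:
        # Pick the most similar pair of adjacent blocks (first one on ties).
        best = max(range(len(blocks) - 1),
                   key=lambda j: cosine(blocks[j][1], blocks[j + 1][1]))
        start, bag = blocks[best]
        # Replace the pair with one merged block that keeps the earlier start.
        blocks[best:best + 2] = [(start, bag + blocks[best + 1][1])]
    return [start for start, _ in blocks]


docs = [
    "the cat sat on the mat",
    "the cat chased a mouse",
    "stock prices fell sharply today",
    "investors sold stock as prices fell",
]
print(hac_segment(docs, 2))  # → [0, 2]
```

Restricting merges to adjacent blocks is what turns generic agglomerative clustering into a segmenter: clusters can never interleave, so the surviving blocks map directly to segment boundaries. Each pass rescans all adjacent pairs, giving O(n²) similarity computations for n sentences; caching pairwise similarities and updating only the two pairs touched by a merge would reduce that cost.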