Title: 部落格分群 (Blog Clustering)
Authors: 陳佑州 (Chen, You-Chou)
李嘉晃 (Lee, Chia-Hoang)
Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords: 部落格 (blog); 分群 (clustering)
Publication date: 2009
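Abstract: Search engines have contributed greatly to the Internet's rapid growth into an indispensable part of modern life. However, nearly all current search technology finds web pages by keyword: the user enters a keyword, and the search engine returns the pages that contain it. Suppose you are a blog author with your own blog. Search engines currently cannot use the topics of your posts to automatically find blogs you might be interested in reading, because they cannot automatically cluster blogs according to their characteristics. In this paper, we therefore study how to cluster blogs on the Web by topic and thereby find authors who share the same interests. We propose using a blog's tag cloud to represent the concepts of the blog. A tag cloud is the set of tags of all posts in a blog. Because the tags on a blog are assigned manually, they are well suited to representing the concept or topic of a post, so our system represents a blog directly by its tag cloud rather than analyzing every post to find the blog's topics. Once blogs have this representation, we can compute the similarity between blogs, apply different clustering algorithms to cluster them, and compare the results. The experimental results show that, in almost all cases, we can accurately place blogs on the same topic in the same cluster, which indicates that the authors of the blogs within a cluster share the same interests or preferences.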
Discovering social interests from users' blog content or social tags is an interesting and challenging problem in social network research. We tackle this problem by clustering blogs based on their tags: the tags of a blog represent the blogger's interests, and clustering the blogs reveals the interests that users have in common. In this paper, we propose two approaches. In the first approach, we apply spectral clustering to the blogs in a concept vector space. Constructing the concept vector representation is similar to dimensionality reduction. First, we treat the Web as the system corpus and measure the relevance of two tags from the hit counts returned by a search engine. Second, we propose a balanced hierarchical agglomerative clustering, which takes cluster size into account, to aggregate relevant tags. Finally, the original tag vector representation is transformed into its corresponding concept vector representation. The experimental results show that the F1 value improves considerably compared with clustering in the tag vector space. In the second approach, we use multidimensional scaling to perform dimensionality reduction and then apply K-means clustering to the reduced coordinates. The experimental results show that our approaches effectively cluster blogs with similar interests and can readily be applied to other social network clustering problems.
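
The step from tag vectors to concept vectors can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several things in it are assumptions made only for illustration: the blogs and tags are made-up data, the `tag_to_concept` mapping is written by hand (the thesis derives its tag groups from search-engine hit counts and balanced hierarchical agglomerative clustering), and cosine similarity is assumed as the similarity measure because the abstract does not name one.

```python
from collections import Counter
from math import sqrt


def tag_vector(post_tags):
    """Build a blog's tag cloud as tag -> count over all of its posts."""
    return Counter(tag for tags in post_tags for tag in tags)


def to_concept_vector(tag_vec, tag_to_concept):
    """Fold tag counts into concept counts; tags without a concept are dropped."""
    concept_vec = Counter()
    for tag, count in tag_vec.items():
        concept = tag_to_concept.get(tag)
        if concept is not None:
            concept_vec[concept] += count
    return concept_vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical blogs: each inner list holds the tags of one post.
blog_a = tag_vector([["python", "coding"], ["python"]])
blog_b = tag_vector([["java", "programming"]])
blog_c = tag_vector([["travel", "hiking"]])

# Hand-written stand-in for the tag groups that tag clustering would produce.
tag_to_concept = {
    "python": "software", "coding": "software",
    "java": "software", "programming": "software",
    "travel": "outdoors", "hiking": "outdoors",
}

concept_a = to_concept_vector(blog_a, tag_to_concept)
concept_b = to_concept_vector(blog_b, tag_to_concept)
concept_c = to_concept_vector(blog_c, tag_to_concept)

print(cosine(blog_a, blog_b))        # tag space: the two blogs share no tags
print(cosine(concept_a, concept_b))  # concept space: both map to "software"
print(cosine(concept_a, concept_c))  # different concepts stay dissimilar
```

In this toy data, the two programming blogs share no tags, so they look unrelated in the tag vector space, yet they coincide once related tags are merged into one concept. This is one plausible reading of why clustering in the concept vector space could score a higher F1 value than clustering in the tag vector space.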
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079755631
http://hdl.handle.net/11536/45975
Appears in collections: Theses


Files in this item:

  1. 563101.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.