Title: | 社群網路興趣探勘 Mining Interest Topics from Plurk |
Authors: | 李宜謙 Lee, Yi-Chien; 蔡錫鈞 Tsai, Shi-Chun; Institute of Computer Science and Engineering |
Keywords: | Social Networking (社群網路); Plurk (噗浪); Social Networking Service Discovery; SNSD |
Date issued: | 2012 |
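Abstract: | With the rapid growth of social networking services in recent years, more and more users make new friends through micro-blogging services. Doing so raises a problem, however. Suppose you see a micro-blog user's profile picture, find that she is your type, and want to get to know her: you would probably first have to skim through her posts to get a rough idea of which topics interest her before attempting a conversation. Because most micro-blogging services do not offer a personal profile page like Facebook's, strangers cannot cater to a user's tastes by reading information the user has volunteered; and because micro-blog posting volume is high, reading through all of someone's posts to infer their interests is difficult. Moreover, if the person you want to meet has not made her posts public, approaching her is harder still, because you have no way of knowing which topics interest her.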
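To solve the problems above, we designed an interest-mining system for the Plurk micro-blogging service. The system quickly summarizes the keywords a subject has posted and visualizes that user's social network. If the subject has set their timeline to private, meaning the posts are not public, we infer the topics and keywords likely to interest them by aggregating the posts of the subject's friends. The keywords a subject is interested in can also be used for applications such as personalization, advertising, and friend recommendation.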
To collect data from Plurk quickly, we developed a ZeroMQ-based distributed data-collection framework and deployed it on multiple machines to increase collection speed. In addition, because the performance of Plurk's Python API library is less than ideal, we removed the bottlenecks and substantially improved collection efficiency by replacing the JSON library, strengthening HTTP connection management, and writing an OpenSSL extension to speed up HMAC-SHA1 computation.
In recent years, people have started to make friends through micro-blogging services; however, it is difficult to read every message posted by someone you are interested in but do not know well in order to find out what he/she is interested in and start a conversation. Furthermore, unlike blogs or Facebook, most micro-blogging services do not provide profile functionality (a self-description page) where users can describe themselves so that others know what they are interested in. To address this demand, we build an online Social Networking Service Discovery (SNSD) system for Plurk users (plurkers) to find out a plurker's interest topics/keywords and relationships/connections. The results are presented graphically in a web browser. With the derived interests and relationships/connections, applications of the system include friend recommendations and personalized advertisements. To enhance crawling performance, we develop a distributed crawling system based on the ZeroMQ messaging library and deploy it on multiple machines to crawl data from Plurk. In addition, we patch the Plurk API library for Python to enhance throughput by replacing the standard JSON library with a high-performance one, optimizing HTTP connections, and writing custom Python C extensions to accelerate HMAC-SHA1 computation. |
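The abstract states that interests of a user with a private timeline are inferred by aggregating the posts of that user's friends, but it does not say how the aggregation is done. The following is only a minimal sketch of one plausible approach, not the thesis's method: it ranks keywords by how many distinct friends mention them. The function name `infer_keywords`, its parameters, and the whitespace tokenizer are all assumptions made for illustration; real Plurk posts, many of them in Chinese, would need a proper word segmenter.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple


def infer_keywords(
    friend_posts: Dict[str, Iterable[str]],
    stopwords: FrozenSet[str] = frozenset(),
    top_n: int = 5,
) -> List[Tuple[str, int]]:
    """Rank keywords by the number of distinct friends who used them.

    Hypothetical helper: the scoring rule is an assumption, not taken
    from the thesis.
    """
    support: Counter = Counter()
    for _friend, posts in friend_posts.items():
        # Collect each friend's vocabulary once, so a single very
        # talkative friend cannot dominate the ranking.
        seen = set()
        for post in posts:
            # Placeholder tokenizer: lowercase and split on whitespace.
            for token in post.lower().split():
                if token not in stopwords:
                    seen.add(token)
        support.update(seen)
    # Sort by descending support, then alphabetically for a stable order.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

Counting distinct friends rather than raw occurrences is a deliberate choice in this sketch: a topic shared across several friends is treated as stronger evidence than one friend repeating the same word many times.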
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079955559 http://hdl.handle.net/11536/50474 |
Appears in Collections: | Thesis |
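The abstract mentions that HMAC-SHA1 computation was a bottleneck in the Plurk API library for Python. As a point of reference only, the sketch below shows what that computation looks like with Python's standard `hmac` and `hashlib` modules. It assumes the HMAC-SHA1 work arises from OAuth 1.0 style request signing, which the record does not state; `hmac_sha1_digest` and `sign_request` are hypothetical names, and this is not the OpenSSL-based extension the thesis describes.

```python
import base64
import hashlib
import hmac


def hmac_sha1_digest(key: bytes, message: bytes) -> bytes:
    """Return the raw 20-byte HMAC-SHA1 tag of `message` under `key`."""
    return hmac.new(key, message, hashlib.sha1).digest()


def sign_request(signing_key: str, base_string: str) -> str:
    """Base64-encode the HMAC-SHA1 tag of a request's base string.

    Assumption: signatures are transmitted base64-encoded, as in
    OAuth 1.0 style signing. Both arguments are illustrative inputs.
    """
    tag = hmac_sha1_digest(
        signing_key.encode("utf-8"), base_string.encode("utf-8")
    )
    return base64.b64encode(tag).decode("ascii")
```

Because a crawler signs every request it sends, even a small per-call cost in this step is multiplied across the whole crawl, which is consistent with the abstract singling it out for optimization.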