Graph-based Approaches for Social Media Mining

Title:	Graph-based Approaches for Social Media Mining (基於圖論方法之社交媒體探勘)
Authors:	Chiang, Meng-Fen (江孟芬); Peng, Wen-Chih (彭文志); Yu, Philip S. (俞士綸); Institute of Computer Science and Engineering
Keywords:	Social Media; Graph Mining; Trajectory Mining; Location Prediction; Mobility Evolution; Random Walk with Restart
Issue Date:	2012
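Abstract:	In an era of flourishing social media and widespread positioning technology, we can collect and observe both users' browsing trajectories on the Internet and their movement trajectories in the real world. In this thesis, we observe and mine user behavior in two types of social media: location-aware social media and web-based social media. User trajectories collected from location-aware social media (e.g., Foursquare) are typically low-sampling-rate sequences, meaning that the time gap between consecutive data points can range from a few seconds to several hours. Mining low-sampling-rate trajectories raises three challenges: handling data sparsity, building user mobility models, and keeping those models efficient. We address these challenges and develop two applications for location-aware social media: location prediction and the mining of mobility evolution models. Finally, we model and mine user behavior in web-based social media, in particular question-and-answer (QA) forums such as Yahoo! Kimo Knowledge+ (雅虎奇摩知識家) and Baidu Zhidao (百度知道). A user's browsing trajectory in a QA forum can be represented as a sequence of pages, and the browsing trajectories of a large number of users help us pick out high-quality QA pages from the vast space of the Web. Mining large volumes of browsing trajectories raises three challenges: handling data sparsity, building browsing behavior models, and maintaining efficiency when mining large amounts of data. We address these challenges and develop two applications for web-based social media: QA page ranking and related QA page recommendation.
With the emergence of mobile devices in the social media era, tracking user activities from social media in cyberspace or the physical world has become feasible. In this thesis, we focus on mining user behaviors from two types of social media: location-based and cyber-based social media. A user trajectory collected from location-based social media (e.g., Foursquare) is a low-sampling-rate sequence of data points, where each data point corresponds to a check-in record with time and location information. Analyzing low-sampling-rate trajectories leads to new requirements for location-based services: managing data sparsity, developing mobility models, and processing queries efficiently. Driven by these requirements, we have developed two location-based services: a location prediction model for distant-time queries and a user mobility profiling model for low-sampling-rate trajectories. We address the problem of sparsity in low-sampling-rate trajectories and propose an adaptive temporal exploration approach to retrieve supporting trajectories. These supporting trajectories are then assembled into a time-constrained mobility graph. We have designed a Reachability-based prediction model on the Time-constrained Mobility Graph (RTMG) that follows the principle of a random walk simulation with restart from the current location.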
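The random-walk-with-restart principle mentioned above can be illustrated with a minimal sketch. It is not the RTMG model itself: the graph representation (a dictionary of weighted outgoing edges), the function name `random_walk_with_restart`, the restart probability of 0.15, and the toy check-in graph are all assumptions made for illustration, and the time constraints and adaptive temporal exploration of the thesis are omitted.

```python
def random_walk_with_restart(graph, start, restart=0.15, max_iters=200, tol=1e-12):
    """Steady-state visit probabilities of a walker that jumps back to
    `start` with probability `restart` at every step.

    `graph` maps each node to a dict {neighbour: edge weight}. This layout
    is an assumption of the sketch, not the thesis's mobility graph format.
    """
    nodes = {start} | set(graph)
    for neighbours in graph.values():
        nodes.update(neighbours)

    scores = {node: 0.0 for node in nodes}
    scores[start] = 1.0  # the walker begins at the current location
    for _ in range(max_iters):
        nxt = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            neighbours = graph.get(node, {})
            total = sum(neighbours.values())
            if total == 0:
                # Dead end: send the walking mass back to the start node.
                nxt[start] += (1.0 - restart) * mass
                continue
            for neighbour, weight in neighbours.items():
                nxt[neighbour] += (1.0 - restart) * mass * weight / total
        nxt[start] += restart  # total mass is 1, so the restart share is `restart`
        change = sum(abs(nxt[node] - scores[node]) for node in nodes)
        scores = nxt
        if change < tol:
            break
    return scores


# Hypothetical check-in graph: edge weights count observed transitions.
mobility_graph = {
    "home": {"office": 3, "gym": 1},
    "office": {"cafe": 2, "home": 1},
    "gym": {"home": 1},
    "cafe": {},
}
scores = random_walk_with_restart(mobility_graph, "home")
candidates = [node for node in scores if node != "home"]
predicted = max(candidates, key=scores.get)  # highest-scoring next location
```

Ranking candidate locations by their steady-state score favors places that are reachable from the current location through many heavily weighted paths, which is the intuition behind reachability-based prediction.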
To support efficient query processing, we have developed an index structure and corresponding operators. Extensive experiments with real data demonstrate the effectiveness and efficiency of RTMG with adaptive temporal exploration over varying data sparsity. We formulate the problem of mining mobility dynamics from check-in trajectories for user mobility profiling. Given a check-in trajectory of a user, we aim to divide the trajectory into a sequence of segments, with each segment associated with a time interval and a group of hot regions where the user appears during that time interval. Initially, we divide the trajectory into segments of equal time units. Then, we define change points based on the spatial distribution. To measure the segmentation quality, we have designed a quality metric based on the minimum description length (MDL) principle. Based on this metric, we have developed a family of greedy algorithms to automatically derive a sequence of segments and their corresponding regions and valid time intervals. We have conducted experiments on real datasets to demonstrate the effectiveness of the proposed algorithms. Finally, we focus on mining user behaviors from cyber-based social media, particularly online question-and-answer (QA) forums (e.g., Y! Answers), for QA page ranking and QA page recommendation. User browsing logs collected from such QA forums record users' browsing trajectories from page to page. Specifically, the browsing trajectories cover the fraction of hyperlinks actually followed by users and thus can reveal the true value of pages within the enormous cyberspace. Analyzing large-scale browsing trajectories leads to new requirements: managing data sparsity, building data models, and ensuring scalability. To address the sparsity issue, we explore the latent browsing relations among QA pages to build a QA Latent Browsing Graph.
Based on this graph, we incorporate the staying time distribution of the QA pages and propose a Latent Browsing Rank (LBR) to determine the importance of the QA pages. Building on LBR, we recommend the QA pages with a higher Latent Browsing Recommendation Rank (LBRR). Furthermore, we have conducted extensive experiments on Yahoo! Asia Knowledge Plus to demonstrate the effectiveness of latent browsing relations. The experimental results indicate that our framework can recommend relevant and high-quality QA pages by exploring the QA latent browsing graph. To cope with the scalability issue, we also propose computing recommendation results for multiple queries at the same time on a cloud computing platform. When the number of queries is large, our solution speeds up the computation per query compared with a sequential computing mechanism.
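The abstract does not define LBR, so the following is only a rough sketch of one common way to combine a browsing graph with staying times. It takes the stationary visit distribution of a damped random walk over observed page transitions and scales it by each page's mean staying time, in the style of a continuous-time Markov model. The function name `staying_time_rank`, the damping factor, the `default_stay` fallback, and the toy data are assumptions, not the thesis's formulation.

```python
def staying_time_rank(transitions, stay_seconds, damping=0.85,
                      default_stay=1.0, iters=100):
    """Rank pages by (visit probability) x (mean staying time), normalized.

    `transitions` maps page -> {next page: observed click count}.
    `stay_seconds` maps page -> list of observed staying times in seconds.
    Both layouts are assumptions of this sketch.
    """
    pages = set(stay_seconds) | set(transitions)
    for nxt_pages in transitions.values():
        pages.update(nxt_pages)
    pages = sorted(pages)
    n = len(pages)

    # Stationary distribution of a damped walk over the browsing graph.
    visit = {page: 1.0 / n for page in pages}
    for _ in range(iters):
        nxt = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            out = transitions.get(page, {})
            total = sum(out.values())
            if total == 0:
                # No outgoing clicks observed: spread the mass uniformly.
                for other in pages:
                    nxt[other] += damping * visit[page] / n
            else:
                for target, count in out.items():
                    nxt[target] += damping * visit[page] * count / total
        visit = nxt

    # Weight each page by its mean staying time, then renormalize.
    weighted = {}
    for page in pages:
        samples = stay_seconds.get(page) or [default_stay]
        weighted[page] = visit[page] * sum(samples) / len(samples)
    norm = sum(weighted.values())
    return {page: value / norm for page, value in weighted.items()}


# Hypothetical browsing log summary for three QA pages.
clicks = {"q1": {"q2": 5, "q3": 1}, "q2": {"q1": 2}, "q3": {"q1": 1}}
stays = {"q1": [10, 20], "q2": [60, 90], "q3": [5]}
rank = staying_time_rank(clicks, stays)
top_pages = sorted(rank, key=rank.get, reverse=True)
```

Under this weighting, a page that users reach often and stay on for a long time scores higher than one that is reached equally often but abandoned quickly, which is the motivation for bringing staying time into the ranking.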
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079555804 http://hdl.handle.net/11536/41421
Appears in Collections:	Thesis