Graph-based Approaches for Social Media Mining

Full metadata record

DC Field	Value	Language
dc.contributor.author	江孟芬	en_US
dc.contributor.author	Chiang, Meng-Fen	en_US
dc.contributor.author	彭文志	en_US
dc.contributor.author	俞士綸	en_US
dc.contributor.author	Peng, Wen-Chih	en_US
dc.contributor.author	Yu, Philip S.	en_US
dc.date.accessioned	2014-12-12T01:26:09Z	-
dc.date.available	2014-12-12T01:26:09Z	-
dc.date.issued	2012	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT079555804	en_US
dc.identifier.uri	http://hdl.handle.net/11536/41421	-
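dc.description.abstract	In an era of flourishing social media and widespread positioning technology, we can collect and observe users' browsing trajectories on the Internet and their movement trajectories in the physical world. In this thesis, we observe and mine user behavior on two types of social media: location-aware social media and web-based social media. User trajectory data collected from location-aware social media (e.g., Foursquare) are typically low-sampling-rate sequences, meaning that the time gap between consecutive data points can range from a few seconds to several hours. Mining low-sampling-rate trajectories raises the following challenges: handling data sparsity, building user mobility models, and keeping those models efficient. We address these challenges for location-aware social media and develop two application services: location prediction and mining of mobility evolution models. Finally, we model and mine user behavior on web-based social media, in particular the social environment of online question-and-answer (QA) forums such as Yahoo! Kimo Knowledge+ (雅虎奇摩知識家) and Baidu Zhidao (百度知道). A user's browsing trajectory in a QA forum can be represented as a sequence of pages, and the browsing trajectories of many users help us pick out high-quality QA pages from the vast web space. Mining large volumes of browsing trajectories raises the following challenges: handling data sparsity, building browsing behavior models, and maintaining efficiency when mining large amounts of data. We address these challenges for web-based social media and develop two application services: QA page ranking and related QA page recommendation.	zh_TW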
dc.description.abstract	With the emergence of mobile devices in the social media era, tracking user activities from social media in cyberspace or the physical world has become feasible. In this thesis, we focus on mining user behaviors from two types of social media: location-based and cyber-based social media. A user trajectory collected from location-based social media (e.g., Foursquare) is a low-sampling-rate sequence of data points, where each data point corresponds to a check-in record with time and location information. Analyzing low-sampling-rate trajectories leads to new requirements for location-based services: managing data sparsity, developing mobility models, and efficient processing. Driven by these requirements, we have developed two location-based services: a location prediction model for distant-time queries and a user mobility profiling model for low-sampling-rate trajectories. For location prediction, we address the problem of sparsity in low-sampling-rate trajectories and propose an adaptive temporal exploration approach to retrieve supporting trajectories. These supporting trajectories are then assembled into a time-constrained mobility graph. We have designed a Reachability-based prediction model on the Time-constrained Mobility Graph (RTMG) that follows the principle of random walk simulation with restart from the current location. To support efficient query processing, we have developed an index structure and corresponding operators. Extensive experiments with real data demonstrate the effectiveness and efficiency of RTMG with adaptive temporal exploration over varying data sparsity. For user mobility profiling, we formulate the problem of mining mobility dynamics from check-in trajectories. Given a check-in trajectory of a user, we aim to divide the trajectory into a sequence of segments, with each segment associated with a time interval and a group of hot regions where the user appears during that time interval.
Initially, we divide trajectories into segments of equal time units. Then, we define the change point based on the spatial distribution. To measure the segmentation quality, we have designed a quality metric based on the minimum description length (MDL) principle. Based on this quality metric, we have developed a family of greedy algorithms to automatically derive a sequence of segments and their corresponding regions and valid time intervals. We have conducted experiments on real datasets to demonstrate the effectiveness of the proposed algorithms. Finally, we focus on mining user behaviors from cyber-based social media, particularly online question answering (QA) forums (e.g., Y! Answers), for QA page ranking and QA page recommendation. User browsing logs collected from such QA forums record users' browsing trajectories from page to page. Specifically, the browsing trajectories cover the fraction of hyperlinks actually followed by users and thus can reveal the true value of pages within the enormous cyberspace. Analyzing large-scale browsing trajectories leads to new requirements: managing data sparsity, data models, and scalability. To address the sparsity issue, we explore the latent browsing relations among QA pages to build a QA Latent Browsing Graph. Based on this graph, we incorporate the staying time distribution of the QA pages and propose a Latent Browsing Rank (LBR) to determine the importance of the QA pages; on that basis, we recommend QA pages with a higher Latent Browsing Recommendation Rank (LBRR). Furthermore, we have conducted extensive experiments on Yahoo! Asia Knowledge Plus to demonstrate the effectiveness of latent browsing relations. The experimental results indicate that our framework can recommend relevant, high-quality QA pages by exploring the QA Latent Browsing Graph.
To cope with the scalability issue, we also propose computing recommendation results for multiple queries at the same time on a cloud computing platform. When the number of queries is large, our solution reduces the computation time per query compared with sequential computation.	en_US
dc.language.iso	en_US	en_US
dc.subject	社交媒體	zh_TW
dc.subject	圖形探勘	zh_TW
dc.subject	軌跡探勘	zh_TW
dc.subject	位置預測	zh_TW
dc.subject	行動力演化	zh_TW
dc.subject	重啟動的隨機遊走	zh_TW
dc.subject	Social Media	en_US
dc.subject	Graph Mining	en_US
dc.subject	Trajectory Mining	en_US
dc.subject	Location Prediction	en_US
dc.subject	Mobility Evolution	en_US
dc.subject	Random Walk with Restart	en_US
dc.title	基於圖論方法之社交媒體探勘	zh_TW
dc.title	Graph-based Approaches for Social Media Mining	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所	zh_TW
Appears in Collections:	Thesis
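
The abstract describes location prediction as a random walk with restart from the user's current location over a mobility graph. The sketch below illustrates only that general idea on a plain weighted graph. It is not the thesis's RTMG model: the time constraints, adaptive temporal exploration, and index structure are omitted, and the function name, the parameters, and the toy graph are all hypothetical.

```python
def random_walk_with_restart(graph, start, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Generic random walk with restart (illustrative, not the thesis's RTMG).

    graph: dict mapping node -> {neighbor: non-negative edge weight}
           (here, assumed counts of observed moves between locations).
    start: node the walker restarts from (the user's current location).
    Returns a dict node -> stationary visiting probability.
    """
    # Collect every node that appears as a source, a target, or the start.
    nodes = set(graph) | {start}
    for neighbors in graph.values():
        nodes.update(neighbors)

    scores = {node: 0.0 for node in nodes}
    scores[start] = 1.0

    for _ in range(max_iter):
        updated = {node: 0.0 for node in nodes}
        # With probability restart_prob the walker jumps back to start.
        updated[start] += restart_prob
        for node in nodes:
            mass = scores[node]
            if mass == 0.0:
                continue
            neighbors = graph.get(node, {})
            total_weight = sum(neighbors.values())
            if total_weight == 0:
                # Assumption: a dead-end node sends the walker back to start.
                updated[start] += (1.0 - restart_prob) * mass
                continue
            for neighbor, weight in neighbors.items():
                updated[neighbor] += (1.0 - restart_prob) * mass * weight / total_weight
        change = sum(abs(updated[node] - scores[node]) for node in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Toy mobility graph with made-up transition counts between three places.
toy_graph = {
    "home": {"office": 3, "cafe": 1},
    "office": {"home": 2, "cafe": 2},
    "cafe": {"home": 1},
}
ranking = random_walk_with_restart(toy_graph, "home")
for place, score in sorted(ranking.items(), key=lambda item: -item[1]):
    print(f"{place}: {score:.3f}")
```

Places with higher scores are those the walker reaches more often from the start node, which is one simple way to rank candidate next locations under these assumptions.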