Title: Exploring Communication and Mobility Behaviors in Mobile Media (在行動裝置資料探勘通話與移動行為)
Authors: Chen, Chien-Cheng (陳建成)
Peng, Wen-Chih (彭文志)
Hsu, Kuo-Wei (徐國偉)
Institute of Computer Science and Engineering
Keywords: mobile media; communication behavior; mobility behavior; feature engineering; mobility evolution pattern; trajectory pattern mining
Issue date: 2017
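Abstract: With the spread of smartphones and mobile networks, more and more services are being developed for mobile devices, such as telephony and the many applications that run on smartphones. Everyday use generates large volumes of data on these devices, and this data can be mined for user behavior. Different services produce different kinds of data, which suit different purposes. This dissertation focuses on mining user behavior from two kinds of data, communication and mobility, through three topics: potential users, mobility evolution patterns, and spatial-temporal semantic trajectory patterns.
In the first topic, we look for potential users in communication logs; a potential user is one who may use the service we provide in the future. For telecom operators, attracting and retaining users is an important issue. Because potential users are currently served by other operators, very little information about them is available, so extracting their features from call records is a challenge. To address this, we propose the Communication-Based Feature Generation Framework to extract features and identify potential users. We obtain explicit features from the calling relationships between users and implicit features from the users' community structure. We then use information gain to select representative features and build models with three classification algorithms to find potential users. In addition, we propose a profit function to handle data imbalance when building the classification models. Experiments on call-record data show that the features we extract help identify potential users.
In the second topic, we discover users' daily movement behavior from check-in data. Check-in data contain spatial and temporal information, so we propose the mobility evolution pattern to represent daily movement behavior. A mobility evolution pattern is a series of segments, each combining a spatial distribution with its corresponding time interval. To obtain a good segmentation from check-in data, we cast the problem as a compression problem and use the Minimum Description Length principle to compute the representation length of a mobility evolution pattern, which yields good patterns. We then apply a clustering algorithm to group the patterns and find representative ones. Experiments on check-in data show that good mobility evolution patterns can be found efficiently.
In the third topic, we observe that raw trajectory data carry insufficient information. To enrich them, we propose the spatial-temporal semantic trajectory pattern, a movement pattern that includes temporal, spatial, and semantic information. To extract these patterns, we first convert raw trajectories into semantic trajectory sequences and then, through sequence symbolization, convert those into symbol sequences, so that the efficient sequential pattern mining algorithm PrefixSpan can be applied. Experiments on user trajectory data confirm that the proposed method finds spatial-temporal semantic trajectory patterns effectively and efficiently.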
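The information-gain selection step mentioned for the first topic can be illustrated with a minimal sketch. This is not the dissertation's implementation: the feature names (`calls_to_subscribers`, `weekend_caller`), the toy records, and the helper functions are all hypothetical, and every feature is assumed to be discrete.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature name, gain) pairs sorted by decreasing information gain."""
    names = list(rows[0].keys())
    gains = {n: information_gain([r[n] for r in rows], labels) for n in names}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: one record per non-subscriber seen in the call logs.
rows = [
    {"calls_to_subscribers": "high", "weekend_caller": "yes"},
    {"calls_to_subscribers": "high", "weekend_caller": "no"},
    {"calls_to_subscribers": "low", "weekend_caller": "yes"},
    {"calls_to_subscribers": "low", "weekend_caller": "no"},
]
labels = [1, 1, 0, 0]  # 1 = later joined the service (assumed label)

ranking = rank_features(rows, labels)
print(ranking)
```

In this toy data `calls_to_subscribers` separates the two classes perfectly, so it ranks first, while `weekend_caller` carries no information about the label. A real pipeline would keep the top-ranked features and pass them to the classifiers.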
With the popularity of smartphones and mobile networks, many services are deployed in mobile media; for example, users can make phone calls and run many applications on their phones. Users therefore generate huge volumes of data in mobile media in their daily lives, and user behaviors can be mined from these data. Behaviors mined from the data of different services serve different purposes. In this dissertation, we focus on mining user behaviors from communication and mobility data in three tasks: potential users, mobility evolution patterns, and spatial-temporal semantic trajectory patterns.
In task 1, given a set of communication logs, we aim to identify potential users, that is, users who are likely to join the target services in the near future. For telecom operators, acquiring and retaining users is a significant and practical task in mobile communication services. Since only a limited amount of information is available about potential users who subscribe to other telecom operators, one challenging issue is how to extract features from the communication logs. We propose a Communication-Based Feature Generation (CBFG) framework that extracts features and builds models to infer the potential users. We extract explicit features from the users' interaction behaviors and, based on the community structures of users, further extract implicit features. Using the effective features selected by information gain, we build models with three popular classifiers to target the potential users. In addition, we design a profit function to sample training data for the classifiers, which addresses data imbalance. Experiments on communication logs show that the features extracted by our proposed method are effective for targeting the potential users.
In task 2, given a set of check-in data, we aim to discover representative daily movement behaviors of users. Since check-in data contain both spatial and temporal information, we propose the mobility evolution pattern to capture the daily movement behavior of users. Specifically, a mobility evolution pattern consists of segments, each with a spatial region distribution and the corresponding time interval. To measure the quality of a segmentation of check-in data, we formulate the problem of mining mobility evolution patterns as a compression problem and compute the representation length of the patterns based on the Minimum Description Length (MDL) principle. We further cluster the daily mobility evolution patterns into groups and discover representative patterns. Experiments on check-in datasets show the effectiveness and efficiency of our proposed methods.
In task 3, to enrich raw trajectories, we propose Spatial-Temporal Semantic Trajectory Patterns (STS-TPs), which are moving patterns with spatial, temporal, and semantic attributes. Given a set of user trajectories, we aim to mine STS-TPs. Specifically, we extract the three attributes from raw trajectories and convert the trajectories into semantic trajectory sequences. Given a set of such sequences, STS-TPs can be viewed as sequential patterns with multiple attributes. To exploit the efficiency of PrefixSpan in sequential pattern mining, we propose a PrefixSpan-based algorithm to discover STS-TPs. Because the input to PrefixSpan is a set of sequences consisting of items, we propose Sequence Symbolization algorithms that transform the semantic trajectory sequences into symbolized sequences. Experiments on location log datasets show the effectiveness and efficiency of our proposed algorithms.
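The task 3 pipeline (symbolize multi-attribute sequences, then mine them with PrefixSpan) can be sketched as follows. This is an illustrative simplification, not the dissertation's PrefixSpan-based algorithm or its Sequence Symbolization algorithms: it assumes each trajectory point is already a `(region, time_slot, semantic_tag)` triple, maps each distinct triple to one integer symbol, and runs a plain single-item PrefixSpan. The toy trajectories and function names are hypothetical.

```python
from collections import defaultdict


def symbolize(sequences):
    """Map each distinct (region, time_slot, semantic_tag) triple to an integer."""
    table = {}
    symbolized = []
    for seq in sequences:
        symbolized.append([table.setdefault(point, len(table)) for point in seq])
    return symbolized, table


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for all frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # `projected` holds (sequence index, start offset) suffix pointers.
        occurrences = defaultdict(list)
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(sequences[sid])):
                item = sequences[sid][pos]
                if item not in seen:  # first occurrence per sequence only
                    seen.add(item)
                    occurrences[item].append((sid, pos + 1))
        for item, pointers in occurrences.items():
            if len(pointers) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(pointers)
                mine(pattern, pointers)  # recurse on the projected database

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


# Hypothetical semantic trajectory sequences for three users.
home = ("A", "morning", "home")
work = ("B", "morning", "work")
dinner = ("C", "evening", "restaurant")
trajectories = [[home, work, dinner], [home, dinner], [home, work]]

symbol_seqs, table = symbolize(trajectories)
patterns = prefixspan(symbol_seqs, min_support=2)

# Decode symbols back into triples for readability.
inverse = {symbol: point for point, symbol in table.items()}
for pattern, support in patterns.items():
    print([inverse[s] for s in pattern], support)
```

With a minimum support of 2, the sketch reports the three single visits plus home followed by work and home followed by restaurant; work followed by restaurant occurs in only one trajectory and is pruned. Treating each triple as one opaque symbol is the simplest possible symbolization and loses any partial matching across attributes.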
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT079955827
http://hdl.handle.net/11536/140727
Appears in Collections: Thesis