On Accelerating Multi-Layered Heterogeneous Network Representation Learning via Landmark Selection

Title:	On Accelerating Multi-Layered Heterogeneous Network Representation Learning via Landmark Selection (基於地標選擇之多層異質網路表示學習加速)
Authors:	蔡政銘 (Tsai, Cheng-Ming); 帥宏翰; Department of Electrical Engineering
Keywords:	Heterogeneous representation learning; Network representation; Feature learning; Dimension reduction; Metapath; Landmark selection; Embedding learning
Issue Date:	2017
Abstract:	Network representation (graph representation), which embeds large information networks into low-dimensional vector spaces, has been widely studied for homogeneous networks. The learned latent representations support data analysis tasks such as visualizing the entire network, classifying nodes, and detecting communities, and they play a crucial role in those tasks. Heterogeneous networks contain more latent features than homogeneous networks, yet they are less studied because of the complex relationships among their many node types. One straightforward approach is to treat a heterogeneous network as a homogeneous one and obtain its representation with existing algorithms, but this causes information loss and slow computation. We therefore first use metapaths to identify meaningful paths between nodes of different types, so that nodes that are close in the network are also close in the representation space. In addition, because nodes are not equally important to representation learning, we propose landmark selection, which assigns the nodes a priority order. High-priority nodes are given more chances to be trained and thus obtain better representations. Our landmark selection focuses on the distribution of the starting nodes of the walks. We use degree centrality, which ranks nodes by the number of edges linked to them, as the criterion for determining landmarks. The effectiveness of both methods is evaluated through multi-label classification results in terms of Micro-F1 and Macro-F1 scores. Metapaths outperform conventional homogeneous representation methods, and landmark selection improves the results further.
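The two ideas in the abstract, metapath-guided walks and degree-based allocation of walk starting nodes, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the toy author/paper/venue graph, the function names (`metapath_walk`, `generate_walks`), and the rule that each start node receives a share of walks proportional to its degree are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict

# Hypothetical toy heterogeneous network: authors (a*), papers (p*), one venue (v1).
EDGES = [("a1", "p1"), ("a1", "p2"), ("a2", "p2"), ("a3", "p3"),
         ("p1", "v1"), ("p2", "v1"), ("p3", "v1")]
NODE_TYPE = {"a1": "author", "a2": "author", "a3": "author",
             "p1": "paper", "p2": "paper", "p3": "paper", "v1": "venue"}

# Undirected adjacency lists built from the edge list.
ADJ = defaultdict(list)
for u, v in EDGES:
    ADJ[u].append(v)
    ADJ[v].append(u)


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk whose node types follow the (cyclic) metapath."""
    cycle = metapath[:-1]  # first and last type coincide, so drop the repeat
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [u for u in adj[walk[-1]] if node_type[u] == wanted]
        if not candidates:  # dead end: no neighbour of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


def generate_walks(adj, node_type, metapath, total_walks, length, seed=0):
    """Allocate walk starts by degree (assumed rule), then run the walks."""
    rng = random.Random(seed)
    starts = [v for v in adj if node_type[v] == metapath[0]]
    total_degree = sum(len(adj[v]) for v in starts) or 1
    walks = []
    for v in starts:
        # Higher-degree ("landmark") nodes start more walks; every node gets one.
        share = max(1, round(total_walks * len(adj[v]) / total_degree))
        for _ in range(share):
            walks.append(metapath_walk(adj, node_type, v, metapath, length, rng))
    return walks


if __name__ == "__main__":
    apvpa = ["author", "paper", "venue", "paper", "author"]
    for walk in generate_walks(ADJ, NODE_TYPE, apvpa, total_walks=8, length=5):
        print(" -> ".join(walk))
```

In this toy graph, `a1` has twice the degree of `a2` or `a3`, so it starts twice as many walks. The resulting walks would then be passed to whatever embedding learner is used, a step the abstract does not specify and that is omitted here.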
URI:	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070450726 http://hdl.handle.net/11536/142800
Appears in Collections:	Thesis