Paper Notes – Multi-View Attribute Graph Convolution Networks for Clustering (MAGCN)

I. Paper Title

Multi-View Attribute Graph Convolution Networks for Clustering

II. Abstract

(1) Problems identified by the paper

1. In GNNs, existing methods cannot assign learnable weights to different nodes within a neighborhood, and they lack robustness because they ignore the reconstruction of node attributes and graph structure;

2. Most multi-view GNNs focus mainly on the case of multiple graphs, while GNNs designed for graph-structured data with multi-view attributes remain largely unexplored.

(2) Contributions of the paper

1. A Multi-view Attribute Graph Convolution Network (MAGCN) is proposed for the clustering task;
2. MAGCN is designed with two-pathway encoders that map graph embedding features and learn view-consistency information:
(1) First-pathway encoders: multi-view attribute graph attention networks are developed to reduce noisy/redundant information and to learn the graph embedding features of multi-view graph data;
(2) Second-pathway encoders: consistent embedding encoders are developed to capture the geometric relationship and the consistency of probability distributions among different views, thereby adaptively finding a consistent clustering embedding space for the multi-view attributes.

III. Introduction

(I) Multi-view clustering

1. Definition:
A fundamental task in machine learning whose goal is to integrate multiple features and find consistent information across different views;
2. Current research:
Considerable research results have been achieved in the Euclidean domain;
3. Challenges:
These methods are no longer suitable for data in non-Euclidean domains, such as the now intensively studied graph data formed by social networks and citation networks;
4. Solution:
Graph embedding techniques, which can effectively explore graph-structured data;

(II) Graph Embedding

1. Definition:
Transforming graph data into a low-dimensional, compact, continuous feature space, usually realized via matrix factorization, random walks, or GNNs;
2. Current research:
(1) Owing to their efficiency and inductive-learning ability, GNNs have become the most popular approach;
(2) GNNs compute node embeddings by applying multiple graph convolution layers, which collect information from a node's neighbors through nonlinear transformations and aggregation functions; in this way, both the topological information of the graph and the attribute information of the nodes are obtained;
3. Challenges:
Although the GNNs above can effectively handle single-view graph data, they are not applicable to multi-view graph data.
What is the difference between multi-view and single-view? In multi-view graph data, each node carries attribute information, and that attribute information comes in multiple sets (one per view).
4. Solution:
Study how to apply GNNs to multi-view data with multiple graph structures, i.e., use multi-view GNN models;

(III) Multi-view GNN

1. Definition:

2. Current research:

3. Challenges:
(1) They cannot assign learnable, node-specific weights to different nodes in a neighborhood;
(2) They ignore the reconstruction of node attributes and graph structure, which could improve robustness;
(3) They do not explicitly consider a similarity distance measure for the consistency relations between different views;
(4) Existing multi-view GNN methods mainly focus on the case of multiple graphs and neglect the equally important attribute diversity.
4. Solution:
The motivation of this paper: MAGCN.

(IV) MAGCN

1. Key Contributions:
(1) A novel multi-view attribute graph convolution network is proposed for clustering graph-structured data with multi-view attributes;
(2) Multi-view attribute graph convolution encoders with an attention mechanism are developed to reduce the noise/redundancy of multi-view graph data; in addition, reconstruction of the node attributes and graph structure is considered to improve robustness;
(3) Consistent embedding encoders are designed to extract the consistent information among multiple views by exploring the geometric relationships and probability distribution consistency of the different views.

IV. Related Work

(I) Learning graph node embeddings with GNNs

1. Basic idea:
Existing GNN models for processing graph-structured data belong to a family of graph message-passing architectures that use different aggregation schemes for a node to aggregate feature messages from its neighbors in the graph.
2. Representative models:
(1) Graph Convolutional Networks (GCN) scale linearly in the number of graph edges and learn hidden-layer representations that encode both local graph structure and node features;
(2) Graph Attention Networks (GAT) enable specifying different weights for different nodes in a neighborhood by stacking self-attention layers in which nodes attend over their neighborhoods' features;
(3) GraphSAGE concatenates a node's features with diversified pooled neighborhood information and effectively trades off performance and runtime by sampling node neighborhoods;
(4) Message Passing Neural Networks further incorporate edge information when performing the aggregation.

(II) Multi-view graph node embedding

1. Current research:
(1) The spatiotemporal multi-graph convolution network encodes the non-Euclidean correlations among regions using multiple graphs and explicitly captures them with a multi-graph convolution encoder;
(2) In applications involving social networks, Multi-GCN incorporates non-redundant information from multiple views into the learning process;
(3) [Ma et al., 2018] utilize a multi-view graph auto-encoder, which integrates heterogeneous, noisy, nonlinearly related information to learn accurate similarity measures, especially when labels are scarce;

2. Challenges:
(1) These multi-view GNNs cannot assign learnable, node-specific weights to different nodes in a neighborhood;
(2) Their clustering performance is limited, as they do not consider structure and distribution consistency for the clustering embedding;
(3) Existing multi-view GNNs mainly focus on the case of multiple graphs and neglect the equally important attribute diversity.

V. Proposed Methodology

(I) Notation

1. Graph representation: $G=(V,E)$, $G \in R^{n \times n}$,
where $V=(v_1,v_2,\dots,v_n)$ denotes the set of nodes and $E$ denotes the set of edges;

2. The $m$-th view attribute features of the graph's nodes are denoted as
$X_m=(x_m^1,x_m^2,\dots,x_m^n)$, with $X_m \in R^{n \times d_m}$ and $m=1,2,\dots,M$, where $x_m^i$ is the feature vector of node $v_i$ and $M$ is the number of views;
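To make the notation concrete, here is a minimal PyTorch sketch; the graph, the sizes ($n=4$, $M=2$, $d_1=5$, $d_2=3$), and the random features are illustrative assumptions, not data from the paper:

```python
import torch

# Toy multi-view attributed graph; the sizes are made-up examples:
# n = 4 nodes, M = 2 views with attribute dimensions d_1 = 5 and d_2 = 3.
n, dims = 4, [5, 3]

# Shared adjacency matrix G in R^{n x n}.
G = torch.tensor([[0., 1., 1., 0.],
                  [1., 0., 0., 1.],
                  [1., 0., 0., 1.],
                  [0., 1., 1., 0.]])

# One attribute matrix X_m in R^{n x d_m} per view; row i is x_m^i,
# the feature vector of node v_i in view m.
X = [torch.rand(n, d) for d in dims]
print([x.shape for x in X])  # [torch.Size([4, 5]), torch.Size([4, 3])]
```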

(II) The Framework of MAGCN

Overall idea:
(1) First, encode the multi-view graph data $X_m$ into graph embeddings $H_m=\{h^1_m,\dots,h^i_m,\dots,h^n_m\}$ ($H_m \in R^{n \times d}$) with the multi-view attribute graph convolution encoders.
(2) Then, feed the $H_m$ into the consistent embedding encoders and obtain a consistent clustering embedding $Z$.
(3) The clustering process is eventually conducted on the ideal intrinsic embedding space computed from $Z$, as sketched below.
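Putting the three steps together, a high-level sketch of the pipeline follows; the names `gcn_encoders`, `consistent_encoders`, and `fuse` are hypothetical stand-ins for the paper's components, not its actual API:

```python
# High-level sketch of the two-pathway MAGCN flow (hypothetical modules).
def magcn_forward(G, X, gcn_encoders, consistent_encoders, fuse):
    # (1) Per-view graph embeddings H_m from the first-pathway encoders.
    H = [enc(G, X_m) for enc, X_m in zip(gcn_encoders, X)]
    # (2) Per-view low-dimensional embeddings Z_m, then one consistent
    #     clustering embedding Z adaptively integrated across views.
    Z_views = [enc(H_m) for enc, H_m in zip(consistent_encoders, H)]
    Z = fuse(Z_views)
    # (3) Clustering is finally performed on Z.
    return H, Z_views, Z
```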

(III) Multi-view Attribute Graph Convolution Encoder

1. Encoding

1. The first-pathway encoders map the multi-view node attribute matrices and the graph structure into the graph embedding space. Specifically, for the $m$-th view, the encoder maps graph $G$ and the $m$-th view attributes $X_m$ to $d$-dimensional graph embedding features $H_m$ through the following GCN encoder model:

$$H_m^l=\sigma(D^{-1/2}G'D^{-1/2}H_m^{l-1}W^l)$$

where $G'=G+I_N$ is the relevance coefficient matrix with added self-connections. As for $H_m^l$: when $l=0$, $H_m^0$ is the initial $m$-th view attribute matrix $X_m$; when $l=L$, $H_m^L$ is the final graph embedding feature representation $H_m$.
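A minimal PyTorch sketch of one such encoder layer, assuming dense matrices and ReLU for $\sigma$ (the attention-refined coefficients of the next equations are omitted here):

```python
import torch
import torch.nn.functional as F

def gcn_layer(G, H, W):
    """One encoder layer: H^l = sigma(D^{-1/2} G' D^{-1/2} H^{l-1} W^l).

    A minimal dense sketch; sigma is assumed to be ReLU.
    """
    G_prime = G + torch.eye(G.size(0))           # G' = G + I_N
    d_inv_sqrt = G_prime.sum(dim=1).pow(-0.5)    # diagonal of D^{-1/2}
    G_norm = d_inv_sqrt[:, None] * G_prime * d_inv_sqrt[None, :]
    return F.relu(G_norm @ H @ W)

# Stacking: H^0 = X_m; after L layers, H^L is the final embedding H_m.
```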

2. To determine the relevance between nodes and their neighbors, an attention mechanism with parameters shared among nodes is used. In the $l$-th multi-view encoder layer, the learnable relevance matrix $S$ is defined as:

$$S=\varphi(G \odot t^l_s H^l_m W^l + G \odot t^l_r H^l_m W^l)$$

where $t^l_s$ and $t^l_r \in R^{1 \times d_l}$ represent the trainable parameters related to a node itself and to its neighbor nodes, respectively, and $\odot$ refers to element-wise multiplication with broadcasting capability. $S$ is normalized to obtain the final relevance coefficients $G$, so $G_{ij}$ is computed by:

$$G_{ij}=\cfrac{\exp(S_{ij})}{\sum_{k \in N_i}\exp(S_{ik})}$$

where $N_i$ is the set of all nodes adjacent to node $i$.
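The formula for $S$ is compact, so the sketch below implements one plausible reading of it (an assumption): $t^l_s$ and $t^l_r$ score each node as "self" and as "neighbor", the two scores are combined along existing edges, and $\varphi$ is taken to be LeakyReLU:

```python
import torch
import torch.nn.functional as F

def attention_relevance(G, HW, t_s, t_r):
    """Learnable relevance coefficients for the l-th encoder layer.

    HW stands for the projected features H^l_m W^l of shape (n, d_l);
    t_s and t_r each have shape (1, d_l). G is assumed to contain
    self-connections so that no row is fully masked.
    """
    s_self = HW @ t_s.t()                        # (n, 1) "self" scores
    s_neigh = HW @ t_r.t()                       # (n, 1) "neighbor" scores
    S = F.leaky_relu(s_self + s_neigh.t())       # (n, n) raw relevance

    # Softmax over each neighborhood N_i (non-edges are masked out),
    # i.e. G_ij = exp(S_ij) / sum_{k in N_i} exp(S_ik).
    S = S.masked_fill(G == 0, float('-inf'))
    return torch.softmax(S, dim=1)
```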

3. We consider that $H_m$ preserves essentially all the information of the multi-view node attribute matrix $X_m$ and the graph structure $G$.
2. Decoding

1. In the decoding process, the decoders use the same number of layers as the encoders, and each decoder layer tries to reverse its corresponding encoder layer. In other words, the decoding process is the inverse of the encoding process;

2. The final decoded outputs are the reconstructed node attribute matrix $\hat{X}_m$ and the reconstructed graph structure $\hat{G}_m$, $m=1,2,\dots,M$. The GCN decoder model is:

$$\hat{H}_m^{(l-1)}=\sigma(\hat{D}^{-1/2}\hat{G}'\hat{D}^{-1/2}\hat{H}_m^{(l)}\hat{W}^{(l)})$$

so that $\hat{X}_m=\hat{H}_m^{(0)}$. The graph structure $\hat{G}_m^{ij}$, on the other hand, is implemented by an inner product decoder over $h_m^i$ and $h_m^j$; specifically,

$$\hat{G}_m^{ij}=\phi({h_m^i}^T h_m^j)$$

where $\phi(\cdot)$ is the activation applied to the inner product.
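A minimal sketch of the inner product decoder; the sigmoid used for $\phi$ is an assumption (it is the usual choice for inner-product graph decoders, but the notes do not name it):

```python
import torch

def inner_product_decoder(H_m):
    """Reconstruct the graph structure from the embedding H_m:
    G_hat[i, j] = phi(h_m^i . h_m^j), with phi assumed to be a sigmoid."""
    return torch.sigmoid(H_m @ H_m.t())

# The attribute decoder mirrors the encoder: the same number of GCN
# layers run in reverse, ending with X_hat_m = H_hat_m^(0).
```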
3. The reconstruction loss $L_{re}$

The reconstruction loss $L_{re}$ of the reconstructed multi-view node attribute matrices $\hat{X}$ and the reconstructed graph structures $\hat{G}$ can be computed as follows:

$$L_{re}=\min_\theta\sum_{i=1}^M||X_i-\hat{X}_i||_F^2+\lambda_1||G_i-\hat{G}_i||_F^2$$
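A direct translation of $L_{re}$ into code; the per-view lists and the shared input graph $G$ are assumptions of this sketch:

```python
def reconstruction_loss(X, X_hat, G, G_hat, lambda1):
    """L_re = sum_i ||X_i - X_hat_i||_F^2 + lambda_1 ||G - G_hat_i||_F^2.

    X, X_hat, G_hat are per-view lists of tensors; squared Frobenius
    norms are written as elementwise squared sums.
    """
    loss = 0.0
    for X_i, Xh_i, Gh_i in zip(X, X_hat, G_hat):
        loss = loss + ((X_i - Xh_i) ** 2).sum() \
                    + lambda1 * ((G - Gh_i) ** 2).sum()
    return loss
```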

(IV) Consistent Embedding Encoders

1. Geometric relationship consistency

1. $H_m$ is mapped into a low-dimensional space $Z_m$. Each $Z_m$ still contains almost all of its view's original information, so it is not suitable for direct multi-view integration. A consistent clustering layer is therefore used to learn a common clustering embedding $Z$, adaptively integrated from all the $Z_m$.

2. Assume $Z_m$ and $Z_b$ are the low-dimensional feature matrices of views $m$ and $b$ obtained from the consistent embedding encoders. They can be used to compute a geometric relationship similarity score $si(Z_m, Z_b)$, where $si(\cdot)$ is a similarity function that can be measured by the Manhattan distance, Euclidean distance, cosine similarity, etc. The loss function of geometric relationship consistency, $L_{geo}$, is:

$$L_{geo}=\min_\eta\sum_{i \neq j}^{M}||Z_i-Z_j||_F^2$$
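A direct sketch of $L_{geo}$ over all ordered view pairs, using the squared Euclidean (Frobenius) distance as $si(\cdot)$:

```python
def geometric_consistency_loss(Z_views):
    """L_geo = sum over view pairs i != j of ||Z_i - Z_j||_F^2.

    Squared Euclidean distance is used here; the notes say other
    measures (Manhattan distance, cosine similarity, ...) also fit si(.).
    """
    loss = 0.0
    for i in range(len(Z_views)):
        for j in range(len(Z_views)):
            if i != j:
                loss = loss + ((Z_views[i] - Z_views[j]) ** 2).sum()
    return loss
```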

2. The consistency of the probability distribution

1. Using the auxiliary distribution $P$ of $Z$ and per-view trade-off parameters $\rho_m$, the probability distribution consistency loss is:
$$L_{pro}=\min_\eta\sum_{m=1}^{M}\rho_m||Q_m-P||_F^2$$
2. Calculation details:
The original post displayed the computation of $Q_m$ and $P$ as a figure from the paper (not reproduced here).
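Since that figure is unavailable, the sketch below assumes the standard DEC-style construction for the soft assignments $Q_m$ and the auxiliary target distribution $P$; this is an assumption, not the paper's verified formulation:

```python
import torch

def soft_assignment(Z, centroids):
    """Student's t soft assignment between embeddings (n, d) and cluster
    centroids (K, d), as in DEC (assumed form for Q_m)."""
    dist2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, K)
    q = 1.0 / (1.0 + dist2)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(Q):
    """Sharpened auxiliary distribution P derived from Q (DEC-style)."""
    w = Q ** 2 / Q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

def probability_consistency_loss(Q_views, P, rho):
    """L_pro = sum_m rho_m ||Q_m - P||_F^2."""
    return sum(r * ((Q_m - P) ** 2).sum() for r, Q_m in zip(rho, Q_views))
```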

(V) Task for Clustering

1. The total loss function of the proposed MAGCN is eventually formulated as:
$$L=\min_{g,c,\mathbf{P}} L_{re}+\lambda_2 L_{geo}+\lambda_3 L_{pro}$$

2. The cluster of each node is then predicted from the auxiliary distribution $P$. For node $i$, the cluster is determined by $p_i$: the index with the highest probability value is node $i$'s cluster. Hence the cluster label of node $i$ is obtained as:
$$y_i=\arg\max_k p_{ik}$$
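In code, the joint objective and the label rule reduce to a few lines; `total_loss` and `predict_clusters` are hypothetical helper names:

```python
def total_loss(L_re, L_geo, L_pro, lambda2, lambda3):
    """L = L_re + lambda_2 * L_geo + lambda_3 * L_pro, minimized jointly."""
    return L_re + lambda2 * L_geo + lambda3 * L_pro

def predict_clusters(P):
    """y_i = argmax_k p_ik: each node takes the most probable cluster
    under the auxiliary distribution P of shape (n, K)."""
    return P.argmax(dim=1)
```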
