论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

论文信息

论文标题:SCGC : Self-Supervised Contrastive Graph Clustering
论文作者:Gayan K.
Kulatilleke, Marius Portmann, Shekhar S. Chandra
论文来源:2022, arXiv
论文地址:download
论文代码:download

转自本人博客《论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering

1 Introduction

为了利用结构中存在的丰富信息,许多研究人员将图神经网络(GNN)变体与AEs结合起来。这些方法存在的问题是:

  • over-smoothing
  • noisy neighbours (heterophily)
  • the suspended animation problem

贡献:

  • 提出一种有效的新型深度聚类模型 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,消除了在整个学习层中携带结构/边缘信息的必要性;
  • 提出了一种新的影响增强对比(IAC)损失来合并图结构,它可以有效地将任何模型转换为图感知; 提出了论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,一种利用预学习质心的更精简的变体,它使用了一半的原始参数,显著提高了训练和推理速度;

2 Method

整体框架如 Figure 1 所示:

2.1 Graph structure by contrastive loss

对比损失使正或连通节点更近,负或不连通节点在特征空间中更远。基于此思想,将拓扑结构信息合并到嵌入中。

2.1.1 Influence Augmented Contrastive (IAC) loss.

考虑不同深度的节点之间的影响(相连或不相连),并考虑不可加性以及可加性。对于可加性,对于一个给定的 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 深度,总影响定义为:
    论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》
    
  其中,论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 为深度 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 处节点 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 之间关系的系数。

先前的GNN模型假设了一个固定的深度关系,即通过超参数搜索获得的单个 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》。然而,该层结构需要与潜在的影响结构保持一致。此外,GNN层普遍适用于所有节点,因此错误地假设了所有节点从不同的深度中获得相同的影响。因此,GNN模型往往不是最优的。此外,由于过度平滑,大多数GNN模型的深度不能超过 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 层,如 FDGATII [14] 和 GCNII[4] 试图解决这些模型。

给定 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,我们将 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 节点的 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 损失表示为:

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

其中,论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 为温度参数,论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 为节点 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 之间连接的影响。

对于每个节点,它的 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 邻域内的节点被认为是正样本,将其与所有节点进行对比。论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 损失鼓励有影响的节点在嵌入空间中比无影响的节点更近。接下来,概述了如何计算累积影响。

2.1.2 Determining Influence

由于真实世界的图通常非常稀疏,邻接矩阵 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 中的大部分项都是零。然而,在两个节点 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 之间没有一条边并不意味着没有关联;仍然可以有很强的关联,即:高阶接近度。这种直觉促使我们利用图中的高阶关系,通常通过将 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 提高到 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 次幂 [3,10] 来实现的。论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 的第 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 个条目给出了 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 长度的行走次数。

归一化邻接矩阵:

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 次幂提供了节点 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》之间的 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 跳关系的强度。

通过计算节点之间的累加关系作为节点关系强度,而不是限制在任意的第 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 跳邻域上。即,将归一化邻接矩阵的 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 次累积幂定义为 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,其中,论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 包含了论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 中所有先前的邻域跳跃关系的聚合集。 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 只需要在训练之前计算一次,开销很少。另外,当节点 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 对节点 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 邻居产生非零影响时,论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 才能得到非零值。

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

与我们在影响方面的工作不同,Graph-MLP 提出了基于余弦相似度的 NContrast (NC) 损失进行分类,其中每个节点只考虑 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 邻域,而不考虑更全面的加性影响。其 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 的计算如下:

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 的对比损失定义为:

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

2.2 Self supervised clustering

图聚类本质上是一项无监督的任务,没有反馈来指导优化过程。为此,使用概率分布导出的软标签作为聚类增强的自监督机制,有效地将聚类叠加到嵌入上。

首先获得软集群分配概率 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,嵌入 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 和簇中心 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,使用 student’s t -distribution 作为内核来衡量嵌入和质心之间的相似性,为处理不同的簇:

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

其中,簇中心 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 由预先训练过的 AE 的嵌入上的 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 初始化,论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 是 Student’s t-distribution 的自由度。使用 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 作为所有样本的聚类分配的分布,并在实验中设置 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

节点在 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 中具有更高的软分配概率,通过将 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 提高到二次幂并进行归一化,定义一个强调高置信度分配的目标分布论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,将其定义为:

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

其中,论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 为质心 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 的软簇频率。

为了使数据表示更接近聚类中心,将 KL 散度损失用于 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 分布最小化,迫使当前分布 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 接近高置信度的目标分布 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》。通过使用分布 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 来实现目标分布 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 来自监督簇分配,然后通过最小化 KL 散度来依次监督分布 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,如下:
    论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

2.3 Initial centroids and embeddings

为了提取节点特征并获得初始嵌入 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 和聚类质心 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 进行优化,我们采用了基于AE的预训练阶段。首先,我们使用编码器-解码器通过最小化原始数据 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 和重建数据 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 重构损失来提取潜在嵌入 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,即:

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

2.4 Final proposed models

论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

2.5 Complexity Analysis

给定输入数据维数 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 和 AE 层的维数为 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,第一个编码器层的权重矩阵大小为 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》。当输入数据大小为 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 时,论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 的时间复杂度为 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 的时间复杂度为 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》。假设 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 聚类,从 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 可知,时间复杂度 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》

对于对比损失,我们计算了所有 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》。因此,时间复杂度为 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,其中 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 是嵌入维数。虽然这导致了理论时间复杂度为 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》,但鉴于 论文解读(SCGC)《SCGC : Self-Supervised Contrastive Graph Clustering》 是对称的,我们只需要计算一半的结果,并且由于矩阵的转置是相同的、索引交换的矩阵,缓存可以产生超过两倍的影响。

3 Experiments

数据集

4 Conclusion

本文介绍了影响增强对比(IAC),这是一种新的图表示的自监督学习方法。它的可加性使更强的归纳偏差能够进行泛化,而不需要近似复杂图的潜在依赖关系和关系结构。使用IAC,我们的SCGC*比传统的基于GNN的方法提供了显著的优势;SCGC可以自然地适应本地、长期或任何节点依赖的混合组合,支持批处理,高效,具有线性推理的复杂性,并且可以简单地并行化。SCGC与先进技术相比取得了显著的改进(DBLP的NMIARI为18%,ACM训练时间减少了69%,训练时间减少了55%,推理时间减少了81%)。通过展示对比学习、聚类[2,27,29]和自动编码器模型[6,9,30]之间有效的图聚类的新可能性,我们希望为这些领域的模型改进提供灵感和指导,并解决本文概述的一些局限性。

文章出处登录后可见!

已经登录?立即刷新

共计人评分,平均

到目前为止还没有投票!成为第一位评论此文章。

(0)
心中带点小风骚的头像心中带点小风骚普通用户
上一篇 2022年5月21日
下一篇 2022年5月21日

相关推荐