NPA: Improving Large-scale Graph Neural Networks with Non-parametric Attention
Wentao Zhang, Center for Machine Learning Research, Peking University, Haidian, Beijing, China (wentao.zhang@pku.edu.cn)
Guochen Yan, School of Computer Science, Peking University, Haidian, Beijing, China (Guochen_Yan@outlook.com)
Yu Shen, School of Computer Science, Peking University, Haidian, Beijing, China (shenyu@pku.edu.cn)
Yang Ling, School of Computer Science, Peking University, Haidian, Beijing, China (yangling0818@163.com)
Yaoyu Tao, Tencent Inc., Haidian, Beijing, China (brucetao@tencent.com)
Bin Cui, School of Computer Science, Peking University, Haidian, Beijing, China (bin.cui@pku.edu.cn)
Jian Tang, Mila - Québec AI Institute, HEC Montréal, Montréal, Canada (tangjian@mila.quebec)
ABSTRACT

Recent works show great interest in designing Graph Neural Networks (GNNs) that scale to large graphs. While previous work focuses on designing advanced sampling techniques for existing GNNs, the design of non-parametric GNNs, an orthogonal direction toward scalable performance, has attracted considerable attention recently. For example, nearly all top solutions on the Open Graph Benchmark¹ leaderboard are non-parametric GNNs. Despite their high predictive performance and scalability, non-parametric GNNs still face two limitations. First, due to the propagation of over-smoothed features, they suffer from severe performance degradation as the propagation depth grows. More importantly, they consider only the graph structure and ignore feature influence during non-parametric propagation, leading to sub-optimal propagated features. To address these limitations, we present non-parametric attention (NPA), a plug-and-play module that is compatible with non-parametric GNNs, to obtain scalable and deep GNNs simultaneously. We have deployed NPA in Tencent with the Angel platform, and we further evaluate NPA on both real-world datasets and large-scale industrial datasets. Experimental results on seven homophilic graphs (including the industrial Tencent Video graph) and five heterophilic graphs demonstrate that NPA enjoys high performance – it achieves large performance gains over existing non-parametric GNNs; a deeper architecture – it improves non-parametric GNNs at large model depth; and high scalability – it supports large-scale graphs at low time cost. Notably, it achieves state-of-the-art performance on the large ogbn-papers100M dataset.

Both authors contributed equally to this research.
¹ https://ogb.stanford.edu/docs/leader_nodeprop

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SIGMOD-Companion '24, June 9–15, 2024, Santiago, AA, Chile
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0422-2/24/06
https://doi.org/10.1145/3626246.3653399
CCS CONCEPTS

• Information systems → Data mining; • Computing methodologies → Machine learning.
KEYWORDS
graph neural networks, large-scale graph, non-parametric
ACM Reference Format:
Wentao Zhang, Guochen Yan, Yu Shen, Yang Ling, Yaoyu Tao, Bin Cui, and Jian Tang. 2024. NPA: Improving Large-scale Graph Neural Networks with Non-parametric Attention. In Companion of the 2024 International Conference on Management of Data (SIGMOD-Companion '24), June 9–15, 2024, Santiago, AA, Chile. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3626246.3653399
1 INTRODUCTION
Graph Neural Networks (GNNs) have achieved great success in many graph-based applications in recent years, such as natural language processing [13, 29, 38], computer vision [1, 30, 33], recommendation systems [17, 39, 40], biology [12, 18, 20], and relational databases [3, 11, 36]. By recursively propagating node embeddings along edges, GNNs can enhance the representation of each node with its distant neighbors, thus improving model performance. However, most existing GNNs have to repeatedly perform computationally expensive and recursive feature propagation over the entire graph at each training epoch, leading to high computation costs on a single machine and high communication costs in distributed settings [27, 46]. Therefore, they cannot scale well to large industrial graphs, which hinders their popularity and development in many real scenarios.
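To make this cost concrete, the following minimal sketch (our own illustration, not the paper's code) shows full-batch, K-layer GCN-style message passing: each epoch repeats K sparse matrix products over the entire graph, and because propagation is interleaved with trainable weights, it cannot be precomputed once and reused.

```python
import numpy as np
import scipy.sparse as sp

def gcn_forward(adj_norm: sp.csr_matrix, x: np.ndarray, weights: list) -> np.ndarray:
    """One forward pass of a K-layer GCN-style model over the FULL graph."""
    h = x
    for w in weights:                # K layers
        h = adj_norm @ h             # O(|E| * d) propagation, redone every epoch
        h = np.maximum(h @ w, 0.0)   # parametric transformation + ReLU
    return h
```

Because `h` already depends on the trainable `weights` of earlier layers, the `adj_norm @ h` products must be recomputed at every training step; this entanglement is what sampling methods try to cheapen and non-parametric GNNs avoid altogether.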
[Figure 1: Example of two limitations in non-parametric GNNs. (a) Propagation with over-smoothed features: embeddings that become over-smoothed after many of the K propagation steps keep being propagated as noise. (b) Propagation with fixed weights: the weight between two nodes is fixed across all propagation steps and determined only by node degrees.]
An intuitive idea to tackle the scalability issue is to apply more advanced sampling techniques to existing GNNs, such as node-wise sampling [5, 19], layer-wise sampling [4, 23], and graph-wise sampling [8, 44]. Although these sampling techniques can alleviate the scalability issue to a certain extent, the computation and communication costs remain high due to the recursive propagation. Besides, the sampling quality strongly influences the predictive performance of GNNs, and it is hard for these methods to balance the trade-off between training scalability and model performance.
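As a rough illustration of why sampling only partially helps, here is a minimal node-wise (GraphSAGE-style) neighbor sampler under our own naming: even with a small fan-out, the sampled receptive field still grows roughly as fanout^depth, so deep recursive propagation stays expensive.

```python
import random

def sample_neighborhood(adj_list: dict, seed: int, fanout: int, depth: int) -> set:
    """Sample up to `fanout` neighbors per node, recursively, for `depth` hops."""
    frontier, visited = {seed}, {seed}
    for _ in range(depth):
        nxt = set()
        for v in frontier:
            nbrs = adj_list.get(v, [])
            nxt.update(random.sample(nbrs, min(fanout, len(nbrs))))
        visited |= nxt
        frontier = nxt            # |visited| can approach fanout ** depth
    return visited
```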
Fortunately, some recent works attribute the success of GNNs to the non-parametric feature propagation rather than the parametric non-linear transformation. For example, as shown in SGC [37], the non-parametrically propagated features combined with a simple logistic regression model achieve classification performance comparable to GCN [24], while maintaining high scalability.
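A minimal sketch of this SGC-style recipe, assuming the standard GCN normalization (our reconstruction, not the authors' code): propagation happens once, as parameter-free preprocessing, and training then touches only a linear model.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression

def sgc_features(adj: sp.csr_matrix, x: np.ndarray, k: int) -> np.ndarray:
    """Precompute A_hat^k X, with A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + sp.eye(adj.shape[0], format="csr")   # add self-loops
    deg = np.asarray(a_hat.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    for _ in range(k):                                 # k-step non-parametric propagation
        x = a_norm @ x
    return x

# Training then reduces to fitting a simple classifier on precomputed features:
# clf = LogisticRegression(max_iter=1000).fit(feats[train_idx], y[train_idx])
```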
As a new way to tackle the scalability issue, non-parametric GNNs have attracted much attention recently, and most top solutions on the Open Graph Benchmark [22] are non-parametric GNNs. Following this direction, various non-parametric and scalable GNNs have been proposed to further improve the predictive performance of SGC, such as SIGN [15], S²GC [49], GBP [6], GAMLP [48], and NAFS [47]. Despite their high scalability and predictive performance, existing non-parametric GNNs still face the following two limitations:
Propagation with Over-smoothed Features. Each node in a K-layer GNN can propagate its feature K steps to distant nodes, so the node embedding can be smoothed and enhanced. However, as shown in Figure 1, node information may be propagated to the full graph at a large propagation depth, making node representations indistinguishable and leading to bad predictive performance, i.e., the over-smoothing issue [26]. As discussed in Eq. 5 in Section 3.1, if we propagate node features infinitely many times in a connected graph, the final node embedding is related only to the node degrees. By combining the embeddings of different propagation steps, some recent non-parametric GNNs can slightly alleviate the over-smoothing problem and achieve better predictive performance at larger propagation depths. However, Figure 1 shows that over-smoothed node embeddings (i.e., nodes with deep color) are continually propagated to their neighbors in later propagation steps. Such redundant and noisy information is propagated along the edges and harms the embeddings of more nodes. In summary, existing methods cannot fundamentally solve the over-smoothing issue because they keep propagating over-smoothed features.
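For intuition, one standard closed form of this limit (our reconstruction from the usual over-smoothing analysis; the paper's Eq. 5 may state a generalized normalization) for the self-loop-augmented symmetric normalization $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ on a connected, non-bipartite graph is

\lim_{K \to \infty} \big(\hat{A}^{K}\big)_{ij} = \frac{\sqrt{(d_i + 1)(d_j + 1)}}{2m + n},

where $d_i$ is the degree of node $i$, $m$ the number of edges, and $n$ the number of nodes: every entry depends only on degrees, so the propagated features $\hat{A}^{\infty}X$ collapse to degree-scaled copies of one global vector.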
Propagation with Fixed Weights. Existing non-parametric GNNs are built on the assumption of graph homogeneity, i.e., adjacent nodes are more likely to have the same label. However, this assumption does not hold in many real graph datasets. As shown in Figure 1 and introduced in Section 3.1, the propagation weight between two nodes in current non-parametric GNNs is related only to the graph structure (e.g., the node degrees) and is fixed across propagation steps. Therefore, existing non-parametric GNNs may assign a large, fixed propagation weight to two nodes with different labels (i.e., nodes with different colors), making the embeddings of nodes with different labels indistinguishable. Besides the uncertain graph homogeneity, nodes with the same label may exhibit different extents of homogeneity, such as nodes near and far from the decision boundary. Therefore, the weight should also be adapted to their similarity.
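Concretely, under the common GCN-style symmetric normalization (one standard choice; the paper's Section 3.1 may use a generalized exponent), the propagation weight is purely structural and identical at every step:

w_{ij} = \hat{A}_{ij} = \frac{a_{ij}}{\sqrt{(d_i + 1)(d_j + 1)}},

where $a_{ij}$ is the self-loop-augmented adjacency entry. No node feature enters this formula, which is exactly the limitation that feature-aware, step-adaptive weighting targets.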
In this paper, we propose non-parametric attention (NPA), a plug-and-play module that is compatible with different non-parametric GNNs, to obtain scalable and deep GNNs simultaneously. As the over-smoothed features are global and capture full-graph information, they do not bring much new information to the current node embedding. Therefore, NPA introduces global attention to tackle the propagation of over-smoothed features. Specifically, it assigns a node-adaptive attention weight to the newly propagated features to measure the newly generated global information relative to the previous features. In this way, it can adaptively remove the over-smoothed and redundant feature information at each propagation step. Besides, our theoretical analysis shows that assigning larger weights to neighbors with more similar features leads to an easier-to-learn classifier. To tackle the fixed-propagation-weight issue in local feature propagation, NPA further proposes local attention, which considers the similarity of propagated features when calculating the propagation weights and adaptively assigns more attention (i.e., larger propagation weights) to similar nodes at each propagation step. As introduced in Section 2.4, the key difference among existing non-parametric GNNs is how they combine the features from different propagation steps. Since the goal of NPA is to improve the quality of the propagated features at every step, NPA is orthogonal and compatible with all kinds of existing non-parametric GNNs.
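The precise NPA formulation is given later in the paper; purely as an illustration of the two mechanisms just described, and under our own naming and similarity choices, one propagation step might look like the following sketch.

```python
import numpy as np
import scipy.sparse as sp

def _cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    eps = 1e-12
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps)

def npa_like_step(a_norm: sp.csr_matrix, h_prev: np.ndarray) -> np.ndarray:
    rows, cols = a_norm.nonzero()
    # Local attention (illustrative): scale each fixed structural weight by the
    # feature similarity of its endpoints, so similar neighbors get more weight.
    sim = _cosine(h_prev[rows], h_prev[cols])
    vals = np.asarray(a_norm[rows, cols]).ravel() * (1.0 + sim)
    a_local = sp.csr_matrix((vals, (rows, cols)), shape=a_norm.shape)
    h_new = a_local @ h_prev
    # Global attention (illustrative): a node-adaptive gate that suppresses the
    # new features wherever they add little information over the previous step.
    novelty = 1.0 - np.abs(_cosine(h_new, h_prev))[:, None]
    return novelty * h_new + (1.0 - novelty) * h_prev
```

Both attentions here are non-parametric (no trainable weights), so a step like this can still be folded into preprocessing; the exact weighting functions are the paper's contribution, not this sketch.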
Our main contributions are as follows:
(1) New perspective. To the best of our knowledge, this is the first work to explore the two limitations of existing non-parametric GNNs, i.e., the propagation with over-smoothed features and fixed weights; this new finding opens up a new direction towards scalable and deep GNNs.
(2) New method. Based on our new findings, we propose NPA, a non-parametric attention method to tackle the above limitations. As a plug-and-play module, NPA is the first attempt to improve the propagation process of non-parametric GNNs, and it is compatible with existing non-parametric GNNs to support scalable and deep GNNs simultaneously.
(3) Industrial deployment. We have successfully implemented and deployed NPA in a real-world production environment