
SIGMOD-Companion ’24, June 9–15, 2024, Santiago, Chile Wentao Zhang et al.
[Figure 1 shows (a) Propagation with Over-smoothed Features, where over-smoothed features continue to be propagated (P) as noise to neighboring nodes, and (b) Propagation with Fixed Weights, where the propagation weight between nodes v_i and v_j is fixed at w_ij = 1/(sqrt(d_i) * sqrt(d_j)) across all K propagation steps.]
Figure 1: Example of two limitations in non-parametric GNNs.
they cannot scale well to large industrial graphs, which hinders
their adoption and development in many real-world scenarios.
An intuitive idea to tackle the scalability issue is to apply more
advanced sampling techniques to existing GNNs, such as node-wise
sampling [5, 19], layer-wise sampling [4, 23], and graph-wise sampling [8, 44].
Although these sampling techniques can alleviate the
scalability issue to a certain extent, the computation and communication
costs remain high due to the recursive propagation. Besides,
the sampling quality strongly influences the predictive performance
of GNNs, and these methods struggle to balance the trade-off
between training scalability and model performance.
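To make the cost of recursive propagation concrete, the following toy sketch (our own illustration, not from the paper, on a synthetic random graph) counts the receptive field of a single target node under node-wise sampling with a fixed fanout; it still grows close to fanout^K with K layers.

```python
import random

random.seed(0)

def receptive_field_size(adj, target, num_layers, fanout):
    """Count nodes touched when expanding `target` recursively,
    keeping at most `fanout` sampled neighbors per node and layer."""
    frontier, visited = {target}, {target}
    for _ in range(num_layers):
        nxt = set()
        for v in frontier:
            neigh = adj[v]
            nxt.update(random.sample(neigh, min(fanout, len(neigh))))
        frontier = nxt - visited   # only newly reached nodes expand next
        visited |= frontier
    return len(visited)

# synthetic graph: 10,000 nodes, ~20 out-neighbors each
n = 10_000
adj = {v: random.sample(range(n), 20) for v in range(n)}
sizes = [receptive_field_size(adj, 0, k, fanout=10) for k in (1, 2, 3)]
print(sizes)  # receptive field grows by roughly a factor of 10 per layer
```

Even with aggressive sampling, one target node at depth 3 already touches on the order of a thousand nodes, which is why the computation and communication costs of recursive propagation remain high.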
Fortunately, some recent works attribute the success of GNNs to
the non-parametric feature propagation rather than the parametric
non-linear transformation. For example, as shown in SGC [37],
the non-parametrically propagated features, fed into a simple logistic
regression model, achieve classification performance comparable to
GCN [24] while maintaining high scalability. As a new way to
tackle the scalability issue, non-parametric GNNs have attracted
much attention recently, and most top solutions on the Open Graph
Benchmark [22] are non-parametric GNNs. Following this direction,
various non-parametric and scalable GNNs have been proposed to further
improve the predictive performance of SGC, such as SIGN [15],
S2GC [49], GBP [6], GAMLP [48], and NAFS [47]. Despite their high
scalability and predictive performance, existing non-parametric
GNNs still face the following two limitations:
Propagation with Over-smoothed Features. Each node in
a K-layer GNN can propagate its feature K steps to distant
nodes, so the node embedding can be smoothed and enhanced.
However, as shown in Figure 1, with a large propagation depth the
node information may spread over the full graph, making
the node representations indistinguishable and leading to poor
predictive performance, i.e., the over-smoothing issue [26]. As discussed
in Eq. 5 in Section 3.1, if we propagate node features infinitely many
times in a connected graph, the final node embedding is only related
to the node degrees. By combining the embeddings of different
propagation steps, some recent non-parametric GNNs can slightly
alleviate the over-smoothing problem and obtain better predictive
performance with larger propagation depth. However, Figure 1 shows
that over-smoothed node embeddings (i.e., nodes with deep color)
continue to be propagated to their neighbors in later propagation
steps. Such redundant and noisy information is propagated
along the edges and harms the embeddings of more nodes. To sum up,
existing methods cannot fundamentally solve the over-smoothing
issue because they keep propagating over-smoothed features.
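The degree-only limit can be checked numerically. The sketch below is our own toy example, assuming the standard symmetric normalization A_hat = D^{-1/2}(A + I)D^{-1/2}; after many propagation steps every node's embedding collapses onto a single shared direction (scaled by sqrt of its degree), so the nodes become indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)

# small connected toy graph: a 30-node ring plus random chords
n = 30
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
for _ in range(40):
    i, j = rng.integers(0, n, 2)
    if i != j:
        A[i, j] = A[j, i] = 1

A_loop = A + np.eye(n)
d = A_loop.sum(1)
A_hat = A_loop / np.sqrt(np.outer(d, d))   # D^{-1/2}(A + I)D^{-1/2}

X = rng.standard_normal((n, 8))
H = X.copy()
for _ in range(200):                       # very deep propagation
    H = A_hat @ H

# in the limit, row i is proportional to sqrt(d_i) times one shared
# vector, so all row directions coincide: pairwise cosine similarity -> 1
H_norm = H / np.linalg.norm(H, axis=1, keepdims=True)
cos = H_norm @ H_norm.T
print(cos.min())   # close to 1: every embedding points the same way
```

This is exactly why combining propagation depths only mitigates the symptom: once a step produces such collapsed features, any further propagation of them only spreads the noise.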
Propagation with Fixed Weights. Existing non-parametric
GNNs are built on the assumption of graph homogeneity, i.e., adjacent
nodes are more likely to share the same label. However, this
assumption does not hold in many real graph datasets. As shown
in Figure 1 and introduced in Section 3.1, the propagation weight
between two nodes in current non-parametric GNNs is determined only
by the graph structure (e.g., the node degrees) and is fixed across
propagation steps. Therefore, existing non-parametric GNNs may
assign a large, fixed propagation weight to two nodes with
different labels (i.e., nodes with different colors), making the
embeddings of nodes with different labels indistinguishable. Besides the
uncertain graph homogeneity, nodes with the same label may exhibit
different degrees of homogeneity, such as nodes near and far
from the decision boundary. Therefore, the weight should also
adapt to their feature similarity.
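In the standard symmetric normalization assumed here, the fixed weight on edge (i, j) is w_ij = 1/(sqrt(d_i) * sqrt(d_j)). A minimal sketch (our own example) makes the limitation concrete: the weight is computed from degrees alone, so it is identical at every step and identical whether or not the two endpoints' features (and hence likely labels) agree.

```python
import numpy as np

# 4-node toy graph; nodes 0 and 2 are adjacent
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(1)                               # degrees: [2, 2, 3, 1]
W = A / np.sqrt(np.outer(d, d))            # W[i, j] = A[i, j] / sqrt(d_i * d_j)

# the weight on edge (0, 2) depends only on d_0 = 2 and d_2 = 3;
# it never consults the node features and is reused at every step
print(W[0, 2])                             # 1 / sqrt(6) ≈ 0.408
```

If nodes 0 and 2 carried different labels, this fixed weight would still pull their embeddings together with full strength, which is precisely the failure mode described above.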
In this paper, we propose non-parametric attention (NPA), a plug-
and-play module compatible with different non-parametric
GNNs, to obtain scalable and deep GNNs simultaneously. As the over-
smoothed features are global and capture full-graph information,
they bring little new information to the current node embedding.
Therefore, NPA introduces global attention to tackle
the propagation of over-smoothed features. Specifically, it assigns
a node-adaptive attention weight to the newly propagated
features to measure the newly generated global information relative
to the previous feature. In this way, it can adaptively remove
the over-smoothed and redundant feature information in each
propagation step. Besides, our theoretical analysis shows that assigning larger
weights to neighbors with more similar features leads to an
easier-to-learn classifier. To tackle the fixed-propagation-weight
issue in local feature propagation, NPA further proposes local
attention, which considers the similarity of propagated features when
calculating the propagation weights and adaptively assigns more attention
(i.e., larger propagation weights) to similar nodes in each
propagation step. As introduced in Section 2.4, the key difference among
existing non-parametric GNNs is how they combine different steps
of propagated features. Since the goal of NPA is to improve the
quality of the propagated features at each step, NPA is orthogonal
and compatible with all kinds of existing non-parametric GNNs.
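The interplay of the two mechanisms can be sketched as follows. This is only an illustrative approximation under our own assumptions (cosine similarity for both attentions, a simple convex blend for the global step); the paper's exact formulas are defined in later sections.

```python
import numpy as np

def cosine(u, v, eps=1e-12):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def npa_step(A_hat, H):
    """One propagation step with (assumed) NPA-style attention.
    A_hat: fixed normalized adjacency; H: current node embeddings."""
    n = A_hat.shape[0]
    # local attention: rescale each fixed edge weight by feature similarity
    S = np.zeros_like(A_hat)
    for i in range(n):
        for j in range(n):
            if A_hat[i, j] > 0:
                S[i, j] = A_hat[i, j] * max(cosine(H[i], H[j]), 0.0)
    S = S / (S.sum(1, keepdims=True) + 1e-12)     # renormalize rows
    H_new = S @ H
    # global attention: keep only the genuinely new information per node
    out = np.empty_like(H)
    for i in range(n):
        novelty = 1.0 - max(cosine(H_new[i], H[i]), 0.0)  # ~0 if redundant
        out[i] = novelty * H_new[i] + (1.0 - novelty) * H[i]
    return out

# demo: build a normalized adjacency and feed an already-smoothed input
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = (A + np.eye(4)).sum(1)
A_hat = (A + np.eye(4)) / np.sqrt(np.outer(d, d))

H = np.tile([1.0, 2.0], (4, 1))       # identical rows: fully smoothed
print(np.allclose(npa_step(A_hat, H), H))  # True: nothing new is propagated
```

The demo shows the intended behavior on the over-smoothing side: when the newly propagated features add nothing beyond the previous embedding, the global attention weight collapses to zero and the step becomes a no-op instead of spreading redundant noise.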
Our main contributions are as follows:
• (1) New perspective. To the best of our knowledge, this is the
first work to explore the two limitations of existing non-parametric
GNNs, i.e., the propagation with over-smoothed features and fixed
weights; this new finding opens up a new direction towards scalable
and deep GNNs.
• (2) New method. Based on our new findings, we propose NPA,
a non-parametric attention method to tackle the above limitations.
As a plug-and-play module, NPA is the first attempt to improve the
propagation process of non-parametric GNNs, and it is compatible
with existing non-parametric GNNs to support scalable and deep
GNNs simultaneously.
• (3) Industrial Deployment. We have successfully implemented
and deployed NPA in a real-world production environment