NPA: Improving Large-scale Graph Neural Networks with Non-parametric Attention
Wentao Zhang, Center for Machine Learning Research, Peking University, Haidian, Beijing, China (wentao.zhang@pku.edu.cn)
Guochen Yan, School of Computer Science, Peking University, Haidian, Beijing, China (Guochen_Yan@outlook.com)
Yu Shen, School of Computer Science, Peking University, Haidian, Beijing, China (shenyu@pku.edu.cn)
Yang Ling, School of Computer Science, Peking University, Haidian, Beijing, China (yangling0818@163.com)
Yaoyu Tao, Tencent Inc., Haidian, Beijing, China (brucetao@tencent.com)
Bin Cui, School of Computer Science, Peking University, Haidian, Beijing, China (bin.cui@pku.edu.cn)
Jian Tang, Mila - Québec AI Institute, HEC Montréal, Montréal, Canada (tangjian@mila.quebec)
ABSTRACT

Recent works show great interest in designing Graph Neural Networks (GNNs) that scale to large graphs. While previous work focuses on designing advanced sampling techniques for existing GNNs, the design of non-parametric GNNs, an orthogonal direction toward scalable performance, has attracted considerable attention recently. For example, nearly all top solutions on the Open Graph Benchmark¹ leaderboard are non-parametric GNNs. Despite their high predictive performance and scalability, non-parametric GNNs still face two limitations. First, due to the propagation of over-smoothed features, they suffer from severe performance degradation as the propagation depth grows. More importantly, they consider only the graph structure and ignore feature influence during non-parametric propagation, leading to sub-optimal propagated features. To address these limitations, we present non-parametric attention (NPA), a plug-and-play module that is compatible with non-parametric GNNs, to obtain scalable and deep GNNs simultaneously. We have deployed NPA in Tencent with the Angel platform, and we further evaluate NPA on both real-world datasets and large-scale industrial datasets. Experimental results on seven homophilic graphs (including the industrial Tencent Video graph) and five heterophilic graphs demonstrate that NPA enjoys high performance – it achieves large performance gains over existing non-parametric GNNs; a deeper architecture – it improves non-parametric GNNs at large model depth; and high scalability – it supports large-scale graphs at low time cost. Notably, it achieves state-of-the-art performance on the large ogbn-papers100M dataset.

Both authors contributed equally to this research.
¹ https://ogb.stanford.edu/docs/leader_nodeprop

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SIGMOD-Companion '24, June 9–15, 2024, Santiago, AA, Chile
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0422-2/24/06
https://doi.org/10.1145/3626246.3653399
CCS CONCEPTS

• Information systems → Data mining; • Computing methodologies → Machine learning.
KEYWORDS
graph neural networks, large-scale graph, non-parametric
ACM Reference Format:
Wentao Zhang, Guochen Yan, Yu Shen, Yang Ling, Yaoyu Tao, Bin Cui, and Jian Tang. 2024. NPA: Improving Large-scale Graph Neural Networks with Non-parametric Attention. In Companion of the 2024 International Conference on Management of Data (SIGMOD-Companion '24), June 9–15, 2024, Santiago, AA, Chile. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3626246.3653399
1 INTRODUCTION
Graph Neural Networks (GNNs) have achieved great success in many graph-based applications in recent years, such as natural language processing [13, 29, 38], computer vision [1, 30, 33], recommendation systems [17, 39, 40], biology [12, 18, 20], and relational databases [3, 11, 36]. By recursively propagating node embeddings along edges, GNNs can enhance the representation of each node with its distant neighbors, thus improving model performance. However, most existing GNNs have to repeatedly perform computationally expensive and recursive feature propagation over the entire graph at each training epoch, leading to high computation costs on a single machine and high communication costs in distributed settings [27, 46]. Therefore, they cannot scale well to large industrial graphs, which hinders their popularity and development in many real scenarios.
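To make this cost concrete, the following minimal sketch (our own illustration, not the paper's code) shows full-batch, K-layer GCN-style message passing: each epoch repeats K sparse matrix products over the entire graph, and because propagation is interleaved with trainable weights, it cannot be precomputed once and reused.

```python
import numpy as np
import scipy.sparse as sp

def gcn_forward(adj_norm: sp.csr_matrix, x: np.ndarray, weights: list) -> np.ndarray:
    """One forward pass of a K-layer GCN-style model over the FULL graph."""
    h = x
    for w in weights:                # K layers
        h = adj_norm @ h             # O(|E| * d) propagation, redone every epoch
        h = np.maximum(h @ w, 0.0)   # parametric transformation + ReLU
    return h
```

Because `h` already depends on the trainable `weights` of earlier layers, the `adj_norm @ h` products must be recomputed at every training step; this entanglement is what sampling methods try to cheapen and non-parametric GNNs avoid altogether.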
[Figure 1: Example of two limitations in non-parametric GNNs. (a) Propagation with over-smoothed features: embeddings that become over-smoothed after many of the K propagation steps keep being propagated as noise. (b) Propagation with fixed weights: the weight between two nodes is fixed across all propagation steps and determined only by node degrees.]
An intuitive idea to tackle the scalability issue is to apply more advanced sampling techniques to existing GNNs, such as node-wise sampling [5, 19], layer-wise sampling [4, 23], and graph-wise sampling [8, 44]. Although these sampling techniques can alleviate the scalability issue to a certain extent, the computation and communication costs remain high due to the recursive propagation. Besides, the sampling quality strongly influences the predictive performance of GNNs, and it is hard for these methods to balance the trade-off between training scalability and model performance.
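As a rough illustration of why sampling only partially helps, here is a minimal node-wise (GraphSAGE-style) neighbor sampler under our own naming: even with a small fan-out, the sampled receptive field still grows roughly as fanout^depth, so deep recursive propagation stays expensive.

```python
import random

def sample_neighborhood(adj_list: dict, seed: int, fanout: int, depth: int) -> set:
    """Sample up to `fanout` neighbors per node, recursively, for `depth` hops."""
    frontier, visited = {seed}, {seed}
    for _ in range(depth):
        nxt = set()
        for v in frontier:
            nbrs = adj_list.get(v, [])
            nxt.update(random.sample(nbrs, min(fanout, len(nbrs))))
        visited |= nxt
        frontier = nxt            # |visited| can approach fanout ** depth
    return visited
```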
Fortunately, some recent works attribute the success of GNNs to the non-parametric feature propagation rather than the parametric non-linear transformation. For example, as shown in SGC [37], the non-parametrically propagated features combined with a simple logistic regression model achieve classification performance comparable to GCN [24], while maintaining high scalability.
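A minimal sketch of this SGC-style recipe, assuming the standard GCN normalization (our reconstruction, not the authors' code): propagation happens once, as parameter-free preprocessing, and training then touches only a linear model.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression

def sgc_features(adj: sp.csr_matrix, x: np.ndarray, k: int) -> np.ndarray:
    """Precompute A_hat^k X, with A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + sp.eye(adj.shape[0], format="csr")   # add self-loops
    deg = np.asarray(a_hat.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    for _ in range(k):                                 # k-step non-parametric propagation
        x = a_norm @ x
    return x

# Training then reduces to fitting a simple classifier on precomputed features:
# clf = LogisticRegression(max_iter=1000).fit(feats[train_idx], y[train_idx])
```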
As a new way to tackle the scalability issue, non-parametric GNNs have attracted much attention recently, and most top solutions on the Open Graph Benchmark [22] are non-parametric GNNs. Following this direction, various non-parametric and scalable GNNs have been proposed to further improve the predictive performance of SGC, such as SIGN [15], S²GC [49], GBP [6], GAMLP [48], and NAFS [47]. Despite their high scalability and predictive performance, existing non-parametric GNNs still face the following two limitations:
Propagation with Over-smoothed Features. Each node in a K-layer GNN can propagate its feature K steps to distant nodes, so the node embedding can be smoothed and enhanced. However, as shown in Figure 1, node information may be propagated to the full graph at a large propagation depth, making node representations indistinguishable and leading to bad predictive performance, i.e., the over-smoothing issue [26]. As discussed in Eq. 5 in Section 3.1, if we propagate node features infinitely many times in a connected graph, the final node embedding is related only to the node degrees. By combining the embeddings of different propagation steps, some recent non-parametric GNNs can slightly alleviate the over-smoothing problem and achieve better predictive performance at larger propagation depths. However, Figure 1 shows that over-smoothed node embeddings (i.e., nodes with deep color) are continually propagated to their neighbors in later propagation steps. Such redundant and noisy information is propagated along the edges and harms the embeddings of more nodes. In summary, existing methods cannot fundamentally solve the over-smoothing issue because they keep propagating over-smoothed features.
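For intuition, one standard closed form of this limit (our reconstruction from the usual over-smoothing analysis; the paper's Eq. 5 may state a generalized normalization) for the self-loop-augmented symmetric normalization $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ on a connected, non-bipartite graph is

\lim_{K \to \infty} \big(\hat{A}^{K}\big)_{ij} = \frac{\sqrt{(d_i + 1)(d_j + 1)}}{2m + n},

where $d_i$ is the degree of node $i$, $m$ the number of edges, and $n$ the number of nodes: every entry depends only on degrees, so the propagated features $\hat{A}^{\infty}X$ collapse to degree-scaled copies of one global vector.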
Propagation with Fixed Weights. Existing non-parametric GNNs are built on the assumption of graph homogeneity, i.e., adjacent nodes are more likely to have the same label. However, this assumption does not hold in many real graph datasets. As shown in Figure 1 and introduced in Section 3.1, the propagation weight between two nodes in current non-parametric GNNs is related only to the graph structure (e.g., the node degrees) and is fixed across propagation steps. Therefore, existing non-parametric GNNs may assign a large, fixed propagation weight to two nodes with different labels (i.e., nodes with different colors), making the embeddings of nodes with different labels indistinguishable. Besides the uncertain graph homogeneity, nodes with the same label may exhibit different extents of homogeneity, such as nodes near and far from the decision boundary. Therefore, the weight should also be adapted to their similarity.
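Concretely, under the common GCN-style symmetric normalization (one standard choice; the paper's Section 3.1 may use a generalized exponent), the propagation weight is purely structural and identical at every step:

w_{ij} = \hat{A}_{ij} = \frac{a_{ij}}{\sqrt{(d_i + 1)(d_j + 1)}},

where $a_{ij}$ is the self-loop-augmented adjacency entry. No node feature enters this formula, which is exactly the limitation that feature-aware, step-adaptive weighting targets.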
In this paper, we propose non-parametric attention (NPA), a plug-and-play module that is compatible with different non-parametric GNNs, to obtain scalable and deep GNNs simultaneously. As the over-smoothed features are global and capture full-graph information, they do not bring much new information to the current node embedding. Therefore, NPA introduces global attention to tackle the propagation of over-smoothed features. Specifically, it assigns a node-adaptive attention weight to the newly propagated features to measure the newly generated global information relative to the previous features. In this way, it can adaptively remove the over-smoothed and redundant feature information at each propagation step. Besides, our theoretical analysis shows that assigning larger weights to neighbors with more similar features leads to an easier-to-learn classifier. To tackle the fixed-propagation-weight issue in local feature propagation, NPA further proposes local attention, which considers the similarity of propagated features when calculating the propagation weights and adaptively assigns more attention (i.e., larger propagation weights) to similar nodes at each propagation step. As introduced in Section 2.4, the key difference among existing non-parametric GNNs is how they combine the features from different propagation steps. Since the goal of NPA is to improve the quality of the propagated features at every step, NPA is orthogonal and compatible with all kinds of existing non-parametric GNNs.
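The precise NPA formulation is given later in the paper; purely as an illustration of the two mechanisms just described, and under our own naming and similarity choices, one propagation step might look like the following sketch.

```python
import numpy as np
import scipy.sparse as sp

def _cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    eps = 1e-12
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps)

def npa_like_step(a_norm: sp.csr_matrix, h_prev: np.ndarray) -> np.ndarray:
    rows, cols = a_norm.nonzero()
    # Local attention (illustrative): scale each fixed structural weight by the
    # feature similarity of its endpoints, so similar neighbors get more weight.
    sim = _cosine(h_prev[rows], h_prev[cols])
    vals = np.asarray(a_norm[rows, cols]).ravel() * (1.0 + sim)
    a_local = sp.csr_matrix((vals, (rows, cols)), shape=a_norm.shape)
    h_new = a_local @ h_prev
    # Global attention (illustrative): a node-adaptive gate that suppresses the
    # new features wherever they add little information over the previous step.
    novelty = 1.0 - np.abs(_cosine(h_new, h_prev))[:, None]
    return novelty * h_new + (1.0 - novelty) * h_prev
```

Both attentions here are non-parametric (no trainable weights), so a step like this can still be folded into preprocessing; the exact weighting functions are the paper's contribution, not this sketch.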
Our main contributions are as follows:
(1) New perspective. To the best of our knowledge, this is the first work to explore the two limitations of existing non-parametric GNNs, i.e., the propagation with over-smoothed features and fixed weights; this new finding opens up a new direction towards scalable and deep GNNs.
(2) New method. Based on our new findings, we propose NPA, a non-parametric attention method to tackle the above limitations. As a plug-and-play module, NPA is the first attempt to improve the propagation process of non-parametric GNNs, and it is compatible with existing non-parametric GNNs to support scalable and deep GNNs simultaneously.
(3) Industrial deployment. We have successfully implemented and deployed NPA in a real-world production environment