
Graph Transformer GANs for Graph-Constrained House Generation
Hao Tang^1, Zhenyu Zhang^2, Humphrey Shi^3, Bo Li^2, Ling Shao^4, Nicu Sebe^5, Radu Timofte^{1,6}, Luc Van Gool^{1,7}
^1 CVL, ETH Zurich   ^2 Tencent Youtu Lab   ^3 U of Oregon & UIUC & Picsart AI Research   ^4 UCAS-Terminus AI Lab, UCAS   ^5 University of Trento   ^6 University of Würzburg   ^7 KU Leuven
Abstract
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attentions in a Transformer to model both local and global interactions across connected and non-connected graph nodes. Specifically, the proposed connected node attention (CNA) and non-connected node attention (NNA) aim to capture the global relations across connected nodes and non-connected nodes in the input graph, respectively. The proposed graph modeling block (GMB) aims to exploit local vertex interactions based on a house layout topology. Moreover, we propose a new node classification-based discriminator to preserve the high-level semantic and discriminative node features for different house components. Finally, we propose a novel graph-based cycle-consistency loss that aims at maintaining the relative spatial relationships between ground truth and predicted graphs. Experiments on two challenging graph-constrained house generation tasks (i.e., house layout and roof generation) with two public datasets demonstrate the effectiveness of GTGAN in terms of objective quantitative scores and subjective visual realism. New state-of-the-art results are established by large margins on both tasks.
1. Introduction
This paper focuses on converting an input graph into a realistic house footprint, as depicted in Figure 1. Existing house generation methods such as [2, 15, 18, 26, 30, 43, 45] typically rely on stacking convolutional layers. However, convolutional architectures struggle to capture long-range dependencies in the input graph because of their inherent inductive biases. Several Transformer architectures [3, 6, 11, 16, 17, 22, 41, 42, 44, 52, 53] based on the self-attention mechanism have recently been proposed to encode long-range or global relations and thus learn highly expressive feature representations. On the other hand, graph convolution networks are good at exploiting local and neighborhood vertex correlations based on a graph topology. Therefore, it stands to reason to combine graph convolution networks and Transformers to model both local and global interactions for solving graph-constrained house generation.
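The complementarity of the two operator families can be illustrated with a toy NumPy sketch: a one-hop graph convolution mixes each node with its neighbors (local, topology-driven), while self-attention masked by the adjacency matrix or by its complement attends globally over connected or non-connected node pairs, in the spirit of the CNA and NNA modules. The dimensions, masking scheme, and way of combining the streams below are our illustrative assumptions, not GTGAN's actual layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_conv(X, A, W):
    """One-hop aggregation: each node averages itself and its neighbors."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # degree normalization
    return D_inv @ A_hat @ X @ W

def masked_attention(X, mask, Wq, Wk, Wv):
    """Self-attention restricted to node pairs allowed by `mask`."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = np.where(mask > 0, scores, -1e9)  # block disallowed pairs
    return softmax(scores) @ V

rng = np.random.default_rng(0)
n, d = 4, 8                        # 4 rooms (graph nodes), 8-dim features
X = rng.normal(size=(n, d))
A = np.array([[0, 1, 1, 0],        # hypothetical room-connectivity graph
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], float)
W, Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(4))

local = graph_conv(X, A, W)                              # local mixing (GMB-like)
cna   = masked_attention(X, A + np.eye(n), Wq, Wk, Wv)   # connected pairs only
nna   = masked_attention(X, 1 - A, Wq, Wk, Wv)           # non-connected pairs only
out = local + cna + nna            # one way to fuse local and global streams
print(out.shape)                   # (4, 8)
```

Self-loops are kept in both attention masks so that no node ends up with an empty attention set; how the three streams are fused (here, a plain sum) is a design choice.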
To this end, we propose a novel graph Transformer generative adversarial network (GTGAN), which consists of two main novel components, i.e., a graph-Transformer-based generator and a node classification-based discriminator (see Figure 1). The proposed generator aims to generate a realistic house from the input graph and consists of three components, i.e., a convolutional message passing neural network (Conv-MPN), a graph Transformer encoder (GTE), and a generation head. Specifically, Conv-MPN first receives the graph nodes as inputs and extracts discriminative node features. Next, the embedded nodes are fed to the GTE, in which long-range and global relation reasoning is performed by the connected node attention (CNA) and non-connected node attention (NNA) modules. Then, the output of both attention modules is fed to the proposed graph modeling block (GMB) to capture local and neighborhood relationships based on a house layout topology. Finally, the output of the GTE is fed to the generation head to produce the corresponding house layout or roof. To the best of our knowledge, we are the first to use a graph Transformer to model local and global relations across graph nodes for solving graph-constrained house generation.

In addition, the proposed discriminator aims to distinguish real and fake house layouts, which ensures that the generated house layouts or roofs look realistic. At the same time, the discriminator classifies the generated rooms into their corresponding real labels, preserving the discriminative and semantic features (e.g., size and position) of the different house components. To maintain the graph-level layout, we also propose a novel graph-based cycle-consistency loss to preserve the relative spatial relationships between ground truth and predicted graphs.
Overall, our contributions are summarized as follows:
arXiv:2303.08225v1 [cs.CV] 14 Mar 2023