
For example, a wall is usually axis-aligned, so the coordinate values of adjacent corners are exactly equal; a wall might further be shared with adjacent rooms. Direct regression of 2D coordinates would never achieve these relationships. One could instead use a discrete representation, such as a one-hot encoding over possible coordinate values with classification, but this causes a severe label imbalance (i.e., most entries in the encoding are 0) and makes the network fail to train.
This paper presents a novel approach for graph-constrained floorplan generation that directly generates a vector-graphics floorplan (i.e., without any post-processing), handles non-Manhattan architectures, and makes significant improvements on all metrics. Concretely, a bubble diagram is given as a graph, whose nodes are rooms and whose edges are door-connections. We represent a floorplan as a set of 1D polygonal loops, each of which corresponds to a room or a door, and generate the 2D coordinates of the room/door corners (see Fig. 1). The key idea is the use of a Diffusion Model (DM) with a careful design of the denoising targets. Our approach infers 1) the single-step noise amount as a continuous quantity to precisely invert the continuous forward process, and 2) the final 2D coordinates as a discrete quantity to establish incident relationships. The discrete representation after the denoising iterations is the final floorplan model.
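As a concrete illustration of this design, below is a minimal, hypothetical sketch of a denoiser with the two corresponding output branches: one regressing the single-step noise as a continuous quantity, and one producing logits over quantized coordinate values as the discrete quantity. The class name, transformer backbone, bin count, and the omission of the bubble-diagram conditioning are simplifying assumptions for exposition, not the exact architecture of this paper.

```python
import torch
import torch.nn as nn

class DualTargetDenoiser(nn.Module):
    """Illustrative sketch (not the paper's exact architecture): given noisy
    corner coordinates x_t and the diffusion step t, predict both the
    single-step noise (continuous) and the final coordinates as logits over
    quantized values (discrete)."""

    def __init__(self, dim=512, num_bins=256, num_steps=1000):
        super().__init__()
        self.coord_embed = nn.Linear(2, dim)        # embed each 2D corner
        self.time_embed = nn.Embedding(num_steps, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.noise_head = nn.Linear(dim, 2)              # continuous: noise per corner
        self.coord_head = nn.Linear(dim, 2 * num_bins)   # discrete: logits per x/y value

    def forward(self, x_t, t):
        # x_t: (B, N, 2) noisy corner coordinates; t: (B,) diffusion steps
        h = self.coord_embed(x_t) + self.time_embed(t)[:, None, :]
        h = self.backbone(h)
        eps_pred = self.noise_head(h)                                      # (B, N, 2)
        coord_logits = self.coord_head(h).view(x_t.shape[0], x_t.shape[1], 2, -1)
        return eps_pred, coord_logits
```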
Qualitative and quantitative evaluations show that the proposed system outperforms the existing state of the art, House-GAN++ [34], by significant margins, while being end-to-end and capable of generating non-Manhattan floorplans with exact control over the number of corners per room. We will share all our code and models.
2. Related Work
Floorplan generation: Generation of 3D buildings and floorplans has been an active area of research since before the deep-learning era [4, 12, 31, 32, 37], and the field has further flourished with the emergence of deep learning.
Nauata et al. [33] proposed House-GAN, a graph-constrained floorplan generative model based on a Generative Adversarial Network [11]. House-GAN generates segmentation masks of different rooms and combines them into a single floorplan. The authors further improved the generation quality with House-GAN++ [34], which iteratively refines
a layout. Given the boundary of a floorplan, Upadhyay et
al. [44] used the embedded input boundary as an additional
input feature to predict a floorplan. Hu et al. [17] proposed
Graph2Plan that retrieves a graph layout from a dataset and
generates room bounding boxes as well as a floorplan in an
ad-hoc way. Sun et al. [42] proposed to iteratively gener-
ate connectivity graphs of rooms and a floorplan semantic
segmentation mask. Given a set of room types and their
area sizes as the constraint, Luo and Huang [29] proposed a
vector generator and a raster discriminator to train a GAN
model using differentiable rendering. Although their method generates vector floorplans directly, it is limited to rectangular shapes. Taking the adjacency graph as input, Yin et al. [5] use graph-theoretic and linear-optimization techniques to generate floorplans. Our paper also tackles graph-constrained floorplan generation with a bubble diagram as the constraint [34]. The key difference is that HouseDiffusion processes a vector geometry representation from start to finish and, hence, directly generates vector floorplan samples.
Diffusion models: Deep generative models have seen great success in a broad range of domains [11, 24, 36, 38, 46], among which the Diffusion Model (DM) [7, 41, 49] is an emerging technique.
Ho et al. [14] used a DM to boost image generation quality.
Nichol and Dhariwal [35] made improvements by proposing a new noise schedule and learning the variances of the reverse process. The same authors made further improvements with a novel architecture and classifier guidance [10].
DMs have been adapted to many other tasks such as Natu-
ral Language Processing [23], Image Captioning [9], Time-
Series Forecasting [43], Text-to-Speech [20, 22], and fi-
nally Text-to-Image as seen in the great success of DALL-E
2 [39] and Imagen [40].
Molecular Conformation Generation [16, 18, 28, 48] and
3D shape generation [26, 27, 30, 50] are probably the clos-
est to our task. What makes our task unique and challeng-
ing is the precise geometric incident relationships, such as
parallelism, orthogonality, and corner-sharing among dif-
ferent components, which continuous coordinate regression
would never achieve. In this regard, several works use a discrete state space [3, 6, 15] or learn an embedding of discrete data [9, 23] in the DM formulation. However, we found that these purely discrete representations do not train well, probably because the diffusion process is continuous in nature. In contrast, our formulation simultaneously infers the single-step noise as a continuous quantity and the final 2D coordinates as a discrete quantity, achieving superior generation capabilities (see Sect. 5.3 for more analysis). To our knowledge, our work is the first to use DMs to generate structured geometry.
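For exposition, the sketch below shows how such a dual objective could be trained, combining a continuous noise-regression loss with a discrete cross-entropy loss over quantized coordinates; the function name, tensor shapes, and bin quantization are assumptions for illustration, not the exact loss of this paper.

```python
import torch
import torch.nn.functional as F

def dual_target_loss(eps, eps_pred, x0, coord_logits, num_bins=256):
    # eps, eps_pred: (B, N, 2) ground-truth and predicted single-step noise.
    # x0:            (B, N, 2) clean corner coordinates, assumed in [-1, 1].
    # coord_logits:  (B, N, 2, num_bins) logits over quantized coordinate values.
    loss_eps = F.mse_loss(eps_pred, eps)                           # continuous branch
    bins = ((x0 + 1) / 2 * (num_bins - 1)).round().long().clamp(0, num_bins - 1)
    loss_coord = F.cross_entropy(coord_logits.reshape(-1, num_bins),
                                 bins.reshape(-1))                 # discrete branch
    return loss_eps + loss_coord
```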
3. Preliminary
Diffusion models (DMs) denoise a Gaussian noise $x_T$ towards a data sample $x_0$ in $T$ steps, and their training consists of the forward and the reverse processes. The forward process takes a data sample $x_0$ and generates a noisy sample $x_t$ at time step $t$ by sampling a Gaussian noise $\epsilon \sim \mathcal{N}(0, \mathbf{I})$:
$$x_t = \sqrt{\gamma_t}\, x_0 + \sqrt{1 - \gamma_t}\, \epsilon. \qquad (1)$$
$\gamma_t$ is a noise schedule that gradually changes from 1 to 0.
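As a minimal illustration of the forward process in Eq. (1), the sketch below assumes a toy cosine-style schedule for $\gamma_t$ and illustrative tensor shapes; the function and variable names are not part of this paper's implementation.

```python
import math
import torch

def forward_diffusion(x0, t, gammas):
    # Eq. (1): x_t = sqrt(gamma_t) * x_0 + sqrt(1 - gamma_t) * eps, eps ~ N(0, I)
    eps = torch.randn_like(x0)
    gamma_t = gammas[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over data dims
    x_t = torch.sqrt(gamma_t) * x0 + torch.sqrt(1.0 - gamma_t) * eps
    return x_t, eps

# Toy schedule: gamma decreases from 1 toward 0 over T steps (illustrative).
T = 1000
gammas = torch.cos(torch.linspace(0.0, 1.0, T) * math.pi / 2.0) ** 2
x0 = torch.rand(8, 50, 2) * 2 - 1       # e.g., 50 corner coordinates in [-1, 1]
t = torch.randint(0, T, (8,))           # a random diffusion step per sample
x_t, eps = forward_diffusion(x0, t, gammas)
```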
The reverse process starts from a pure Gaussian noise