
izes the components to similar classes and enables conditional execution for faster inference.
2. Related Work
Sparsity. A common approach to improving DNN efficiency is to enforce sparsity in network connectivity [21, 29, 4, 13]. This can be achieved via manual design of new DNN modules (Inception [27], SqueezeNet [16], MobileNet [13]) or via automated techniques that identify and remove the least important connections from a dense network [11, 10]. In either case, determination of the network topology is a static preprocessing step, and all connections are evaluated at the time of inference.
A complementary optimization is to employ conditional execution at the time of inference to exploit sparsity in activations (skipping computation and memory accesses for model weights when activations are known to be zero). While attractive for reducing the energy consumption of DNN accelerators [9], fine-grained, per-element sparsity is difficult to exploit on CPUs and GPUs, which rely heavily on wide vector processing for performance. The subtask specialization we exploit in HydraNets can be viewed as a mechanism for designing and training a network architecture that, through coarse-grained conditional execution, is able to more effectively exploit dynamic sparsity.
Cascades. Cascades [28] are a common form of conditional model execution that reduces inference cost (on average) by quickly terminating model evaluation on inputs that are easy to process (“early out”). Region proposal based models [24] for object detection are a canonical example of cascades in DNNs. Recent work has shown that integrating cascades into deep network architectures [8, 14] can improve the accuracy vs. cost trade-off of state-of-the-art architectures, where the later stages in a cascade specialize for difficult problem instances. The HydraNet approach of specializing network components for different subtasks is orthogonal and complementary to the benefits of cascades.
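To make the early-out idea concrete, the following is a minimal sketch of a two-stage cascade; it is not the method of [28] or of the cascaded architectures cited above, and `cheap_model`, `full_model`, and the confidence threshold are illustrative placeholders.

```python
import torch

def cascade_predict(x, cheap_model, full_model, threshold=0.9):
    """Run the cheap stage; invoke the full model only for inputs whose
    top-1 confidence falls below the threshold (the "early out")."""
    probs = torch.softmax(cheap_model(x), dim=-1)     # [batch, num_classes]
    conf, pred = probs.max(dim=-1)
    hard = conf < threshold                           # inputs the cheap stage is unsure about
    if hard.any():
        pred[hard] = full_model(x[hard]).argmax(dim=-1)
    return pred
```

The average cost depends on how many inputs clear the threshold, which is why cascades help most when the majority of inputs are easy.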
Mixture of experts. The idea of specializing components of a model for different subtasks is related to mixture-of-experts models, where the experts are specialized for different inputs or tasks. Recent work on training very large DNNs for language modeling [26] has used conditional execution to evaluate only a small fraction of the experts for each training instance. One of the key aspects addressed in [26] is the design of the mechanism for choosing which experts to evaluate and the trade-offs in network architecture needed to maintain computational efficiency. These design choices are tailored to recurrent models and cannot be directly applied to state-of-the-art image classification models, which are feed-forward convolutional networks.
Hierarchical classification. Categories in ImageNet [6, 25] are organized into a semantic hierarchy using an external knowledge base. The hierarchy can be used to first predict the super class and only perform fine-grained classification within that super class [5, 7]. HDCNN [30] is a hierarchical image classification architecture that is similar in spirit to our approach. HDCNN and Ioannou et al. [20, 17] improve accuracy with a significant increase in cost relative to the baseline architecture. In contrast, HydraNets on ImageNet improve top-1 accuracy by 1.18–2.5% with the same inference cost as the corresponding baseline architectures.
Both HDCNN and Ioannou et al. model the routing weights as continuous variables that are used to linearly combine the outputs of multiple experts. Jointly learning the routing weights with the experts is similar to LEARN in Table 8 and performs poorly due to optimization difficulties (collapse and poor utilization of experts). HDCNN uses complex multi-stage training to mitigate these optimization issues and provides robustness to routing error by overlapping the classes handled by each expert. HydraNets instead use binary weights for the experts during both training and inference by dropping out all but the top-k experts. This enables joint training of all HydraNet components while allowing flexible usage of experts.
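For concreteness, the following is a minimal sketch of this kind of top-k hard gating, assuming gate scores of shape [batch, num_experts]; the binarization and the averaging combiner shown here are illustrative assumptions, not the exact formulation used in our experiments.

```python
import torch

def topk_binary_gate(gate_logits, k):
    """Binarize gate scores: keep the top-k experts per example, drop the rest."""
    idx = gate_logits.topk(k, dim=-1).indices   # [batch, k] selected expert indices
    mask = torch.zeros_like(gate_logits)        # [batch, num_experts]
    mask.scatter_(-1, idx, 1.0)                 # exactly k ones per row
    return mask

def combine(expert_outputs, mask, k):
    """Average the features of the k selected experts.
    expert_outputs: [batch, num_experts, feat_dim]; mask: [batch, num_experts]."""
    return (mask.unsqueeze(-1) * expert_outputs).sum(dim=1) / k
```

Because the mask is binary, the non-selected experts contribute nothing and need not be evaluated at inference time.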
Architectural structures similar to HydraNet [1] have been used for learning the partition of categories into disjoint subsets. Our main contribution is a gating mechanism that reduces inference cost by dynamically choosing which components of the network to evaluate at runtime. Recent work [22, 23] has explored directly incorporating inference cost into the optimization and training methods for jointly learning the routing and the network features. In contrast to the complex training regimes required for such joint learning, our approach enables a simple and effective training strategy, which we comprehensively evaluate on both the ImageNet and CIFAR-100 datasets.
3. HydraNet Architecture Template
The HydraNet template, shown in Figure 1, has four major components (a minimal code sketch follows the list):
• Branches that are specialized for computing features on visually similar classes. We view computing features relevant to a subset of the network inputs as a subtask of the larger classification task.
• A stem that computes features used by all branches and in deciding which subtasks to perform for an input.
• A gating mechanism that decides which branches to execute at inference time using features from the stem.
• A combiner that aggregates features from multiple branches to make the final predictions.
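The sketch below shows how the four components fit together in a PyTorch-style forward pass; the specific layer shapes, the linear gate, and the averaging combiner are illustrative assumptions rather than the architecture used in our experiments.

```python
import torch
import torch.nn as nn

class HydraNetSketch(nn.Module):
    def __init__(self, feat_dim=256, num_branches=8, num_classes=1000, k=2):
        super().__init__()
        self.k = k
        # Stem: shared features used by the gate and all branches.
        self.stem = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Gate: scores each branch given the stem features.
        self.gate = nn.Linear(feat_dim, num_branches)
        # Branches: one specialized feature extractor per class subset.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
            for _ in range(num_branches)])
        # Combiner: aggregates branch features and makes the final prediction.
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        s = self.stem(x)                                   # [batch, feat_dim]
        top = self.gate(s).topk(self.k, dim=-1).indices    # branches to execute
        outs = []
        for b in range(x.size(0)):                         # per-example routing
            feats = [self.branches[i](s[b]) for i in top[b].tolist()]
            outs.append(torch.stack(feats).mean(0))        # average the k branches
        return self.classifier(torch.stack(outs))
```

At inference only the k selected branches are evaluated per input, which is where the cost savings come from; a batched implementation would group examples by selected branch rather than looping per example as in this sketch.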
Realizing the HydraNet template requires partitioning the classes into visually similar groups that the branches specialize for, an accurate and cost-effective gating mechanism for choosing branches to execute given an input, and