XNOR-Net: ImageNet Classification Using Binary
Convolutional Neural Networks
Mohammad Rastegari†, Vicente Ordonez†, Joseph Redmon∗, Ali Farhadi†∗
Allen Institute for AI†, University of Washington∗
{mohammadr,vicenteor}@allenai.org
{pjreddie,ali}@cs.washington.edu
Abstract. We propose two efficient approximations to standard convolutional
neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-
Networks, the filters are approximated with binary values resulting in 32× mem-
ory saving. In XNOR-Networks, both the filters and the input to convolutional
layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This results in 58× faster convolutional operations (in terms of the number of high-precision operations) and 32× memory savings. XNOR-Nets
offer the possibility of running state-of-the-art networks on CPUs (rather than
GPUs) in real-time. Our binary networks are simple, accurate, efficient, and work
on challenging visual tasks. We evaluate our approach on the ImageNet classifi-
cation task. The classification accuracy of a Binary-Weight-Network version of AlexNet matches that of the full-precision AlexNet. We compare our method with
recent network binarization methods, BinaryConnect and BinaryNets, and out-
perform these methods by large margins on ImageNet, more than 16% in top-1
accuracy. Our code is available at: http://allenai.org/plato/xnornet.
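The binary-operation speedup described above rests on a simple identity: the dot product of two {−1, +1} vectors can be computed with an XNOR followed by a popcount on their bit-packed encodings (bit 1 for +1, bit 0 for −1). A minimal sketch of this primitive, with names of our own choosing rather than from the paper:

```python
def pack_bits(signs):
    """Pack a sequence of {-1,+1} values into an integer bitmask,
    one bit per value (bit i set means signs[i] == +1)."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word

def xnor_dot(a, b, n):
    """Dot product of two n-element {-1,+1} vectors given as bitmasks.

    XNOR yields 1 exactly where the signs agree; if k bits agree,
    the dot product is k - (n - k) = 2k - n.
    """
    mask = (1 << n) - 1
    agree = ~(a ^ b) & mask       # XNOR, restricted to the n valid bits
    k = bin(agree).count("1")     # popcount
    return 2 * k - n
```

On real hardware the XNOR and popcount operate on 32 or 64 weights per machine instruction, which is the source of the quoted speedup over one-at-a-time floating-point multiply-accumulates.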
1 Introduction
Deep neural networks (DNNs) have shown significant improvements in several application domains, including computer vision and speech recognition. In computer vision, a particular class of DNNs, known as convolutional neural networks (CNNs), has demonstrated state-of-the-art results in object recognition [1,2,3,4] and detection [5,6,7].
Convolutional neural networks show reliable results on object recognition and detection that are useful in real-world applications. Concurrently with this progress in recognition, notable advances have been made in virtual reality (VR by Oculus) [8], augmented reality (AR by HoloLens) [9], and smart wearable devices.
Putting these two pieces together, we argue that it is the right time to equip smart
portable devices with the power of state-of-the-art recognition systems. However, CNN-
based recognition systems need large amounts of memory and computational power.
While they perform well on expensive, GPU-based machines, they are often unsuitable
for smaller devices like cell phones and embedded electronics.
For example, AlexNet [1] has 61M parameters (249MB of memory) and performs 1.5B high-precision operations to classify one image. These numbers are even higher for deeper CNNs, e.g., VGG [2] (see Section 4.1). These models quickly overtax the limited
storage, battery power, and compute capabilities of smaller devices like cell phones.
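A quick back-of-the-envelope check of the figures above (a rough sketch only: the 249MB in the text presumably reflects the stored model file with some overhead beyond raw weights, while the 32× ratio of 32-bit floats to 1-bit weights is exact):

```python
# AlexNet parameter count from the text
PARAMS = 61_000_000

# Raw weight storage in MiB: 32-bit floats vs. 1 bit per weight
full_precision_mb = PARAMS * 32 / 8 / (1024 ** 2)
binary_mb = PARAMS * 1 / 8 / (1024 ** 2)

print(f"full precision: ~{full_precision_mb:.0f} MiB")
print(f"binary weights: ~{binary_mb:.1f} MiB")
print(f"memory saving: {full_precision_mb / binary_mb:.0f}x")
```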