软件学报 Journal of Software, ISSN 1000-9825, CODEN RUXUEW [doi: 10.13328/j.cnki.jos.006051]
© Copyright Institute of Software, Chinese Academy of Sciences.
Performance Optimizing Method for Sparse Convolutional Neural Networks on GPU

DONG Xiao 1,2, LIU Lei 1, LI Jing 1, FENG Xiao-Bing 1,2

1 (State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190, China)
2 (University of Chinese Academy of Sciences, Beijing 100190, China)

Corresponding author: LIU Lei, E-mail: liulei@ict.ac.cn

Abstract: In recent years, with dominating capability shown in plenty of tasks, deep convolutional neural networks have been deployed in applications including object detection, autonomous driving, machine translation, etc. But these models are accompanied by huge amounts of parameters and bring a heavy computational burden. The neural network pruning technique can recognize and remove parameters that contribute little to the accuracy, resulting in reduced amounts of parameters and decreased theoretical computational requirement, thus providing a chance to accelerate neural network models. However, it is hard for the pruned sparse models to achieve efficient execution on GPUs, and the performance of sparse models cannot even match their well-optimized dense counterparts. In this paper, we design a sparsity-aware code generating method, which can generate efficient GPU code for sparse convolutions in pruned neural networks. First, we design a template for convolution operators with several optimizations targeting GPU architecture. Through compiling and analyzing, the operator template is transformed to the intermediate representation template, which serves as the input to the designed algorithm to generate sparse convolution code according to specific sparse convolution parameters. Moreover, to improve memory throughput, we perform optimizations on data access and data placement based on the characteristics of memory access in neural networks. Finally, as the location information can be encoded into the generated code implicitly, the index structure for the sparse parameters can be eliminated, reducing the memory footprint during the execution. In experiments, we demonstrate that the proposed sparsity-aware code generating method can improve the performance of sparse convolutional neural networks compared with current methods.

Key words: neural networks; sparse; GPU; performance optimization; convolution; code generation

CLC number: TP311

Received 2019-10-05; revised 2020-01-13 and 2020-03-17; accepted 2020-04-01; published online 2020-04-21.
E-mail: jos@iscas.ac.cn, http://www.jos.org.cn, Tel: +86-10-62562563
Deep neural networks have drawn sustained and broad attention from academia and industry in recent years. Since AlexNet [1] achieved striking results on large-scale image classification in 2012, network models have kept growing in depth and scale and have continuously been applied to new tasks, and industry has likewise adopted network models in a wide variety of applications. Typical applications include object detection [2,3], autonomous driving [4], and machine translation [5]. While these models demonstrate remarkable capability, they demand large amounts of storage and computation, and those demands grow with the models. For example, ResNet50 [6], widely used in computer-vision applications such as image classification and object detection, contains about 25 million parameters, and classifying a single 224*224 color image with it requires about 7.6 billion operations. On the other hand, the progress in model capability has also depended on this growth in scale: from LeNet5 [7] for digit recognition, with roughly 60 thousand parameters, to AlexNet [1], which achieved excellent results in the ImageNet [8] large-scale image classification challenge with more than 60 million parameters. The huge parameter scale and computational demand hinder the broad deployment of neural network models, and at the same time make efficient execution of neural networks a highly practical and pressing problem.
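The per-image operation counts quoted above can be estimated from layer shapes. The sketch below computes the cost of a single convolution layer; the layer shape used is the typical first layer of a ResNet-style network, chosen for illustration rather than taken from this paper.

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k):
    """Approximate FLOPs of one convolution layer: each of the
    h_out*w_out*c_out output elements needs k*k*c_in multiply-adds,
    counted as 2 operations each."""
    return 2 * h_out * w_out * c_out * k * k * c_in

# Illustrative first layer on a 224x224 RGB image:
# 7x7 convolution, stride 2, 64 output channels -> 112x112 output.
flops = conv2d_flops(112, 112, 3, 64, 7)
print(f"{flops / 1e9:.2f} GFLOPs")  # ~0.24 GFLOPs for this one layer
```

Summing such per-layer costs over all layers of a deep network yields totals on the order of billions of operations per image, consistent with the figure cited for ResNet50.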
Facing the huge parameter counts and massive computation of neural networks, researchers have proposed model pruning to exploit the redundancy in network parameters and simplify models. Removed parameters no longer need to be stored, and the computation associated with them can be skipped, so pruning effectively reduces both the storage overhead and the computation of a neural network model. Under the constraint that the pruned model's accuracy loss on the target task stays within an acceptable bound, pruning identifies the parameters that matter little to accuracy and removes them, turning the network into a sparse model. Unstructured pruning methods, which place no constraint on the distribution of the removable parameters, can remove up to 90% of the parameters in a model [9-12], reducing the theoretical computation of the pruned sparse model to 10% of that before pruning.
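Magnitude-based pruning is one common instance of the pruning techniques cited above: the smallest-magnitude weights are assumed to matter least and are zeroed. A minimal sketch (the function name and the use of NumPy are illustrative, not from this paper; the 90% ratio mirrors the figure in the text):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest
    absolute values, returning a sparse tensor of the same shape."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 3, 3, 3))   # a small conv weight tensor
w_sparse = magnitude_prune(w, 0.9)
print(1.0 - np.count_nonzero(w_sparse) / w.size)  # ≈ 0.9 of entries are zero
```

The pruned tensor keeps its dense shape here; the storage savings only materialize once the zeros are elided by a compact representation, which is exactly where the difficulties discussed next arise.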
Although unstructured model pruning effectively reduces theoretical computation, turning this theoretical gain into actual speedup on GPUs faces severe challenges. First, compared with the dense computation before pruning, the sparse computation after pruning has lower computational density. This makes data transfer between the GPU's compute cores and DRAM prone to becoming the performance bottleneck, so the pruned sparse computation struggles to fully utilize the GPU's compute resources. Second, sparse data is usually stored compactly, keeping only the nonzero elements and using extra index structures to record their positions, as in the sparse matrix formats CSR (Compressed Sparse Row) [13], CSC (Compressed Sparse Column), and COO (Coordinate), and the sparse tensor format CSF (Compressed Sparse Fiber) [14]. The index structures in these representations increase the memory accesses of the neural network and add overhead to the computation. Moreover, the GPU's execution model and memory hierarchy are themselves fairly complex, and the existing libraries for dense neural network computation on GPUs, such as cuDNN [15] and cuBLAS [16], rely on manual optimization. Without accounting for the concrete architectural features of the GPU and adapting data layout and work partitioning accordingly, the theoretical speedup from pruning cannot be converted into real performance gains.
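The index-structure overhead of formats such as CSR can be seen in a minimal sketch: besides the nonzero values, two index arrays must be stored and traversed, and every multiply involves an indirect load through the column index. Pure Python, for illustration only:

```python
def to_csr(dense):
    """Convert a dense 2-D matrix (list of lists) to CSR form:
    nonzero values, their column indices, and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x. Note the indirect access x[col_idx[k]]: the index
    arrays themselves cost memory bandwidth on every traversal."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

A = [[5, 0, 0],
     [0, 0, 3],
     [2, 0, 4]]
vals, cols, ptrs = to_csr(A)
print(vals, cols, ptrs)   # [5, 3, 2, 4] [0, 2, 0, 2] [0, 1, 2, 4]
print(csr_matvec(vals, cols, ptrs, [1, 1, 1]))   # [5, 3, 6]
```

On a GPU, the indirect loads additionally break memory coalescing, which is one reason sparse kernels underuse the hardware relative to dense ones.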
To address these problems, we propose a sparsity-aware code generation method that, given the sparse parameters of a pruned model, generates efficient execution code for forward inference. Figure 1 shows the overall workflow of our method. First, we design an operator template for convolution operators; the template is independent of concrete model parameters. The operator template is compiled and analyzed, and from it we obtain an operator intermediate-representation template, in which a correspondence with the instructions of the template code is established. Then, combined with the parameters of a concrete pruned model, the intermediate-representation template is instantiated and transformed: computation related to zero-valued parameters is removed, and optimizations targeting the resulting sparse operator are applied, yielding intermediate representations for the different sparse operators. In addition, exploiting the fact that the data-access pattern of a neural network is fixed across executions, we adopt different access and placement strategies for the model's parameters and input data, improving the memory throughput of execution. Finally, in the generated sparse convolution operator code, the positions of the nonzero parameters are already encoded in the code itself, so no extra index structures are needed, which reduces the memory accesses during execution.
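The idea of encoding nonzero positions implicitly in the generated code can be illustrated with a toy code generator: only nonzero weights produce a term, and their positions become literal offsets in the emitted source, so no index array is read at run time. This is a simplified Python sketch of the principle; the actual system described here generates GPU code from an operator IR template.

```python
def gen_sparse_dot(weights):
    """Emit source for y = dot(w, x) with w fixed and sparse.
    Zero weights generate no code at all; nonzero weights appear
    as constants with their positions baked in as literal offsets."""
    terms = [f"{w} * x[{i}]" for i, w in enumerate(weights) if w != 0]
    body = " + ".join(terms) if terms else "0"
    return f"def sparse_dot(x):\n    return {body}\n"

src = gen_sparse_dot([0, 2, 0, 0, -1, 0, 3])
print(src)
# def sparse_dot(x):
#     return 2 * x[1] + -1 * x[4] + 3 * x[6]

ns = {}
exec(src, ns)                     # compile the generated kernel
print(ns["sparse_dot"]([1, 1, 1, 1, 1, 1, 1]))  # 4
```

The generated function touches only the three input positions with nonzero weights, mirroring how specializing a convolution kernel to its pruned weights removes both the skipped multiplications and the index-structure loads.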