
Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn

Journal of Software, 2019, 30(10): 3071-3089 [doi: 10.13328/j.cnki.jos.005789] http://www.jos.org.cn

Hybrid Testing Based on Symbolic Execution and Fuzzing



谢肖飞, 李晓红, 陈翔, 孟国柱, 刘杨

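(Tianjin Key Laboratory of Advanced Networking (Tianjin University), Tianjin 300050, China)

(School of Computer Science and Technology, Nantong University, Nantong 226019, China)

(State Key Laboratory of Information Security (Institute of Information Engineering, Chinese Academy of Sciences), Beijing 100093, China)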

(School of Computer Science and Engineering, Nanyang Technological University 639798, Singapore)

Corresponding author: 李晓红 (LI Xiao-Hong), E-mail: xiaohongli@tju.edu.cn

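Abstract: Software testing is a common way to ensure software quality, and achieving high coverage is an important and challenging research problem in testing. Fuzzing and symbolic execution, the two mainstream testing techniques, have been widely studied and applied in both academia and industry, and each has strengths and weaknesses. Fuzzing generates test cases by random mutation and executes the program dynamically, so it can execute and cover deep branches, but it has difficulty producing, through mutation, test cases that cover complex conditional branches. Symbolic execution relies on a constraint solver and can generate test cases that cover complex conditional branches, but state explosion frequently occurs during symbolic execution, which makes deep branches hard to reach. Prior work has shown that combining symbolic execution with fuzzing achieves better results than either technique alone. After analyzing the strengths and weaknesses of both, this paper proposes Afleer, a hybrid testing approach that combines the two based on branch coverage and exploits the advantages of each to generate test cases with higher branch coverage. Specifically, fuzzing (e.g., AFL) quickly generates a large number of test cases that cover deep branches, and symbolic execution (e.g., KLEE) searches based on the fuzzer's coverage information and generates test cases only for the uncovered branches. To evaluate the effectiveness of Afleer, the standard benchmark LAVA-M and the real-world project oSIP are used as evaluation subjects, with bug detection capability and coverage as evaluation metrics. The experimental results show that: (1) in bug detection, Afleer finds 755 bugs in total, whereas AFL finds only 1; (2) in coverage, Afleer improves to varying degrees on both the benchmark and the real-world project; on oSIP, Afleer improves branch coverage over AFL by 2.4 times and path coverage by 6.1 times. In addition, Afleer detects a new bug in oSIP.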

Keywords: software quality assurance; fuzz testing; symbolic execution; test case generation

CLC number: TP311

Chinese citation format: 谢肖飞, 李晓红, 陈翔, 孟国柱, 刘杨. 基于符号执行与模糊测试的混合测试方法. 软件学报, 2019, 30(10): 3071-3089. http://www.jos.org.cn/1000-9825/5789.htm

English citation format: Xie XF, Li XH, Chen X, Meng GZ, Liu Y. Hybrid testing based on symbolic execution and fuzzing. Ruan Jian Xue Bao/Journal of Software, 2019, 30(10): 3071-3089 (in Chinese). http://www.jos.org.cn/1000-9825/5789.htm

Hybrid Testing Based on Symbolic Execution and Fuzzing

XIE Xiao-Fei, LI Xiao-Hong, CHEN Xiang, MENG Guo-Zhu, LIU Yang

(Tianjin Key Laboratory of Advanced Networking (Tianjin University), Tianjin 300050, China)

(School of Computer Science and Technology, Nantong University, Nantong 226019, China)

(State Key Laboratory of Information Security (Institute of Information Engineering, Chinese Academy of Sciences), Beijing 100093, China)


Foundation item: National Natural Science Foundation of China (61572349, 61272106)

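This paper was recommended by 荣国平, 白晓颖, and 岳涛, guest editors of the special issue on "New Software Engineering Technologies for DevOps".

Received: 2018-08-29; Revised: 2018-10-31; Accepted: 2018-12-14; Published online by JOS: 2019-04-29

CNKI online first publication: 2019-04-30 09:19:14, http://kns.cnki.net/kcms/detail/11.2560.TP.20190430.0918.010.html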


Journal of Software 软件学报 Vol.30, No.10, October 2019

(School of Computer Science and Engineering, Nanyang Technological University 639798, Singapore)

Abstract: Software testing is a common way to guarantee software quality, and achieving high coverage is an important and challenging goal in testing. Fuzz testing and symbolic execution, the two mainstream testing techniques, have been widely studied and applied in academia and industry, and both have advantages and limitations. Fuzz testing can execute and cover deep branches by randomly mutating test cases and dynamically executing programs; however, random mutation has difficulty generating test cases that cover complex conditional branches. Symbolic execution can cover complex conditional branches with SMT solvers, but state explosion during symbolic execution makes it difficult to cover deep branches. Existing work has shown that hybrid testing combining fuzzing and symbolic execution can achieve better performance than either fuzzing or symbolic execution alone. By analyzing the advantages and disadvantages of fuzzing and symbolic execution, this study proposes Afleer, a branch coverage-based hybrid testing approach that combines the two methods to generate test cases with higher branch coverage. Specifically, fuzz testing (e.g., AFL) quickly generates a large number of test cases that cover deep branches, and symbolic execution (e.g., KLEE) performs a search based on the coverage achieved by fuzz testing, generating test cases only for uncovered branches. To evaluate the effectiveness of Afleer, the study selects the standard benchmark LAVA-M and the real-world project oSIP as evaluation subjects, and uses bug detection and coverage as evaluation measures. The experimental results show that: 1) for bug discovery, Afleer found 755 bugs while AFL found only 1; 2) for coverage, Afleer achieved improvements on both the benchmark and the real-world project. On oSIP, Afleer increases branch coverage over AFL by 2.4 times and path coverage by 6.1 times. In addition, Afleer found a new bug in oSIP.

Key words: software quality assurance; fuzz testing; symbolic execution; test case generation
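The hybrid scheme summarized in the abstract (let the fuzzer run first, then have symbolic execution generate inputs only for the branches the fuzzer left uncovered) can be illustrated with the minimal Python sketch below. It is a toy under stated assumptions, not Afleer's implementation: the program model `PROGRAM` and the functions `run`, `fuzz`, `solve`, and `hybrid` are hypothetical, a random bit-flipping loop stands in for AFL, and a one-condition solver stands in for KLEE and its SMT solver.

```python
import random

# Hypothetical toy program: a flat list of branch checks on integer input fields.
# Each check is (branch_id, field_index, op, constant), with op in {"==", ">"}.
PROGRAM = [
    ("b0", 0, ">", 1000),         # easy for random mutation to satisfy
    ("b1", 1, "==", 123456789),   # magic value: hard for random mutation
]

def run(inputs):
    """Execute the toy program; return the set of covered (branch, outcome) pairs."""
    covered = set()
    for bid, idx, op, const in PROGRAM:
        value = inputs[idx]
        taken = value == const if op == "==" else value > const
        covered.add((bid, taken))
    return covered

def fuzz(seed, rounds, rng):
    """Coverage-guided random bit flipping (a stand-in for AFL)."""
    corpus, covered = [seed], run(seed)
    for _ in range(rounds):
        child = list(rng.choice(corpus))
        idx = rng.randrange(len(child))
        child[idx] ^= 1 << rng.randrange(32)   # flip one random bit of one field
        new = run(child) - covered
        if new:                                # keep mutants that add coverage
            corpus.append(child)
            covered |= new
    return corpus, covered

def solve(op, const, want):
    """Trivial one-condition solver (a stand-in for KLEE plus an SMT solver)."""
    if op == "==":
        return const if want else const + 1
    return const + 1 if want else const

def hybrid(seed, rounds=2000, rng_seed=0):
    """Fuzz first, then solve only for branch outcomes fuzzing left uncovered."""
    corpus, covered = fuzz(seed, rounds, random.Random(rng_seed))
    for bid, idx, op, const in PROGRAM:
        for want in (True, False):
            if (bid, want) not in covered:
                test = list(corpus[0])          # reuse the fuzzer's seed as the base input
                test[idx] = solve(op, const, want)
                corpus.append(test)
                covered |= run(test)
    return corpus, covered
```

Calling `hybrid([0, 0])` covers all four branch outcomes, while `fuzz` alone never reaches the outcome `b1 == 123456789`: in this toy every mutant's second field is either 0 or a power of two. The split mirrors the division of labor described in the abstract: cheap random mutation handles the easy branches, and the expensive solver is invoked only for what remains.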

1 Introduction

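With the development of information technology, software has permeated every aspect of modern society, and the number of software vulnerabilities introduced by flawed development keeps growing. Statistics show that the number of software vulnerabilities grew by 38% over the last five years, and by 14% between 2016 and 2017 alone [1]. Software testing is a primary means of detecting vulnerabilities. The mainstream industrial practice is still to improve product quality with manually designed test cases; however, manual test generation is usually inefficient, costly, and error-prone [2]. Enormous sums are invested in the software industry every year, and software testing typically accounts for more than 50% of the budget [3].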

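A software bug can be viewed as an erroneous statement hidden behind some condition, so raising the code coverage of test cases increases the probability of detecting bugs. Software testing aims to generate test cases with high code coverage (e.g., statement coverage or branch coverage) for the program under test in order to find bugs. When the test cases accompanying a program achieve high coverage and all pass, the program is considered reliable to a certain degree.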
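The difference between statement coverage and branch coverage can be made concrete with a minimal Python sketch. The function `classify` and its hand-placed probes (`s1` to `s3` for statements, the truth value of `x > 10` for branches) are hypothetical; real coverage tools insert such probes automatically.

```python
# Hand-instrumented hypothetical function: s1..s3 are its statements, and the
# single `if` has two outcomes (branches).
STATEMENTS = {"s1", "s2", "s3"}
BRANCHES = {True, False}  # possible outcomes of `x > 10`

def classify(x, stmts, branches):
    stmts.add("s1")
    result = "small"
    branches.add(x > 10)       # record which branch outcome this test takes
    if x > 10:
        stmts.add("s2")
        result = "large"
    stmts.add("s3")
    return result

def coverage(tests):
    """Return (statement coverage, branch coverage) of a test suite."""
    stmts, branches = set(), set()
    for x in tests:
        classify(x, stmts, branches)
    return len(stmts) / len(STATEMENTS), len(branches) / len(BRANCHES)

print(coverage([20]))      # (1.0, 0.5)
print(coverage([20, 5]))   # (1.0, 1.0)
```

The single test `x = 20` executes every statement yet exercises only one of the two branches, so a bug reachable only when `x <= 10` would go unnoticed until a test such as `x = 5` is added.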

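Coverage-guided fuzz testing [4-7] (hereafter, "fuzzing" always refers to coverage-guided fuzzing) and symbolic execution [8-12] are currently two widely studied and used testing techniques. Given initial test cases, a fuzzer (e.g., AFL) executes the target program dynamically, selects existing test cases based on coverage, and randomly mutates them to produce new test cases; common mutation operators include byte flipping. This process repeats until no further branches or statements can be covered. If an abnormal crash is detected while a test case executes, a bug is considered found. Because fuzzing relies on random mutation, it has difficulty generating test cases that satisfy complex conditions. For example, in Figure 1(a), random mutation may need up to 2^32 mutations to satisfy the condition input=123456789. The fuzzing coverage graph in Figure 1(b) shows that fuzzing can hardly detect bug error1 (the red node on the left); for the deeper bug error2 (the red node on the right), however, fuzzing executes the program dynamically and therefore detects error2 easily.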
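The coverage-guided loop described above, and why it stalls on a magic-value check, can be sketched in Python. The target is hypothetical and only loosely modeled on the description of Figure 1(a), whose code is not reproduced in this section: a 4-byte little-endian integer is compared with 123456789 to guard error1, and an input-dependent loop driven by the fifth byte guards error2 once it runs at least 128 times (the threshold is an assumption). Exceptions model crashes, and the two mutation operators (inverting all bits of one byte, or overwriting it with a random value) are illustrative rather than AFL's actual mutation schedule.

```python
import random

MAGIC = 123456789  # the magic constant from the Figure 1(a) discussion

class Crash(Exception):
    """Models an abnormal crash, the signal a fuzzer treats as a bug."""

def target(data, cov):
    """Hypothetical 5-byte-input program; records covered branches in cov."""
    x = int.from_bytes(data[:4], "little")
    if x == MAGIC:                 # complex condition guarding error1
        cov.add("magic")
        raise Crash("error1")
    cov.add("not_magic")
    i = 0
    while i < data[4]:             # input-dependent loop
        i += 1
    if i >= 128:                   # deep branch guarding error2
        cov.add("deep")
        raise Crash("error2")
    cov.add("shallow")

def mutate(data, rng):
    """One random mutation: flip all bits of one byte, or overwrite it."""
    child = bytearray(data)
    pos = rng.randrange(len(child))
    if rng.random() < 0.5:
        child[pos] ^= 0xFF
    else:
        child[pos] = rng.randrange(256)
    return bytes(child)

def fuzz(seed, rounds, rng):
    """Coverage-guided loop: keep a mutant only if it covers a new branch."""
    corpus, seen, crashes = [seed], set(), set()
    for _ in range(rounds):
        child = mutate(rng.choice(corpus), rng)
        cov = set()
        try:
            target(child, cov)
        except Crash as err:
            crashes.add(str(err))
        if cov - seen:
            seen |= cov
            corpus.append(child)
    return seen, crashes
```

Starting from the seed `b"\x00" * 5`, the fuzzer quickly finds error2 because the loop is simply executed concretely, but it never finds error1: the corpus only grows when new coverage appears, so it stays tiny, and no mutant can set all four bytes of the integer to the required values at once.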

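Symbolic execution runs the program on symbolic inputs to collect path constraints and uses a constraint solver [13-15] to generate a test case for each path. For example, symbolic execution can easily generate a test case that covers the input=123456789 branch and triggers bug error1. Its drawback is that, because it executes the program symbolically, when it encounters a loop, especially one whose iteration count depends on the input (e.g., the while loop in Figure 1(a)), the number of iterations cannot be determined; symbolic execution then keeps unrolling the loop, which degrades the efficiency of test case generation. Moreover, as exploration reaches deeper parts of the program, the path constraints become more complex and harder for the constraint solver to solve. For example, in Figure 1(b), loops and other functions make it hard for symbolic execution to reach deep locations, so it has difficulty detecting bug error2.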
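The loop-unrolling problem can be illustrated with a minimal Python sketch of path exploration over a hypothetical program with symbolic integers `x` and `n`: `if x == 123456789` leads to error1, then `i = 0; while i < n: i += 1` is followed by `if i >= 128`, which leads to error2 (again an assumed stand-in for Figure 1(a)). The hand-written `solve` handles only conjunctions of `==`, `!=`, `>`, and `<=` and stands in for a real constraint solver; `explore` and its `max_unroll` bound are likewise hypothetical and are not KLEE's interface.

```python
MAGIC = 123456789

def solve(pc):
    """Tiny solver for conjunctions of (var, op, const) over 32-bit unsigned ints.

    Supports ==, !=, > and <=; returns the smallest model, or None if unsat.
    """
    lo, hi, neq = {}, {}, {}
    for var, op, c in pc:
        lo.setdefault(var, 0)
        hi.setdefault(var, 2**32 - 1)
        neq.setdefault(var, set())
        if op == "==":
            lo[var], hi[var] = max(lo[var], c), min(hi[var], c)
        elif op == ">":
            lo[var] = max(lo[var], c + 1)
        elif op == "<=":
            hi[var] = min(hi[var], c)
        elif op == "!=":
            neq[var].add(c)
    model = {}
    for var in lo:
        v = lo[var]
        while v in neq[var]:       # skip excluded values
            v += 1
        if v > hi[var]:
            return None
        model[var] = v
    return model

def explore(max_unroll):
    """Enumerate paths of: if x == MAGIC: error1; i = 0; while i < n: i += 1;
    if i >= 128: error2. Returns (model, label) for each explored path."""
    paths = [([("x", "==", MAGIC)], "error1")]
    base = [("x", "!=", MAGIC)]
    # The loop bound depends on symbolic n, so every iteration count k is a
    # separate path: n > 0, ..., n > k-1 (stay in the loop), then n <= k (exit).
    for k in range(max_unroll + 1):
        pc = base + [("n", ">", j) for j in range(k)] + [("n", "<=", k)]
        paths.append((pc, "error2" if k >= 128 else "ok"))
    return [(solve(pc), label) for pc, label in paths]
```

The error1 path is solved immediately (`x = 123456789`), which is exactly what random mutation struggles to find. The loop, by contrast, yields one path per iteration count: with `max_unroll=10` none of the 12 explored paths reaches error2, and a path to error2 appears only once the unrolling bound reaches 128, by which point its path condition holds 130 constraints.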
