
Journal of Software 软件学报 Vol.30, No.10, October 2019
4(School of Computer Science and Engineering, Nanyang Technological University 639798, Singapore)
Abstract: Software testing is a common way to assure software quality, and achieving high coverage is an important and challenging
goal in testing. Fuzz testing and symbolic execution, two mainstream testing techniques, have been widely studied and applied in
academia and industry; each has its own advantages and limitations. Fuzz testing can reach deeper branches by randomly mutating test
cases and dynamically executing the program, but random mutation rarely produces test cases that cover complex conditional branches.
Symbolic execution can cover complex conditional branches with the help of SMT solvers, but state explosion during symbolic execution
makes it hard to reach deeper branches. Existing work has shown that hybrid testing, which combines fuzzing and symbolic execution, can
achieve better performance than either technique alone. Based on an analysis of the advantages and disadvantages of fuzzing and
symbolic execution, this study proposes a branch coverage-based hybrid testing approach that combines the two methods to generate test
cases with high branch coverage. Specifically, fuzz testing (e.g., AFL) quickly generates a large number of test cases that cover deeper
branches, and symbolic execution (e.g., KLEE) performs a search guided by the coverage achieved by fuzz testing and generates test cases
for the uncovered branches. To evaluate the effectiveness of this approach (Afleer), the study selects the standard benchmark LAVA-M
and one real project, oSIP, as evaluation subjects, and uses bug detection and coverage as the evaluation measures. The experimental
results show that: 1) for bug discovery, Afleer found 755 bugs while AFL found only 1; 2) for coverage, Afleer improved on both the
benchmark and the real project. On oSIP, Afleer increased branch coverage by 2.4 times and path coverage by 6.1 times. In addition,
Afleer found a new bug in oSIP.
Key words: software quality assurance; fuzz testing; symbolic execution; test case generation
1 Introduction
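With the development of information technology, software has permeated every aspect of modern society, and the number of software vulnerabilities introduced by improper development keeps growing. According to statistics, the number of software vulnerabilities rose by 38% over the last 5 years, and by 14% between 2016 and 2017 alone [1]. Software testing is one of the main methods for detecting software vulnerabilities. The mainstream practice in industry is still to improve product quality through manually designed test cases; however, manual test case generation is usually inefficient, costly, and error-prone [2]. Enormous sums are invested in the software industry every year, and software testing typically accounts for more than 50% of the cost budget [3].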
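A software vulnerability can be viewed as an erroneous statement hidden behind some condition, so raising the code coverage of the test cases raises the probability of detecting vulnerabilities. Software testing aims to generate test cases with high code coverage (e.g., statement coverage, branch coverage) for the program under test in order to uncover vulnerabilities. When the test cases accompanying a program achieve high coverage and all of them pass, the program is considered highly reliable to a certain degree.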
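Coverage-guided fuzz testing [4-7] (hereafter, "fuzz testing" always refers to coverage-guided fuzz testing) and symbolic execution [8-12] are two testing techniques that are currently widely studied and used. Given initial test cases, fuzz testing (e.g., the AFL tool) dynamically executes the target program and, guided by coverage, selects existing test cases for random mutation to generate new test cases. Common mutation operations include byte flipping. This process is repeated until no further branches or statements can be covered. If an abnormal crash is detected while a test case is being executed, a vulnerability is considered to have been found. Because fuzz testing relies on random mutation, it struggles to generate test cases that cover complex conditions. For example, in Fig. 1(a), random mutation needs up to 2^32 attempts to cover the condition input=123456789. The fuzzing coverage graph in Fig. 1(b) shows that fuzz testing has difficulty detecting vulnerability error1 (the red node on the left). The deeper vulnerability error2 (the red node on the right), by contrast, is easy for fuzz testing to detect because fuzz testing executes the program dynamically.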
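The coverage-guided loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AFL's implementation: Fig. 1(a) is not reproduced here, so `target` is a hypothetical stand-in that only borrows the constant 123456789 and the input-dependent loop from the text. The branch labels, the five-byte input layout, and the single-byte overwrite mutation are all assumptions made for the example.

```python
import random

MAGIC = 123456789  # constant from the condition input=123456789 in the text


def target(data: bytes) -> set:
    """Hypothetical stand-in for Fig. 1(a): returns the branch labels executed."""
    covered = {"entry"}
    value = int.from_bytes(data[:4], "little")  # first 4 bytes form a 32-bit integer
    if value == MAGIC:
        covered.add("magic_branch")  # one value out of 2**32 reaches this branch
    count = data[4]  # assumption: the loop bound is the fifth input byte
    i = 0
    while i < count:
        i += 1
        covered.add("loop_body")
    if i > 100:
        covered.add("deep_branch")  # only reached after many loop iterations
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Simplified mutation: overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_input: bytes, iterations: int, seed: int = 0):
    """Keep a mutated input only if it covers a branch not seen before."""
    rng = random.Random(seed)
    corpus = [seed_input]
    total = target(seed_input)
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        new = target(child) - total
        if new:  # coverage feedback: new branches make the input worth keeping
            corpus.append(child)
            total |= new
    return total, corpus


if __name__ == "__main__":
    coverage, corpus = fuzz(b"\x00" * 5, iterations=2000)
    print(sorted(coverage))
    print("magic branch covered:", "magic_branch" in coverage)
```

In this sketch the deep branch behind the loop is found quickly, because any mutation that raises the fifth byte above 100 is rewarded with new coverage. The magic branch is never covered from an all-zero seed: a mutant that gets one of the four integer bytes right adds no new coverage and is discarded, so the fuzzer receives no feedback that would let it assemble the constant step by step.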
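Symbolic execution executes the program symbolically to collect path constraints and uses a constraint solver [13-15] to generate a test case for each path. For example, symbolic execution can easily generate a test case that covers the input=123456789 branch and triggers vulnerability error1. Its drawback is that, because the program must be executed symbolically, the number of iterations of a loop cannot be determined, especially for a loop whose iteration count depends on the input (such as the while loop in Fig. 1(a)). Symbolic execution then gets stuck unrolling the loop without end, which reduces the efficiency of test case generation. Moreover, as exploration reaches deep locations in the program, the constraints become more complex and hard for the constraint solver to solve. For example, in Fig. 1(b), loops or other function calls make it hard for symbolic execution to reach deeper locations, so it has difficulty detecting vulnerability error2.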
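The trade-off described above can be illustrated with a toy symbolic explorer. This is a sketch under stated assumptions, not how KLEE works internally: the program layout (a 32-bit `value` compared with 123456789, followed by a loop bounded by a one-byte `count`) is hypothetical, constraints are kept as plain strings, and the satisfying input for each path is constructed directly instead of being obtained from an SMT solver. The names `explore`, `max_unroll`, and `COUNT_MAX` are introduced only for this example.

```python
from collections import deque

MAGIC = 123456789  # constant from the condition input=123456789 in the text
COUNT_MAX = 255    # assumption: the loop bound is a single input byte


def explore(max_unroll: int):
    """Fork a state at every input-dependent branch of the toy program.

    Returns (finished, abandoned). Each finished path is a tuple
    (path_constraints, covered_branches, witness_input).
    """
    finished, abandoned = [], 0
    # The comparison with the constant forks once; both sides are feasible.
    # State layout: (path constraints, covered branches, witness value, loop counter i).
    work = deque([
        ((f"value == {MAGIC}",), ("magic_branch",), MAGIC, 0),
        ((f"value != {MAGIC}",), (), 0, 0),
    ])
    while work:
        pc, covered, value, i = work.popleft()
        # Loop test `i < count` with symbolic count, false side: the loop exits
        # after exactly i iterations, so count == i satisfies the path.
        branches = covered + (("deep_branch",) if i > 100 else ())
        witness = {"value": value, "count": i}
        finished.append((pc + (f"count <= {i}",), branches, witness))
        # True side: one more unrolling, if it is feasible and within budget.
        if i == COUNT_MAX:
            continue  # count > 255 is infeasible for a byte
        if i == max_unroll:
            abandoned += 1  # unrolling budget exhausted, state dropped
            continue
        work.append((pc + (f"count > {i}",), covered, value, i + 1))
    return finished, abandoned


if __name__ == "__main__":
    for bound in (10, 100, 255):
        paths, abandoned = explore(bound)
        deep = sum("deep_branch" in b for _, b, _ in paths)
        print(f"unroll={bound}: paths={len(paths)}, abandoned={abandoned}, deep={deep}")
```

The magic branch is covered on the very first fork, since the equality constraint yields its own solution. The loop is the expensive part: every unrolling adds one more path and one more constraint per state, and with a small unrolling budget the states that would reach the deep branch are abandoned before they get there.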