Proj THUDBFuzz Paper Reading: Static Program Analysis as a Fuzzing Aid

Github

https://github.com/test-pipeline/Orthrus

Abstract

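Fuzzing is effective and scalable
But it cannot thoroughly test applications with highly diverse control flow, such as firewalls and network packet analyzers
This paper: static analysis guides fuzzing by augmenting the fuzzer's existing program model
Premise: code patterns reflect the data format of the input
Approach: automatically build an input dictionary by analyzing the program's control flow and data flow
Results:
Coverage increases by 10%-15%
Security vulnerabilities are exposed up to an order of magnitude faster
15 zero-day vulnerabilities found
Evaluation targets:
nDPI (packet inspection library)
tcpdump (network packet analyzer)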

Finding:
the synergy between program analysis and testing can be exploited for a better outcome.

1. Intro

P1. Software is complex, security is hard, fuzzing helps

P2. Complexity

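P3. Why existing approaches cannot be used to test complex third-party network software:

They require the user to supply a data model or grammar specification, but only a few network protocols have a formal specification
Tools such as Prospex automatically reconstruct a grammar from network traces, but they only target a single protocol
White-box approaches: require heavy source-code changes and some domain knowledge
Example: the parsing functions have to be annotated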

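P4. This paper:

how the stated challenges can be addressed by augmenting fuzzing with static program analysis.
Method:
Static analysis extracts a dictionary
The fuzzer uses the dictionary to generate message fragments
For ease of deployment, the analysis is implemented as a clang/LLVM plugin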
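
As a rough illustration of the dictionary step above, the sketch below serializes extracted tokens into a dictionary file. It assumes the AFL-style `name="value"` line format; these notes do not say which fuzzer or file format is used, and the names `escape_token` and `to_dictionary` are made up for this example.

```python
from typing import Iterable


def escape_token(token: bytes) -> str:
    """Render a token for an AFL-style dictionary line (assumed format).

    Printable ASCII is kept as-is; quotes, backslashes, and every
    non-printable byte are written as \\xNN escapes.
    """
    out = []
    for b in token:
        ch = chr(b)
        if ch in ('"', "\\") or not (32 <= b < 127):
            out.append(f"\\x{b:02x}")
        else:
            out.append(ch)
    return "".join(out)


def to_dictionary(tokens: Iterable[bytes]) -> str:
    """Deduplicate and sort tokens, then emit one name="value" line each."""
    lines = []
    for i, tok in enumerate(sorted(set(tokens))):
        lines.append(f'token_{i}="{escape_token(tok)}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical tokens, standing in for what static analysis would extract.
    print(to_dictionary([b"GET", b"\x00\x01", b"GET"]), end="")
```

The resulting file would then be handed to the fuzzer unchanged, which is what makes the approach deployable without modifying the fuzzer itself.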

P5. Tool: Orthrus

2. Background

3. Program Analysis Guided Fuzzing

Problem Scope

protocol specification = state machine, message format
This paper only infers the message format
Since file formats are stateless specifications, our work is applicable for conducting security evaluations of file format parsers as well.

Approach Overview

message construct: e.g., a string
message conjunction: a concatenation of message constructs
Properties:

No grammar specification needed
No changes to the software under test
Works as-is with existing fuzzers

3.1 Input Dictionary Generation

Program Slicing

determining the subset of program statements, or variables that process, or contain program input
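That is, the task is to identify this subset of program statements and variables
Problem with existing methods: they only handle small, single-process programs
This paper: backward program slicing
Slicing criteria:

data-dependent control flow instructions
data sink APIs (e.g., strcmp), i.e., functions that accept a const argument
const assignments
Benefits:
Tied to control flow and data flow, so uninteresting values are not mixed in
Mirrors the way parsers are written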
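
To make the three slicing criteria concrete, here is a deliberately crude, text-based approximation. It is not the clang-based analysis the paper describes: it just runs regular expressions over a hypothetical C snippet (`C_SOURCE`) to pick out a data sink argument, a constant in a data-dependent comparison, and a const assignment. All names are invented for this sketch.

```python
import re
from typing import Dict, List

# Hypothetical parser code, used only as input for the sketch.
C_SOURCE = r'''
int parse(const char *buf, int len) {
    if (strcmp(buf, "HELO") == 0) return 1;
    if (buf[0] == 0x16) return 2;
    const char *magic = "RIFF";
    return 0;
}
'''

# A C string literal body: any run of non-quote characters or escapes.
_STR = r'"((?:[^"\\]|\\.)*)"'

# Criterion: data sink APIs, here a string literal passed to a compare call.
SINK_CALL = re.compile(r'\b(?:strcmp|strncmp|memcmp)\s*\([^;()]*?' + _STR)
# Criterion: data-dependent control flow, here a variable or array element
# compared with a numeric constant (comparisons on call results are skipped).
COMPARISON = re.compile(r'[\w\]]\s*==\s*(0x[0-9a-fA-F]+|\d+)')
# Criterion: const assignments of a string literal.
CONST_ASSIGN = re.compile(r'\bconst\b[^=;\n]*=\s*' + _STR)


def extract_tokens(source: str) -> Dict[str, List[str]]:
    """Group candidate dictionary tokens by the criterion that found them."""
    return {
        "sink_args": SINK_CALL.findall(source),
        "compared_constants": COMPARISON.findall(source),
        "const_assignments": CONST_ASSIGN.findall(source),
    }


if __name__ == "__main__":
    for criterion, tokens in extract_tokens(C_SOURCE).items():
        print(criterion, tokens)
```

Regular expressions cannot follow data flow, so this only shows which code patterns the criteria look for; a real slice would work on the AST and CFG.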

Analysis Queries

Syntactic Queries

Example query:
stringLiteral(hasParent(callExpr(hasName("strcmp"))))
Queries are compositional: matcher functions nest inside one another
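
The compositional style of the query above can be mimicked on a toy AST. The sketch below is not clang's matcher API: `Node`, `string_literal`, `call_expr`, `has_name`, `has_parent`, and `run_query` are all hypothetical stand-ins that only show how small predicates nest to form a query.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Sequence


@dataclass(eq=False)
class Node:
    """Toy AST node; `text` holds a callee name or a literal's contents."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


Matcher = Callable[[Node], bool]


def _of_kind(kind: str, inner: Sequence[Matcher]) -> Matcher:
    # A node matches if it has the right kind and satisfies every inner matcher.
    return lambda n: n.kind == kind and all(m(n) for m in inner)


def string_literal(*inner: Matcher) -> Matcher:
    return _of_kind("StringLiteral", inner)


def call_expr(*inner: Matcher) -> Matcher:
    return _of_kind("CallExpr", inner)


def has_name(name: str) -> Matcher:
    return lambda n: n.text == name


def has_parent(inner: Matcher) -> Matcher:
    return lambda n: n.parent is not None and inner(n.parent)


def walk(root: Node) -> Iterator[Node]:
    yield root
    for child in root.children:
        yield from walk(child)


def run_query(root: Node, matcher: Matcher) -> List[str]:
    """Return the text of every node in the tree that the matcher accepts."""
    return [n.text for n in walk(root) if matcher(n)]


def sample_tree() -> Node:
    # Stands in for: strcmp(buf, "HELO"); printf("parsed %s");
    return Node("FunctionBody").add(
        Node("CallExpr", "strcmp").add(
            Node("DeclRef", "buf"), Node("StringLiteral", "HELO")),
        Node("CallExpr", "printf").add(Node("StringLiteral", "parsed %s")),
    )


QUERY = string_literal(has_parent(call_expr(has_name("strcmp"))))

if __name__ == "__main__":
    print(run_query(sample_tree(), QUERY))
```

Only the literal passed to `strcmp` is selected; the `printf` format string is ignored because its parent call has a different name.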

Semantic Queries

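Analyzing context gives deeper insight into the input message format. For example, we may learn which two constructs are used in conjunction with each other, or whether a partial order exists between the grammar production rules involving those constructs.
At the semantic level, a query takes a list of message constructs as input and returns conjunctions as output. Semantic queries run against a context-sensitive inter-procedural graph built on top of the program's CFG. Each query is written as a checker routine that returns the conjunctions that can be validated in the calling context of the message constructs.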
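
A heavily simplified sketch of such a checker routine follows. It replaces the context-sensitive inter-procedural graph with a plain caller-to-callee map, and treats two constructs as a conjunction when one is checked in a function and the other in a function it directly calls. The data (`USES`, `CALLS`) and the function name `conjunctions` are assumptions made for illustration, not the paper's implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical analysis facts: which constructs each function checks,
# and which functions it calls directly.
USES: Dict[str, List[str]] = {
    "parse_request": ["GET", "POST"],
    "parse_header": ["Host:", "Content-Length:"],
    "parse_body": ["\r\n\r\n"],
}
CALLS: Dict[str, List[str]] = {
    "parse_request": ["parse_header"],
    "parse_header": ["parse_body"],
    "parse_body": [],
}


def conjunctions(
    constructs: Sequence[str],
    uses: Dict[str, List[str]],
    calls: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return ordered pairs (a, b) of the given constructs where `a` is
    checked in a caller and `b` in one of its direct callees."""
    wanted = set(constructs)
    pairs: List[Tuple[str, str]] = []
    for caller, callees in calls.items():
        for callee in callees:
            for a in uses.get(caller, []):
                for b in uses.get(callee, []):
                    if a in wanted and b in wanted:
                        pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    for a, b in conjunctions(["GET", "Host:", "\r\n\r\n"], USES, CALLS):
        print(repr(a), "then", repr(b))
```

The caller-before-callee ordering is a stand-in for the partial order between constructs mentioned above; a faithful version would validate each pair along actual call contexts.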