# Precisely Characterizing Security Impact in a Flood of Patches via Symbolic Rule Comparison

I enjoyed reading this paper. It makes everything very clear and restates its key definitions again and again. The paper mentions its source code; however, the code is not open-sourced. I hope the authors release a public repository in the future, as that would make the effectiveness claims more convincing.

## Introduction and Background

Software vendors and maintainers constantly receive bug reports from users. Checking each report manually becomes more and more challenging because a report bundles many kinds of information, such as the security impact and the patch's effectiveness. Maintainers must not simply apply the patch attached to a report; they should first check whether the bug is a security bug. Even among security bugs, they should triage by priority, since lower-priority bugs may be fixed more slowly given the time cost. The authors also point out that manually assessing the security impact of bugs is not only challenging and expensive but also error-prone, because it requires professional knowledge.

To automatically identify security bugs, many prior works apply text-mining techniques to the descriptions in reports; however, this does not work well because it assumes that the reports from users are reliable.

Existing work also offers insufficient support for automatically analyzing the security impact of bugs. Static-analysis tools can warn about misused code constructs that may have security impact; however, these tools make conservative assumptions and therefore produce many false-positive reports. Another way to establish security impact is to provide a PoC exploit, which is always strong evidence. However, producing one places a heavy burden on users who want to report a bug, and maintainers want to fix security bugs faster than attackers can exploit them, i.e., to stay ahead of attackers. Therefore, we need a tool that assesses security impact without a PoC and that has a low false-positive rate.

The proposed system, SID, determines the security impact of bugs given their patches (pay attention: this is the goal). To achieve this goal, SID needs to show that the patch fixes a security-rule violation, which implies that the bug has security impact.

Following are the contributions of this paper:

1. A study of security bugs and patches
2. Symbolic rule comparison for determining security impacts
3. Finding of security bugs and unpatched vulnerabilities

Here is an overview of SID.

The definitions of patch model components:

• Security operations: e.g., security checks
• Critical variables: the ones whose values lead to the security impacts
• Vulnerable operations: operations on critical variables that can cause the security impact

The patch model assumes that a security operation is usually inserted before the vulnerable operation.

According to the patch model above, the authors summarize patch patterns for each class of vulnerability (the scope of vulnerabilities is listed below):

Assumptions:

• The bug fixed by a patch is triggerable

Security rules to be violated:

• Out-of-bound access
• Use-after-free and double-free
• Uninitialized use
• Permission bypass

## Pre-processing

We can separate pre-processing into two parts:

1. Given a commit, determine whether it is a bug-fixing patch
2. Collect all dependent files and compile to LLVM IR

In step 1, commits for branch merging, documentation, and code formatting are eliminated (these account for 17.4%).

In step 2, SID employs static analysis to extract the files the patch depends on. (First, it extracts the patched code and variables from the patch. Second, it applies standard control-dependency analysis to the patched code and taint analysis to the variables to identify all dependent files.) After collecting the dependent files, SID obtains the unpatched version with git checkout. Finally, it invokes Clang to compile these files to LLVM IR.

To mitigate the path-explosion problem in symbolic execution, SID also unrolls loops at the LLVM IR level (if interested, take a look at "Fitness-Guided Path Exploration in Dynamic Symbolic Execution").

## Dissecting patches

In this phase, SID dissects the patch to identify three key components based on the patch model.

First, SID identifies security operations in both the patched and unpatched versions. The authors identify security operations as: bound checks (==, >(=), <(=) comparisons between two integers), pointer nullification (easily identified when NULL is assigned to a pointer; guards against use-after-free), initialization (storing 0 to a variable or calling memset(); guards against uninitialized use), and permission checks (matching permission functions such as afs_permission() or ns_capable(), followed by a check of their return value).

Second, SID extracts critical variables from the identified security operations.

Third, SID applies data-flow analysis to the critical variables to find vulnerable operations. Mirroring the rule classes discussed above: out-of-bound access (accessing an array or buffer, or calling functions that take critical variables as input), use-after-free or double-free (any pointer dereference targeting a critical variable), uninitialized use (any common operation targeting a critical variable), and permission bypass.

The data-flow analysis also helps SID collect slices from security operations to vulnerable operations. (At first, I was confused about why slices run from security operations to vulnerable operations. Now I think it is because the critical variables, which are extracted from the security operations, flow along these paths to affect the vulnerable operations.) After collecting multiple slices, SID prunes the infeasible ones.

## Symbolic rule comparison

After pruning the infeasible slices, SID uses symbolic execution to prove the unsolvability of the constraints of the remaining slices.

Remember the following two rules:

1. The patched version never violates the security rule.
2. The unpatched version (with the patch removed) always violates the security rule.

SID collects constraints from three sources — patches, security rules, and slice paths — for the patched and unpatched versions separately.

### Collect constraints from patches

| Security operation | Constraints for patched | Constraints for unpatched |
| --- | --- | --- |
| Pointer nullification | Fcv = 1 | Fcv = 0 |
| Initialization | Fcv = 1 | Fcv = 0 |
| Permission check | Fcv = 1 | Fcv = 0 |
| Bound check | lowbound < CV < upbound | CV >= upbound or CV <= lowbound |

For each security operation, SID builds the constraints separately. For the first three types of security operations, Fcv is a binary flag symbol describing the status of the critical variable: the value 1 means the security operation has been enforced on the critical variable, while 0 means it has not. Bound checks are more complicated because the specific value range matters, so the critical variable itself is made symbolic (CV).

Note that the constraints for the patched and unpatched versions are constructed in opposite ways. This is easy to understand: the security operation is present in the patched version but absent in the unpatched one.

It may seem odd that there are security-operation constraints for the unpatched version, right? Indeed, the unpatched constraints are not derived from the unpatched code; they are simply the opposite of the constraints for the patched version.

### Collect constraints from security rules

| Security rule | Constraints for patched | Constraints for unpatched |
| --- | --- | --- |
| No use after free | Fcv = 0 | Fcv = 1 |
| Use after initialization | Fcv = 0 | Fcv = 1 |
| Permission check before sensitive operations | Fcv = 0 | Fcv = 1 |
| In-bound access | CV >= MAX and/or CV <= MIN | CV < MAX or CV > MIN |

These constraints evaluate whether a security-rule violation would happen. The value 0 indicates that the case does not violate the security rule, while 1 indicates that it does.

### Collect constraints from slice paths (Still confused about this part)

SID collects constraints from the paths that run from security operations to vulnerable operations. They come from two parts:

1. The same path constraints used when removing infeasible slices.
2. Manipulations of the critical variables along the path.

### After the collection of constraints

To perform symbolic rule comparison, SID needs to map corresponding slices in the patched and unpatched versions. To do so, SID first extracts the vulnerable operations from both versions; the vulnerable operations must match. Then, SID compares the control flow to confirm that the two slices are identical except for the part changed by the patch.

At the end, SID merges the three constraint sets for the patched and unpatched versions respectively, and uses Z3 to check the unsolvability for each slice. If either constraint set is solvable, SID disqualifies the bug fixed by the patch as a security bug.

To avoid the path-explosion problem, which is common in symbolic execution, SID discards any slice with more than 150 basic blocks. This cutoff is based on the statistical study that more than 98.8% of slices cover fewer than 150 basic blocks.

## Conclusion

This paper review is not yet complete. The remaining part covers the Evaluation section, which I plan to follow up on in the future.

## Reference

Qiushi Wu, Yang He, Stephen McCamant, and Kangjie Lu. "Precisely Characterizing Security Impact in a Flood of Patches via Symbolic Rule Comparison." NDSS 2020. doi:10.14722/ndss.2020.24419.