Reading Note: “An Industrial Evaluation of Unit Test Generation: Finding Real Faults in a Financial Application”, ICSE-SEIP '17

date

Jan 25, 2024

slug

industrial-eval-of-automatic-test-generation

status

Published

TL;DR

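This paper evaluates the effectiveness of two common unit test generation tools (Randoop, based on feedback-directed random testing, and EvoSuite, based on search-based software testing) on an enterprise-level project in a real-world setting at a financial company. The goal is to guide researchers in improving these tools further and in bridging the gap between academia and industry.

On the software units used for the evaluation, EvoSuite detected 56.40% of the faults and Randoop detected 38.00%. An analysis of the failure cases found that faults mainly went undetected because the generated tests could not construct certain specific combinations of primitive-type values (50.00%) or could not construct complex objects (47.62%).

A questionnaire survey of developers revealed several directions for improving the tools, such as integration with common build tools and better readability of the generated tests.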

Experiment Findings

Existing tools can potentially detect most of the faults (19 out of 25 were detected at least once). But there are also some faults (6 out of 25) that are never found within the explored search budgets.

Faults whose triggering requires generating input object data with complex states are hard to detect.

Assertions and readability of generated tests need to be improved. To be embraced by developers, test generation tools need to support the major development frameworks.

Lessons Learned

The use of unit test generation tools on the command line requires a detailed understanding of the build infrastructure, and tool documentation is currently not helpful in achieving a correct setup.

100% of the challenging faults remained undetected as none of the tools were able to construct and populate objects with complex structure. More research on how to solve this problem is required.
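To make the "complex structure" problem concrete, here is a minimal sketch of a fault that only shows up in a nested object state. Everything in it (`Transaction`, `Account`, `Portfolio`, the seeded fault) is a hypothetical illustration and is not taken from the paper or the evaluated application.

```java
import java.util.ArrayList;
import java.util.List;

// All classes below are hypothetical illustrations, not code from the paper.
class Transaction {
    final String currency;
    final long amountCents;

    Transaction(String currency, long amountCents) {
        this.currency = currency;
        this.amountCents = amountCents;
    }
}

class Account {
    final List<Transaction> transactions = new ArrayList<>();

    Account add(String currency, long amountCents) {
        transactions.add(new Transaction(currency, amountCents));
        return this;
    }
}

class Portfolio {
    final List<Account> accounts = new ArrayList<>();

    Portfolio add(Account account) {
        accounts.add(account);
        return this;
    }

    // Intended behavior: sum only the transactions in the requested currency.
    long totalCents(String currency) {
        long total = 0;
        for (Account account : accounts) {
            if (account.transactions.isEmpty()) {
                continue;
            }
            // Seeded fault: only the first transaction's currency is checked,
            // so later transactions in other currencies are summed as well.
            if (!account.transactions.get(0).currency.equals(currency)) {
                continue;
            }
            for (Transaction t : account.transactions) {
                total += t.amountCents;
            }
        }
        return total;
    }
}

class ComplexStateDemo {
    // Shallow state: one account, one currency. The fault stays hidden.
    static long singleCurrencyTotal() {
        Portfolio p = new Portfolio().add(new Account().add("USD", 100).add("USD", 250));
        return p.totalCents("USD");
    }

    // Deep state: one account mixing currencies. The correct answer would be 100.
    static long mixedCurrencyTotal() {
        Portfolio p = new Portfolio().add(new Account().add("USD", 100).add("EUR", 999));
        return p.totalCents("USD");
    }

    public static void main(String[] args) {
        System.out.println("single currency: " + singleCurrencyTotal());
        System.out.println("mixed currency:  " + mixedCurrencyTotal());
    }
}
```

A generator has to build a `Portfolio` containing an `Account` that holds at least two transactions with different currencies, the first of which matches the query, before the wrong total becomes observable. Real enterprise objects are usually far deeper than this toy example.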

Only 47.78% (EVOSUITE) and 12.22% (RANDOOP) of hard faults which require specific primitive values have been detected, even if the faulty statements are executed. Covering code is not enough: further criteria to optimize should be designed to help these tools in generating this kind of input values.
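The point that "covering code is not enough" can be illustrated with a small sketch. The `FeeCalculator` class, its threshold, and its fee rates are assumptions made up for this note, not details from the paper. The faulty statement is executed by every input, yet only one specific primitive value exposes it.

```java
// Hypothetical example: a boundary fault that coverage alone does not reveal.
class FeeCalculator {
    static final long THRESHOLD_CENTS = 1_000_000L;

    // Assumed specification: amounts at or above the threshold pay 10 basis
    // points, all other amounts pay 25 basis points.
    static long expectedFeeCents(long amountCents) {
        int basisPoints = amountCents >= THRESHOLD_CENTS ? 10 : 25;
        return amountCents * basisPoints / 10_000;
    }

    // Seeded fault: '>' is used instead of '>='.
    static long feeCents(long amountCents) {
        int basisPoints = amountCents > THRESHOLD_CENTS ? 10 : 25;
        return amountCents * basisPoints / 10_000;
    }

    // True when the given input makes the faulty version observably wrong.
    static boolean revealsFault(long amountCents) {
        return feeCents(amountCents) != expectedFeeCents(amountCents);
    }

    public static void main(String[] args) {
        long[] inputs = {500_000L, 2_000_000L, THRESHOLD_CENTS};
        for (long input : inputs) {
            System.out.println(input + " reveals fault: " + revealsFault(input));
        }
    }
}
```

Inputs such as 500,000 or 2,000,000 cover both branches of the comparison without any difference in output. Only the exact boundary value distinguishes the two versions, which is the kind of input that a coverage-driven fitness function gives no extra reward for.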

At least 50% (EVOSUITE) and 64% (RANDOOP) of the specification faults could have been detected with more appropriate assertions. More research in effective assertion generation would hence be useful.
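As a rough illustration of what "more appropriate assertions" could mean, the sketch below runs the same test input against a correct and a faulty implementation and compares a weak check with a strong one. The rounding rule, the method names, and the input value are hypothetical and chosen only for this note.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical example: the same test input with a weak and a strong assertion.
class AssertionStrengthDemo {
    // Assumed specification: round a non-negative amount in cents
    // half-up to the nearest 100 cents.
    static long roundCorrect(long cents) {
        return (cents + 50) / 100 * 100;
    }

    // Seeded specification fault: truncates instead of rounding half-up.
    static long roundFaulty(long cents) {
        return cents / 100 * 100;
    }

    // Weak assertion: only checks generic properties of the result.
    static boolean weakAssertion(LongUnaryOperator impl) {
        long result = impl.applyAsLong(1250);
        return result >= 0 && result % 100 == 0;
    }

    // Strong assertion: checks the exact value the specification requires.
    static boolean strongAssertion(LongUnaryOperator impl) {
        return impl.applyAsLong(1250) == 1300;
    }

    public static void main(String[] args) {
        System.out.println("weak passes on faulty code:   "
                + weakAssertion(AssertionStrengthDemo::roundFaulty));
        System.out.println("strong passes on faulty code: "
                + strongAssertion(AssertionStrengthDemo::roundFaulty));
    }
}
```

Both implementations execute the faulty path on the input 1250, but only the exact-value check tells them apart. Note that a tool generating assertions from observed behavior has no access to the specification, so this sketch shows the desired outcome rather than a way to produce it automatically.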

Developers in the industry expect automated test generation tools to integrate with standard continuous integration tools. For an effective technology transfer from academic research to industrial practice, building plugins for these tools would be useful.

Developers in industry are concerned about the readability of generated unit tests, the generated input data, and the generated assertions. These are topics that would warrant further research.

Summary

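The improvements that current unit test generation tools need before they can be adopted in industry, as listed above, fall into two categories:

Engineering improvements to the tools: for example, integrating with build systems and supporting common frameworks.

Innovation or improvement in the methods: for example, constructing specific numeric inputs or complex object inputs, generating meaningful assertions, and improving the readability and maintainability of the generated test code.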