Metrics

Metrics are the heart of the Samaritan system. We use metrics to identify which files and commits are more likely to contain vulnerabilities.

The metrics that we use measure different human factors of software engineering. Human factors are physical, psychological, social, and other attributes of people that affect their performance, productivity, and safety.

We know from other domains that human factors are an important cause of accidents and failures 1. Because software engineering is a human activity, we believe it’s important to measure and manage the human factors that are often the underlying root cause of software vulnerabilities.

Samaritan currently measures five categories of human factors, which we call dimensions.

Dimensions

Change

Change is about how much a project’s source code has been modified. Higher measures of change are associated with vulnerabilities.

Change is relevant in the context of a commit 2 because of people’s limited working memory. The more that’s changed, the harder it is to find unintended interactions and the greater the chance that some important detail was accidentally forgotten.

Multiple peer-reviewed papers have found a correlation between measures of change and software vulnerabilities: Gegick et al. in 2009 3; Zimmermann, Nagappan, and Williams in 2010 4; Meneely et al. in 2013 5; Perl et al. in 2015 6.
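To make the idea concrete, here is a minimal sketch of one simple way to quantify change for a single commit: counting files touched and lines added or deleted. It is an illustration only, not Samaritan's actual metric. The names `CommitChange` and `change_from_numstat` are hypothetical, and the input is assumed to be the tab-separated text printed by `git show --numstat --format=`.

```python
from dataclasses import dataclass


@dataclass
class CommitChange:
    files_changed: int
    lines_added: int
    lines_deleted: int

    @property
    def churn(self) -> int:
        # Total lines touched: one simple proxy for "how much changed".
        return self.lines_added + self.lines_deleted


def change_from_numstat(numstat: str) -> CommitChange:
    """Summarize numstat text (added<TAB>deleted<TAB>path) for one commit."""
    files = added = deleted = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        a, d, _path = parts
        files += 1
        # Binary files are reported as "-": count the file, but no lines.
        if a.isdigit():
            added += int(a)
        if d.isdigit():
            deleted += int(d)
    return CommitChange(files, added, deleted)


# Hypothetical sample input: two source files and one binary file.
sample = "10\t2\tsrc/auth.c\n0\t7\tsrc/util.c\n-\t-\tdocs/logo.png\n"
summary = change_from_numstat(sample)
print(summary.files_changed, summary.churn)
```

A commit that touches many files or has high churn would score higher on this dimension. Real measures may weight or normalize these counts differently.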

Unfocused contribution

Unfocused contribution captures the idea of multitasking (e.g., how many different facets of a software project each developer is working on). The more that developers work across different facets of a system (e.g., components, functionality, files), the more unfocused contribution there is. Unfocused contribution is positively correlated with vulnerabilities.

There is good evidence that multitasking hinders learning and expends cognitive resources on the work of task switching instead of on the task itself. Neither helps someone pay attention to the relevant details and avoid an oversight or mistake.

Also, the more that developers spread their effort, the higher the chance that they will have to consider changes made by others. It is impossible to foresee implications in code that you don’t completely understand 7, and gaining that understanding requires significant effort, for which time may not be available.

Peer-reviewed research includes Shin et al 8.
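As a rough illustration of how spread-out effort could be quantified, the sketch below computes the Shannon entropy of each developer's changes across components. This is an assumed, simplified measure chosen for illustration. It is not the formula used by Samaritan or by Shin et al., and the function name `focus_entropy` and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def focus_entropy(touches):
    """touches: iterable of (developer, component) pairs, one per change.

    Returns {developer: entropy in bits}. 0.0 means all of a developer's
    changes went to one component; higher values mean more spread-out effort.
    """
    per_dev = defaultdict(Counter)
    for dev, component in touches:
        per_dev[dev][component] += 1

    result = {}
    for dev, counts in per_dev.items():
        total = sum(counts.values())
        # Shannon entropy: sum of p * log2(1 / p) over components.
        result[dev] = sum(
            (n / total) * math.log2(total / n) for n in counts.values()
        )
    return result


# Hypothetical data: alice stays in one component, bob works across four.
touches = [
    ("alice", "auth"), ("alice", "auth"), ("alice", "auth"),
    ("bob", "auth"), ("bob", "parser"), ("bob", "net"), ("bob", "ui"),
]
scores = focus_entropy(touches)
print(scores)
```

Here the developer who splits changes evenly across four components scores higher than the one who stays in a single component, which matches the intuition that the former contribution is less focused.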

Knowledge

Knowledge is the facts, information, and skills acquired by a person through experience, training, and education. Knowledge can be of general principles, techniques, and methods, as well as of specific architectures and implementations.

Complexity

Complexity is how intricate, detailed, and interconnected things are. Complex things are harder to understand because there are more possibilities and considerations. Higher complexity is positively correlated with vulnerability.
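One widely used proxy for code complexity is a McCabe-style count of decision points. The sketch below approximates it for Python source using the standard `ast` module. It captures only one facet of complexity (branching, not interconnection), it is not necessarily how Samaritan measures this dimension, and the function name `approx_cyclomatic_complexity` is hypothetical.

```python
import ast

# Node types treated as decision points in this approximation.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in the source."""
    tree = ast.parse(source)
    complexity = 1  # a straight-line program has exactly one path
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit paths.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical snippet to score.
snippet = '''
def check(user, token):
    if user is None or token is None:
        return False
    for role in user.roles:
        if role == "admin":
            return True
    return False
'''
score = approx_cyclomatic_complexity(snippet)
print(score)
```

Code with more branches has more possible paths to reason about, which is the sense in which higher scores suggest code that is harder to understand.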

Prior vulnerability

Prior vulnerability is whether something has been known to be vulnerable in the past. That something could be a file, function, or type of functionality. Past vulnerability is a positive indicator of present vulnerability.

Footnotes

1. Some of the best research comes from aviation, where there’s been significant investment in the study of flight and maintenance crews. The U.S. Federal Aviation Administration (FAA) has an entire Human Factors Division that publishes remarkably accessible guidance and research. But you can also look to healthcare, occupational safety, nuclear power generation, and warfighting, to name a few.
2. The operation that sends a developer’s changes to a source code version control repository.
3. M. Gegick, P. Rotella, and L. Williams, “Predicting Attack-prone Components,” in 2009 International Conference on Software Testing Verification and Validation, Denver, CO, USA, 2009, pp. 181–190, doi: 10.1109/ICST.2009.36 [Online]. Available: http://ieeexplore.ieee.org/document/4815350/.
4. T. Zimmermann, N. Nagappan, and L. Williams, “Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista,” in 2010 Third International Conference on Software Testing, Verification and Validation, 2010, pp. 421–428, doi: 10.1109/ICST.2010.32.
5. A. Meneely, H. Srinivasan, A. Musa, A. R. Tejeda, M. Mokary, and B. Spates, “When a Patch Goes Bad: Exploring the Properties of Vulnerability-Contributing Commits,” in 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement, 2013, pp. 65–74, doi: 10.1109/ESEM.2013.19.
6. H. Perl et al., “VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security – CCS ’15, Denver, Colorado, USA, 2015, pp. 426–437, doi: 10.1145/2810103.2813604.
7. Or even code that you do.
8. Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, “Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities,” IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772–787, Nov. 2011, doi: 10.1109/TSE.2010.81.
