Trust Levels: a framework for quality in CI/CD

Author’s note: The following article is a record of the concept I presented at the Agile & Automation Days 2015 conference in Krakow under the title “Permanent Integration”. A lot has changed since that talk, including the approach to modeling code repositories (moving away from GitFlow toward trunk-based development), but the problems described in the presentation remain the same.

Software Trust Levels

Abstract

Continuous Integration (CI) has recently been widely adopted across software production industries. In theory it has several benefits, such as immediate information about product state and reduced project costs. In practice, however, CI systems very often evolve to a state where those benefits are outweighed by high complexity, bad assumptions and incorrect test system design. Incorrectly designed and scaled CI systems generate extra maintenance costs without apparent benefits.

In this article the author presents practical hints on how to keep software product quality in good shape with the help of a well-designed Continuous Integration process. The aim of this article is to propose a development process, a code repository structure and quality metrics. The combination of these elements could be the key to maintaining the quality of software with rapidly changing requirements.

Background

The process of software production has a lot of analogies to the production processes of other industries. To obtain a high quality product, it is necessary to have a very well organized production workplace. Additionally, high quality products require specialized tools, which implies the need for advanced knowledge of how to use them. High quality products must be correctly and very accurately tested, and the test results should be used for continuous improvement of the production process. Finally, the product must be properly packaged and delivered to the customer.

The above analogy captures the specificity of software production. Recent trends in the industry put tremendous pressure on automating the process of creating, testing and delivering software. There is a variety of software tools designed to support CI processes. But to use such tools effectively, organizations must define CI process principles, such as the general structure of the code and test repositories, rules for code quality rating, and rules for test automation and execution.

In the author’s experience, the level of chaos in a project has always been inversely proportional to how well the code delivery process between repository branches (including delivery to the production environment) is understood. Development teams need clear rules on how to work. The concept of Software Trust Levels helps to judge the quality of the product immediately. A trust level is assigned to the product based on the results of different validation points. The result of a validation point should always be binary: either true or false.
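As a minimal sketch of this idea in Python, assume that each validation point is tagged with the trust level it belongs to, and that a level counts as reached only when all lower levels have passed as well. Both assumptions, and the names ValidationPoint and trust_level, are illustrative and not part of the original concept definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationPoint:
    """Hypothetical record of one validation point (names are illustrative)."""
    name: str
    level: int    # trust level this check belongs to (1 = lowest)
    passed: bool  # binary verdict: True or False, nothing in between


def trust_level(points):
    """Return the highest level N for which every check at each defined
    level up to and including N passed. Returns 0 if the lowest level fails
    or if there are no validation points at all."""
    reached = 0
    for level in sorted({p.level for p in points}):
        if all(p.passed for p in points if p.level == level):
            reached = level
        else:
            break  # a failed level blocks all higher levels
    return reached


if __name__ == "__main__":
    points = [
        ValidationPoint("smoke tests", 1, True),
        ValidationPoint("regression tests", 2, False),
        ValidationPoint("load tests", 3, True),
    ]
    print(trust_level(points))
```

In this example the product stays at the first trust level: the load tests passed, but the failed regression tests at level 2 block everything above them.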

The concept is designed to be scalable: the set of validation points is flexible and should be defined by each organization according to its needs. The proposed software trust levels framework is intended to be easy to understand and easy to scale.

Principles of Software Testing in CI

According to ISTQB, there are seven principles of software testing that align with our Trust Level concept:

Testing shows the presence of defects: Testing cannot be considered proof of defect-free software. It reduces the number of undiscovered defects, but we can never say the product is perfect.
Exhaustive testing is not possible: We cannot practically check all use cases. It is important to be aware of the risk and choose the right subset of scenarios for testing. The trust level concept helps to choose the right subset to mitigate risk.
Start testing as early as possible: Early testing helps find critical faults. An extension of this principle is presented in the trust level definition.
Defect clustering: Usually, the majority of bugs are concentrated in a limited set of software functions. Identifying this set lets testing effort be focused where it is most effective.
The pesticide paradox: Repeating the same test cases over and over does not help find new bugs. It is recommended to review scenarios, add new ones, and move others to a different level of trust.
Testing is dependent on context: Different modules in the product may have different validation points defined in order to reach the same trust level.
Absence of errors fallacy: Well-tested software that misses key requirements is useless.

The easiest way to fulfill the principles above is to automate test case execution and verdict assignment. The next level of software production maturity is the adoption of the Continuous Integration concept, which is defined as follows:

“Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.”

The Problem of a Successful Product

When we think about a new software product, one of the essential early steps in the process is creating a prototype. At this point it is usual that only the initial code base (perhaps including basic tests) is created. Although some may think that this is enough, the author strongly recommends creating a few more items in advance:

At least basic documentation and architecture sketch
Test plans on different levels (unit tests, smoke tests, integration tests, load tests, etc.)
Static analysis rules
Code inspection rules
Security assumptions
and others specific to the product

In the real world we may experience problems: as a software product grows, the time needed to perform all required steps in our process increases. It is easier and definitely cheaper to create the correct structure, methods and tools at the beginning of the project life cycle than to retrofit them later.

Uncontrolled Growth of Tests

In general, it is good to have as many tests as possible on every test level. But quantity does not always translate directly into quality. If test execution takes too long, there is a risk that users will try to skip test execution rather than improve it. One solution is to structure the test set into different trust levels and not execute all test suites after every commit. Time-consuming test cases can be moved to a different trust level.

Quality of Tests

Faults in test execution are not acceptable. A test case must have a clear result: true or false. False means that either there is a software bug that has to be fixed, or the test has not been adapted to a new requirement. It is not acceptable to keep such failing tests. They generate decision chaos, leading to questions like: “We have 85% of test coverage, but this missing 15% part was always failing… So, can we release the product?”. There is no answer to such a question, because every single test case failure is a potential software bug. A test can run for years with a failing result, and everyone knows that it always fails. But such a failing test is misleading, because one day it may fail for a completely new reason.
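The difference between a percentage and a binary gate can be sketched in a few lines of Python. The function names release_allowed and pass_rate and the test names are hypothetical; the point is only that a gate built on binary verdicts ignores how "good" the percentage looks.

```python
def pass_rate(results):
    """Fraction of passing tests; results maps test name -> True/False."""
    return sum(results.values()) / len(results) if results else 0.0


def release_allowed(results):
    """Binary gate: allowed only if every single verdict is True.
    Also returns the failing test names so they can be fixed or adapted."""
    failing = sorted(name for name, passed in results.items() if not passed)
    return not failing, failing


if __name__ == "__main__":
    # Hypothetical suite: 17 passing tests and 3 that "have always failed".
    results = {f"test_{i}": True for i in range(17)}
    results.update({f"legacy_{i}": False for i in range(3)})

    allowed, failing = release_allowed(results)
    print(f"pass rate: {pass_rate(results):.0%}, release allowed: {allowed}")
    print("must be fixed or adapted first:", failing)
```

With this kind of gate there is no debate about whether 85% is enough: the three failing tests either get fixed, get adapted to the new requirements, or the release waits.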

Test Execution Time

The number of tests grows together with the product size. The software trust level concept may help to categorize test cases. The most time-consuming ones may be moved to another trust level and executed less frequently (or in another test environment).

Trust Level for Software Quality Determination

Trunk-Based Development (Modern Approach)

Today, in the era of Trunk-Based Development, the idea of software trust levels has evolved. Instead of physically moving code between branches, we move the build through successive stages of the automated pipeline. However, the logic behind the “quality gates” remains the same: we don’t allow code to proceed until it reaches the appropriate Trust Level.

We map trust levels to stages in the CI/CD Pipeline:

Pull Request / Pre-merge -> Dev environment.
Post-merge / Staging -> Deployment to the test environment.
Production (with the feature flag disabled) -> Canary deployment, tests in production.
Production (with the feature flag enabled) -> Full user availability.
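The mapping above can be read as a promotion rule. The Python sketch below assumes that the four stages correspond to trust levels 1 to 4 in the order listed; that numbering, and the names STAGES and furthest_stage, are assumptions made for illustration only.

```python
# Pipeline stages in promotion order, taken from the list above.
# Assumption: reaching trust level N admits the build to the N-th stage.
STAGES = [
    "pull request / pre-merge (dev environment)",
    "post-merge / staging (test environment)",
    "production, feature flag disabled (canary, tests in production)",
    "production, feature flag enabled (full user availability)",
]


def furthest_stage(trust_level):
    """Return the furthest stage a build may enter for a given trust level,
    or None if the build has not reached even the first level."""
    if trust_level < 1:
        return None
    # Levels above the number of stages simply map to the last stage.
    return STAGES[min(trust_level, len(STAGES)) - 1]


if __name__ == "__main__":
    for level in range(0, 5):
        print(level, "->", furthest_stage(level))
```

The build itself never changes between stages; only the trust level attached to it grows, and the pipeline refuses to promote a build whose level is too low.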

Cascade Model (GitFlow)

Let’s assume that tasks in our project are distributed among several development teams. To meet the requirements, the code repository needs an appropriate structure. One possible option is the cascade model:

Master Branch: Contains production-ready code.
Project Branches: Rooted on master, used for developing additional product features.
Private Branches: Rooted on project branches, used for individual feature development.

Cascade models are still in use. They allow developers to work independently of each other and simplify delivery and merge operations.

Software Trust Elements

To keep the repository clean, the organization must develop clear rules. For example, a project can establish quality rules based on the results of:

Static analysis
Unit tests
Manual code inspection
Performance tests
Security tests

Division of Validation Points

We cannot run full regression on every commit. We need to divide rules based on complexity and execution time.

Quick test (e.g. smoke test) - Level 1: Execute all checks that produce results very quickly.
Normal test - Level 2: More complex checks; the added complexity results in more accurate checking.
Duration test - Level 3: Greatest possible level of trust; a longer execution time is accepted.
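One simple way to make this division mechanical is to assign checks to levels by their measured execution time. The following Python sketch assumes two thresholds (5 minutes and 1 hour) and a few check names with durations; all of these values are hypothetical and would need to be chosen by each organization.

```python
# Hypothetical thresholds in seconds; tune them to your own pipeline.
QUICK_LIMIT_S = 5 * 60    # up to 5 minutes   -> Level 1 (quick test)
NORMAL_LIMIT_S = 60 * 60  # up to 1 hour      -> Level 2 (normal test)
                          # anything longer   -> Level 3 (duration test)


def level_for(duration_s):
    """Map a measured execution time to a trust level (1, 2 or 3)."""
    if duration_s <= QUICK_LIMIT_S:
        return 1
    if duration_s <= NORMAL_LIMIT_S:
        return 2
    return 3


def divide(checks):
    """Group check names by level; checks maps name -> duration in seconds."""
    levels = {1: [], 2: [], 3: []}
    for name, duration_s in checks.items():
        levels[level_for(duration_s)].append(name)
    return {level: sorted(names) for level, names in levels.items()}


if __name__ == "__main__":
    # Illustrative durations only, not measurements from a real project.
    measured = {
        "unit tests": 90,
        "smoke tests": 240,
        "regression tests": 1800,
        "duration tests": 6 * 3600,
    }
    for level, names in divide(measured).items():
        print(f"Level {level}: {names}")
```

Execution time is only one possible criterion. Complexity or the required test environment could be used in the same way, as long as every check ends up in exactly one level.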

Implementation of Trust Levels

Let’s assume we want to give the developer the “green light” to merge code. A general hint: define at least as many levels of trust as the number of branch types you have in the repository.

The basic rule is that every pull request must reach a specified level of trust before its code can be merged.

Example Scenario:

Level 1: If basic tests, smoke tests and code inspection pass, the product reaches the first trust level. We can merge to the project branch.
Level 2: In the test environment, we execute static analysis and regression tests. If they pass, the product reaches the second trust level.
Level 3: In the staging environment, we trigger performance and duration tests. If they pass, the code is ready for User Acceptance Testing (UAT) and deployment.
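The example scenario can be written down as a small configuration table. In the Python sketch below the check names come from the scenario above, while the structure (SCENARIO, OUTCOME, reached_level) is a hypothetical illustration rather than a prescribed implementation.

```python
# Checks required for each trust level, as listed in the example scenario.
SCENARIO = {
    1: {"basic tests", "smoke tests", "code inspection"},
    2: {"static analysis", "regression tests"},
    3: {"performance tests", "duration tests"},
}

# What each reached level permits, paraphrasing the scenario text.
OUTCOME = {
    0: "no merge allowed",
    1: "merge to the project branch allowed",
    2: "second trust level reached in the test environment",
    3: "ready for UAT and deployment",
}


def reached_level(passed_checks):
    """Return the highest level whose required checks, and those of all
    lower levels, are contained in the set of passed checks."""
    passed = set(passed_checks)
    level = 0
    for candidate in sorted(SCENARIO):
        if SCENARIO[candidate] <= passed:  # subset test: all required passed
            level = candidate
        else:
            break
    return level


if __name__ == "__main__":
    passed = {"basic tests", "smoke tests", "code inspection",
              "static analysis"}
    level = reached_level(passed)
    print(f"Level {level}: {OUTCOME[level]}")
```

Here the pull request may be merged to the project branch, but the product does not reach the second trust level until the regression tests pass as well.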