Engineering Team Performance: enabling metrics gathering

Alessandro Lemser
10 min read · Jan 27, 2022

Introduction

It is easy to find resources that describe what software delivery performance means and why it matters, but few explain how teams can actually measure it. Tools like Jira and ZenHub provide useful charts for metrics like lead time (a.k.a. cycle time). However, other dimensions like time to recovery, change failure rate, deploy frequency and unplanned work often sit in a grey area, with little guidance in place. Moreover, lead times can be seen from different perspectives, often not factored into existing solutions. This article outlines simple tactics that enable teams to collect metrics related to these dimensions.

If this topic is new to you, it may be worth reading the article Understanding Engineering Team Performance Metrics and How to Get Started first.

Conventions

The tactics described here for collecting data are expected to work with Jira, GitHub and ZenHub. They may work with other tools as well, but it is not guaranteed that every tactic applies in the same way.

The artefacts generated by one team may mean something different to another team, so it is important to align on some conventions. Teams use terms like Issue, Task, Story, Ticket, Bug and Support in different ways, and having a clear definition greatly supports metrics gathering.

Issue is used in this article as a generic word for work of any kind done to build a system. A task, a bug, a story: these are all issues.

The terms outcome and feature have the same meaning and are used interchangeably. They represent a complete piece of work that will trigger customer behaviour.

The terms output and deliverable have the same meaning and are used interchangeably. They are issues that are relevant to building an outcome. Some purely technical tasks are not deliverables in this context. For example, technical-debt work done while the team was building a feature is not an output, because it is not relevant to the outcome from the product perspective.

The definition of change lead time found in the literature and in articles on the web has the same meaning as output lead time in this article, and the two terms may be used interchangeably.

A bug may have different levels of severity (e.g. blocker, major, critical). In this article, a blocker bug is a bug that is relevant for the calculation of the time to recovery metric.

It is also assumed that the team works in a cycle with at least the following states: To Do, In Progress and Done (e.g. a regular Kanban board managed in a cycle/sprint).

Enabling metrics gathering

Whether you plan to adopt a tool or collect data manually, the conventions a team adopts while managing issues are a key part of understanding its delivery performance. Those conventions are what enable data gathering, either through off-the-shelf tools or via customized scripts. The sections below describe a simple way to enable metrics collection.

Delivery Time

A common caveat when capturing delivery lead time is knowing whether the measurement is being taken over outcomes or over outputs.

Let’s understand the distinction between the lead times by reading the picture above from left to right. First, we have the desired impact, which is “Increase profitability”. To create that impact, a team has to deliver outcomes (features) so that customers can behave in a way that may cause the desired impact. An outcome is normally a collection of outputs, and those outputs are the issues/tasks the team works on day to day to create the outcome.

If we measure the lead time of outputs, we will find values in hours or a few days; if we measure the lead time of outcomes, values will usually range from weeks to months. Framing it this way makes it easier to communicate with product people and engineering teams and to build a shared understanding of this metric.

Outcome lead time

Tactics

  • Use an Epic to group the issues that are relevant to build the outcome, and manage its cycle using a tool (e.g. Jira/ZenHub). Make a direct mapping between an Epic and an outcome.
  • Re-think your Epic granularity, or start using the concept if you are not using it yet. Sometimes, what one originally thought to be a user story could be better seen as an Epic.

Watch out for

  • The name “Epic” is a suggestion. Choose the name that best describes a group of related issues for your team. Epics can be an effective way to communicate with your product counterparts, as these artefacts sit at a higher granularity. A good Epic is not so small that it could be seen as a task, and not so big that it spills over a quarter.

How to collect

  1. Get all Epics with status “done” in the relevant cycle (e.g. sprint, quarter)
  2. For each Epic
  • Get the date/time it was “closed”.
  • Get the date/time it was put “in progress” for the first time.
  • Store the result, including both dates, in columns called “issue_inprogress” and “issue_closed”.
  • Also store the issue type (i.e. Epic) and issue number in columns called “issue_number” and “issue_type”. A sketch of this collection in code follows the list.
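To make this concrete, below is a minimal Python sketch of that collection, assuming Jira Cloud’s REST API and basic auth with an API token. The instance URL, the credentials, the project key PROJ and the 14-day window are illustrative values, not from the original article; changelog pagination and error handling are omitted for brevity.

import csv
import requests
from requests.auth import HTTPBasicAuth

JIRA_BASE = "https://your-org.atlassian.net"  # hypothetical instance
AUTH = HTTPBasicAuth("you@example.com", "api-token")  # hypothetical credentials

def search(jql):
    # Jira Cloud search endpoint; expand=changelog returns the status history.
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/2/search",
        params={"jql": jql, "expand": "changelog",
                "fields": "issuetype,resolutiondate", "maxResults": 100},
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.json()["issues"]

def first_in_progress(issue):
    # Changelog histories come oldest-first; return the timestamp of the
    # first transition into "In Progress".
    for history in issue["changelog"]["histories"]:
        for item in history["items"]:
            if item["field"] == "status" and item["toString"] == "In Progress":
                return history["created"]
    return None

epics = search('project = PROJ AND issuetype = Epic AND status = Done '
               'AND resolved >= -14d')
with open("lead_time.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["issue_number", "issue_type",
                     "issue_inprogress", "issue_closed"])
    for epic in epics:
        writer.writerow([epic["key"], "Epic",
                         first_in_progress(epic),
                         epic["fields"]["resolutiondate"]])

The search and first_in_progress helpers are reused in the sketches for the other dimensions below.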

Outputs lead time

Tactics

  • Use a label for identifying the issue as an output (e.g. deliverable).
  • Use a Jira ticket type to identify the issue as an output (e.g. Task) and label it as deliverable if necessary.

Watch out for

  • Other labels or Jira types may be needed to identify work that is not relevant for an outcome (e.g. tech-debt, support, improvement).

How to collect

  1. Get all deliverables with status “done” in the relevant cycle (e.g. sprint, quarter)
  2. For each deliverable
  • Get the date/time it was “closed”.
  • Get the date/time it was put “in progress” for the first time.
  • Store the result, including both dates, in columns called “issue_inprogress” and “issue_closed”.
  • Also store the issue type (i.e. Task) and issue number in columns called “issue_number” and “issue_type”; see the variation after this list.
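The collection is the same as for Epics; only the query changes. A short variation reusing the search helper from the sketch above (the deliverable label is the article’s example; everything else is illustrative):

deliverables = search('project = PROJ AND labels = deliverable '
                      'AND status = Done AND resolved >= -14d')
for issue in deliverables:
    issue_type = issue["fields"]["issuetype"]["name"]  # e.g. "Task"
    # ...store first_in_progress(issue) and resolutiondate as before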

PR Lead Time

This is a supporting dimension that aims to show how long it takes for a pull request to be reviewed and merged to the main branch. Shorter lead times may reflect lightweight code reviews and contribute to shorter output lead times.

Tactics

  • If the team uses Jira, it is necessary to link the pull request with the Jira issue. Use the Jira ticket number as the prefix of the pull request title (e.g. “ZZZ-1423 — Consuming from the stream”). This convention can be enforced by some tools/plugins.
  • If the team uses ZenHub, a similar approach could be used, but with the ZenHub issue number in the title or description of the pull request.

Watch out for

  • One Jira ticket may be associated with multiple pull requests. That is normal and expected.

How to collect

  1. Get all issues with status “done” in the relevant cycle (e.g. sprint, quarter)
  2. Get all merged pull requests of the sprint and filter by the Jira ticket or issue number, considering only the relevant repositories (the ones that control the version of production software)
  3. For each pull request
  • Get the date/time it was “merged”.
  • Get the date/time it was “created”.
  • Store the result including both dates (i.e. “change_created”, “change_merged”).
  • Also store the repository name and the pull request number for reference (e.g. “git_repo”, “git_number”). A sketch of this collection follows the list.
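Below is a minimal sketch using the GitHub REST API, assuming the PR titles carry the Jira key as a prefix. The token, the repository name and the single-page request (no pagination) are illustrative simplifications.

import re
import requests

TOKEN = "ghp_..."  # hypothetical personal access token
REPO = "your-org/your-service"  # hypothetical repository

resp = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls",
    params={"state": "closed", "base": "main", "per_page": 100},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()

rows = []
for pr in resp.json():
    ticket = re.match(r"[A-Z]+-\d+", pr["title"])  # e.g. "ZZZ-1423"
    if pr["merged_at"] and ticket:  # merged PRs linked to a ticket
        rows.append({
            "issue_number": ticket.group(0),
            "git_repo": REPO,
            "git_number": pr["number"],
            "change_created": pr["created_at"],
            "change_merged": pr["merged_at"],
        })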

Deploy Frequency

Deploy frequency can be obtained from the CI/CD tool in use in the organization. It can also be derived from Git, depending on what happens after a pull request is merged.

Teams may want to count only deployments that are relevant for an outcome (e.g. something with direct customer impact). In that case, the tactic outlined in this article helps by filtering merged pull requests associated with deliverables only.

Tactics

  • Count the number of merges to the main branch that happened during the period.

Watch out for

  • Depending on how the pipeline is set up, it may be necessary to factor in the time a deployment stays in staging or waits for sign-off in production. If that happens frequently it may be important to account for it; doing so is not covered in this article.
  • If a generic, organization-wide, solution is in place, it may be a good idea to standardize the use of a label (e.g. deliverable) for teams that want to count only relevant outputs.

How to collect

  1. Count the number of pull requests grouped by date (use the “change_merged” date stored for the PR lead time). Since the information is already stored, there is no need to create a new field; the value is computed from the existing data, as in the sketch below.
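A short sketch of that computation, reusing the rows list from the GitHub sketch above:

from collections import Counter

# Group merged PRs by day; timestamps are ISO 8601, so the first ten
# characters are the "YYYY-MM-DD" date.
deploys_per_day = Counter(r["change_merged"][:10] for r in rows)
for day, count in sorted(deploys_per_day.items()):
    print(day, count)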

Mean Time to Recovery — MTTR

For this measure, apart from the time involved in the pull request process and in the CI/CD pipeline, it is also important to factor in the time spent on understanding the issue.

Tactics

  • Use a label to identify the issue as a bug (e.g. bug) and another label for its severity (e.g. blocker, critical, minor).
  • Alternatively, use a Jira type to identify the issue as a bug (i.e. Bug) and set the issue priority to blocker, major, critical or minor.

How to collect

The first step is to clearly define how the team classifies a bug. In this article, a blocker bug is assumed to be a failure from which the service needs to be restored immediately. Bugs not classified as blockers are not taken into consideration for this measure.

  1. Get all issues where the type/label is Bug and the status is “done” in the relevant cycle (e.g. sprint, quarter).
  2. For each issue:
  • Get the date/time it was “closed”.
  • Get the date/time it was put “in progress” for the first time.
  • Store the result including both dates (i.e. “issue_inprogress”, “issue_closed”).
  • Store the issue type (i.e. Bug) and number (“issue_number” and “issue_type”). The MTTR itself can then be computed as sketched below.
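A sketch of the computation, reusing the search and first_in_progress helpers from the outcome sketch. The JQL values (priority = Blocker, the 14-day window) are illustrative and may differ in your Jira setup.

from datetime import datetime, timedelta

def parse(ts):
    # Jira timestamps look like "2022-01-27T10:15:30.000+0100".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")

bugs = search('project = PROJ AND issuetype = Bug AND priority = Blocker '
              'AND status = Done AND resolved >= -14d')
durations = []
for bug in bugs:
    started = bug and first_in_progress(bug)
    closed = bug["fields"]["resolutiondate"]
    if started and closed:
        durations.append(parse(closed) - parse(started))
mttr = sum(durations, timedelta()) / max(len(durations), 1)
print(f"MTTR: {mttr}")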

Change Failure Rate

This measure helps you to understand the rate of successful deliveries versus anomalies or incidents. A high rate may indicate that you need to think about the quality of your software.

Tactics

  • Use the count of the deliverables merged successfully and the number of blocker bugs to come up with this measure.

Watch out for

  • Currently, this article does not indicate how to associate the failure (Bug) with the deliverable that was deployed.

How to collect

As a result of collecting the output lead times and the mean time to recovery, it is possible to count the records and find the ratio between blocker bugs and deliverables, as sketched below.
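A sketch under the same assumptions, reusing the deliverables and bugs lists collected in the earlier sketches:

# Ratio of blocker bugs to successfully merged deliverables; the
# max(..., 1) guard only avoids division by zero in an empty cycle.
change_failure_rate = len(bugs) / max(len(deliverables), 1)
print(f"Change failure rate: {change_failure_rate:.0%}")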

Unplanned Work

Tactics

  • Use a label to identify unplanned work (e.g. unplanned). Unplanned work could be split into two categories: a) unplanned; b) extra capacity. In that case, two labels could be created instead of one (e.g. unplanned-issue, unplanned-capacity).

How to collect

Every time a new item is added to the running cycle it must receive the label.

  1. Get all issues labelled “unplanned” with status “done” in the relevant cycle (e.g. sprint, quarter).
  2. For each issue
  • Get the date/time it was “closed”.
  • Get the date/time it was put “in progress” for the first time.
  • Store the result including the dates (i.e. “issue_inprogress”, “issue_closed”).
  • Store an extra flag called “issue_unplanned” and, if necessary, store the type (e.g. “unplanned_type”). The share of unplanned work can then be computed as sketched below.
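A sketch of that computation, reusing the search helper; the label names follow the article’s example and sprint = 42 is an illustrative value:

unplanned = search('project = PROJ AND labels in ("unplanned-issue", '
                   '"unplanned-capacity") AND status = Done AND sprint = 42')
done = search('project = PROJ AND status = Done AND sprint = 42')
print(f"Unplanned share: {len(unplanned) / max(len(done), 1):.0%}")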

Summary of collected data

Below are the headers of the CSV generated by the flow above.

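Assembled from the column names used in each section above (the order here is illustrative), the header row would look like this:

issue_number,issue_type,issue_inprogress,issue_closed,change_created,change_merged,git_repo,git_number,issue_unplanned,unplanned_type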

Issues not associated with a pull request

Some issues may not have an associated pull request, because the work done in the issue did not result in code. Producing documentation and writing incident analyses (a.k.a. postmortems), for instance, are issues that may not have a pull request.

For cases like that, a team may decide to call those issues Support Issues and plot them in a chart to understand their ratio against other types of issues. See more about that in the sample charts below.

Charts

Below are the kinds of charts a team can produce once the above metrics are collected in the CSV.

The issues-per-layer chart is obtained by counting how many repositories are frontend and how many are backend. The repository name is used to support that, as in the sketch below.
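A sketch of that counting, reusing the rows list from the GitHub sketch; the repository naming convention here is an assumption:

from collections import Counter

def layer(repo_name):
    # Hypothetical convention: frontend repositories carry "frontend"
    # in their name, everything else is backend.
    return "frontend" if "frontend" in repo_name else "backend"

issues_per_layer = Counter(layer(r["git_repo"]) for r in rows)
print(issues_per_layer)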

The performance table

As a final result of the work, and to give an idea of how the team’s performance compares with what the authors of Accelerate found in their study, the team now has a way to produce the following table:

It is important to note the threshold section below the performance table. A team can adjust the thresholds to its own reality, or simply raise the bar.

Recommendations

Organizations with a standard set of tools for issue management and CI/CD can take advantage of that and create an organization-wide solution for all teams. There are also tools that offer the above and much more, such as Implement.io.

While building an organization-wide solution, keep in mind the difference between outcomes and outputs. Teams could use the deployment frequency provided by a standard tool, but it is valuable to also be able to collect data for the relevant issues only (i.e. deliverables).

References

  1. Accelerate: The Science of Lean Software and DevOps, Nicole Forsgren, Jez Humble and Gene Kim, 2018.
  2. Outcomes Over Output, Joshua Seiden, 2019.
