JIRA: Time spent in column

Recently I’ve been working on a small script that gathers information about Jira issues.

We wanted to know the time that a given ticket spends in each column on our kanban board. This can be useful for many reasons: for instance, you can measure the time an issue spent in the Open status to get a ballpark estimate of the average time a ticket is waiting to be assigned. Or you can measure the average time in In Progress to get an idea of velocity in general. You can of course measure any custom status (column) as well.

There is a plugin called the Time in Status for Jira Cloud, which accomplished this purpose. However, if you just want something free, you can hand-roll your custom reporting script like this.

# Requirements

  • Access to Jira so that you can generate an access token and query boards
  • You need to generate a Jira API access token following the instructions here.
    • You need to export it like so: export JIRA_TOKEN=<your Jira token here>
    • And your email: export JIRA_USER=<your email here>
  • sqlite3 (> 3.25, for the LAG window function)
  • python3

# Details

For simplicity, I’ve used a SQLite database with only one table.

Please forgive the denormalised schema, but I think it’s best to store everything in one table for this simple use case.

In schema.sql:

CREATE TABLE IF NOT EXISTS changelog (
	issue_key TEXT NOT NULL,
	from_column TEXT NOT NULL,
	to_column TEXT NOT NULL,
	timestamp TEXT NOT NULL,
	issue_created TEXT NOT NULL
)

You can run this migration with sqlite3

$ sqlite3 changelog.db
$ sqlite> .read schema.sql

Create a python file to define the functions for getting the required details:

# time_in_column.py

import os
import sys

from datetime import datetime
import sqlite3

import requests
from requests.auth import HTTPBasicAuth


USER = os.getenv('JIRA_USER')
TOKEN = os.getenv('JIRA_TOKEN')

# This usually looks like"<company>.atlassian.net/rest/"
JIRA_API_BASE = '<your jira api base URL>'  

DB_CONN = sqlite3.connect('changelog.db')

DATE_FORMAT = '%Y-%m-%dT%H:%M:%S.%f%z'

You can get all issues on a given board with this function:

def get_issues_from_board(board_id):
    endpoint = JIRA_API_BASE + f'agile/1.0/board/{board_id}/issue'

    res = requests.get(endpoint, auth=HTTPBasicAuth(USER, TOKEN))

    # Could do some error handling here
    issues = res.json().get('issues')

    if not issues:
        print("No issues found")
        return []

    for issue in issues:
        yield issue

You can get the entire changelog of a given issue like so:

def get_issue_changelog(issue):
    issue_id = issue.get('id')
    key = issue.get('key')
    issue_created = issue.get('fields').get('created')
    issue_created = datetime.strptime(issue_created, DATE_FORMAT)
    issue_changelog_endpoint = (
        JIRA_API_BASE + f'api/3/issue/{issue_id}?expand=changelog'
    )

    res = requests.get(issue_changelog_endpoint, auth=HTTPBasicAuth(USER, TOKEN)).json()

    changelog = res.get('changelog')

    histories = get_histories(changelog)

    for item in histories:
        from_column, to_column, timestamp = item
        yield (key, from_column, to_column, timestamp, issue_created)

The get_histories function used above serves to get the status transitions of an issue. It looks like this:

def get_histories(changelog):
    histories = changelog.get('histories')

    histories = sorted(histories, key=lambda item: item.get('created'))

    transitions = []

    for history in histories:
        # Timestamp of the state transition
        created_at = history.get('created')
        created_at = datetime.strptime(created_at, DATE_FORMAT)

        for item in history.get('items'):
            if item.get('field') == 'status':
                to_column = item.get('toString')
                from_column = item.get('fromString')

                transitions.append((from_column, to_column, created_at))
    return transitions

Finally, you can insert them into the changelog table:

def insert_history_to_db(key, from_column, to_column, timestamp, issue_created):
    c = DB_CONN.cursor()
    c.execute("INSERT INTO changelog VALUES (?, ?, ?, ?, ?)", (key, from_column, to_column, timestamp, issue_created))
    DB_CONN.commit()

Finally, to enable running the scrip from the command line:

if __name__ == '__main__':
    _id = sys.argv[1]
    if not _id:
        print("Board ID missing")
        sys.exit()

    for issue in get_issues_from_board(_id):
        key, from_column, to_column, timestamp, issue_created = get_issue_changelog(issue)
        insert_history_to_db(key, from_column, to_column, timestamp, issue_created)

    DB_CONN.close()

You can run the script with $ python time_in_column.py <kanban board ID>.

# Interpreting results: enter LAG()

The changelog table represents the state transitions of each issue. How can you use this data for “time spent in column” reports?

If we can somehow get the difference between the timestamps of state transitions of the same issue, then we know that that is the time spent in the given column.
SQL has a handy LAG function for this: it’s a window function that can do calculations on rows based on a given physical offset (which is usually 1). This means that it enables spreadsheet-like calculations based on the row above or below the current one, while partitioning the data into chunks that make sense.

This might be common knowledge, but I didn’t know about this function until a few months ago. Even Julia Evans has tweeted about its usage!

The expression looks like this:

LAG (scalar_expression [,offset] [,default])  
    OVER ( [ partition_by_clause ] order_by_clause ) 

For me to understand what I need to partition by, I ask myself: “What is the entity I want to compare multiple rows for?”. Basically, “Where should the calculation restart”?

In this case, this is the issue_key: since I want to calculate the differences of the timestamp column for the same issue_key.

The query looks like this:

SELECT
  issue_key AS key, 
  from_column AS status, 
  timestamp AS transition_timestamp, 
  JULIANDAY(timestamp) - LAG(
    JULIANDAY(timestamp)
  ) OVER (PARTITION BY issue_key ORDER BY datetime(timestamp) ASC) AS time_in_column 
FROM changelog ORDER BY issue_key, DATETIME(timestamp) ASC;

This gives us each ticket’s lifetime in each column.

KEY      STATUS                 TRANSITION TIMESTAMP              TIME IN COLUMN (DAYS)
--------------------------------------------------------------------------------------
XY-8     Ready for Development  2020-02-11 08:54:24.244000-06:00                
XY-8     Ready                  2020-02-11 11:12:12.890000-06:00  0.095701921265
XY-8     Open                   2020-02-11 11:17:07.636000-06:00  0.003411412239
XY-8     In progress            2020-02-12 10:30:25.317000-06:00  0.967565752100
XY-8     Closed                 2020-02-14 07:52:24.254000-06:00  1.890265474561
XY-8     In progress            2020-02-25 04:46:18.715000-06:00  10.87076922459
XY-9     Ready for Development  2020-02-10 11:06:21.060000-06:00                
XY-9     Blocked                2020-02-11 10:25:12.282000-06:00  0.971426180563
XY-9     Ready                  2020-02-11 11:12:13.138000-06:00  0.032648796215
XY-9     Open                   2020-02-14 05:42:06.763000-06:00  2.770759548991
XY-9     In progress            2020-02-15 01:31:51.244000-06:00  0.826209270861

As you can see, a new “partition” starts with each issue key, and the calculation restarts.

# Conclusion

You can use this approach to get a general idea about the time spent in each column in a Jira Kanban board.

However, remember that

When a measure becomes a target, it ceases to be a good measure.

Goodhart’s Law

and to always take metrics (especially ones involving human beings) with a grain of salt.

Written on March 1, 2020

If you notice anything wrong with this post (factual error, rude tone, bad grammar, typo, etc.), and you feel like giving feedback, please do so by contacting me at samubalogh@gmail.com. Thank you!