Shining a Literal Light on Software Builds

Published: 08 October 2019

Author: Martin Rowan

When you’re building complex software projects, the stability of your code base is critical to the efficiency of your engineering team. At Highlight, we rely on local builds and testing to reduce the likelihood of errors reaching Trunk and disrupting other people’s work, but inevitably errors still slip through. When this happens, the whole team can be impacted if things are not resolved quickly.

Naturally, we have a notification system in place that sends emails and posts Slack messages when builds fail, but we found that these weren’t being checked frequently. How could we create more effective notifications?

The Goal

  • Ensure people are always aware of the build status
  • Get more timely resolution of build failures
  • Reduce the time wasted and the frustration caused by broken builds

You’ll notice that reducing build breaks is not a goal. We are moving to Git soon and will be evaluating gated check-ins and other changes to our process in an attempt to address that particular problem.

Solving the problem

There are two parts to solving the problem. The first is restricting this awareness to builds which actually matter. To paraphrase George Orwell, all builds are important, but some are more crucial than others and we want to ensure we focus our attention in the right places. The second part is finding an effective but workable way to make it very clear something is wrong.

Today we have nine top-level TeamCity projects, some with sub-projects, with a total of 39 build jobs. This number continues to grow steadily as we split out more components. We identified which builds we cared about most: in some cases there were entire projects we wanted to monitor, and in others it was just one key build. For added complexity, the specific builds we wanted to monitor would change for each monthly release, meaning the list of builds to monitor needs to be dynamic, not a simple fixed list. So solving the first part of our problem needs a little thought, and is something of a moving target, but isn’t too bad.
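To make the idea of a dynamic watch list concrete, here is a minimal Python sketch of one way to express it. It is not the Node-RED flow described later, and the rule shape, the function name and every ID in it are made up for illustration. A rule either covers a whole project (including its sub-projects) or a single build.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchRule:
    kind: str     # "project" covers everything under a project, "build" one build
    item_id: str  # TeamCity project ID or build configuration ID


# Hypothetical rules for one monthly release; the IDs are invented.
WATCH_RULES = [
    WatchRule("project", "Platform"),
    WatchRule("build", "Tools_Installer_Nightly"),
]


def is_monitored(project_path, build_id, rules=WATCH_RULES):
    """Return True if any rule covers the given build.

    project_path lists the IDs of the build's ancestor projects, root first,
    so a rule on a parent project also covers builds in its sub-projects.
    """
    for rule in rules:
        if rule.kind == "project" and rule.item_id in project_path:
            return True
        if rule.kind == "build" and rule.item_id == build_id:
            return True
    return False
```

Because the rules are plain data, swapping them for the next release means editing a list rather than changing any logic.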

Making a noise - nicely

To create the actual alert I did briefly ponder the idea of an alarm sounding but in an open plan office I’m not sure that would have gone down too well. I also didn’t want to announce to the entire company every time someone made a mistake or a test failed.

So, I investigated whether TeamCity had an API we could query, and from reading online it appeared to have a comprehensive one. My initial attempts using Postman failed, and I quickly discovered that API access wasn’t enabled on our server. A quick chat with our resident TeamCity expert had the API enabled, and I was soon pulling information directly from our server.
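For anyone who prefers code to Postman, a query for the latest build of one build configuration might look roughly like the Python sketch below. The server address, the build configuration ID and the use of a bearer access token are all assumptions; your server may be set up for a different authentication method. The URL building and response parsing are kept separate from the network call so they can be checked without a server.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://teamcity.example.com"  # hypothetical server address


def latest_build_url(build_type_id, base_url=BASE_URL):
    """Build the REST URL for the most recent build of one build configuration."""
    locator = f"buildType:(id:{build_type_id}),count:1"
    return f"{base_url}/app/rest/builds?locator={quote(locator, safe=':(),')}"


def parse_latest_status(payload):
    """Return the status of the first build in a JSON response, or UNKNOWN."""
    builds = json.loads(payload).get("build", [])
    return builds[0].get("status", "UNKNOWN") if builds else "UNKNOWN"


def fetch_latest_status(build_type_id, token):
    """Query the server; needs network access and a valid access token."""
    request = Request(
        latest_build_url(build_type_id),
        headers={
            "Accept": "application/json",        # ask for JSON rather than XML
            "Authorization": f"Bearer {token}",  # assumed token-based auth
        },
    )
    with urlopen(request, timeout=10) as response:
        return parse_latest_status(response.read().decode("utf-8"))
```

Calling fetch_latest_status("Foo_Build", token) with a real ID and token would be expected to return a status string such as SUCCESS or FAILURE.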

Knowing we could query the data we would need was enough for now. On to the hardware…

Hardware

Simple prototype

As with most projects, you start with what you have to hand. Being a bit of a Raspberry Pi enthusiast* meant I happened to have a spare Raspberry Pi 3 B+ around, along with a small 2" LED traffic light.

The prototype hardware was very simple:

Prototype Pi Light

With hardware assembled and ready to go, it was time to prototype the software. I initially thought of using Python, but having recently experimented with Node-RED, I thought this visual programming interface might make it easier for anyone to understand.

I set about using Node-RED to make API calls to TeamCity using the http request node. The flow passes the data through various other nodes, making further API calls as needed, before ultimately setting a value on the Raspberry Pi output node which controls the GPIO pins.

The prototype version mostly worked, at least enough to decide it was worth continuing. It was now time to think about how this would work in the office environment: a 2" LED light isn’t going to be seen by many.

Production Hardware

I’d seen an LED tower light from Adafruit on ThePiHut.com, but at the time it was out of stock and I couldn’t find an alternative source.

If I recall correctly, they had the simpler version with just a red light, but having played with my little traffic light I had ideas on how I was going to use the other colours.

After seeing the Adafruit tower light my heart was set on that form. My online search turned up Patlite as a major maker of these signal tower designs, often used in industrial and commercial environments. The price at mainstream suppliers was 4x the Adafruit light, but I was lucky to pick up a bargain on a mainstream auction site.

This Patlite happened to have four colours (red, amber, green and blue), and I was sure I could do something with the extra light. With the tower light stick now secured, it was time to work out how to interface a set of 24 Volt LED lights with a 5 Volt Raspberry Pi.

The Hardware Interface

The simplest solution, for anyone looking to follow in my footsteps, would be to use an Automation HAT and the Node-RED library for the Automation HAT. This HAT provides 0-24V buffered sinking outputs along with three relay-controlled outputs; either could be used to control 3x 24V lights, or 6x if all are used together. Even the smaller Automation pHAT has four suitable outputs if you combine the relay and buffered outputs.

For reasons I now can’t quite recall, I decided to create my own basic circuit using a ULN2003 to allow the 3.3V logic of the Raspberry Pi’s GPIO pins to control the 24V LED lights on the Patlite tower. If I were to take this path again, I’d add a step-down voltage circuit to the custom HAT so all of it could be powered from a single 24V power supply.
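On the software side, driving lamps through a ULN2003 is straightforward, because a channel sinks current (and so lights its lamp) whenever its input is high. The Python sketch below shows the idea outside of Node-RED. The pin numbers are hypothetical, as is the assumption that each colour has its own channel.

```python
# Hypothetical BCM pin numbers; the actual wiring is not documented here.
LAMP_PINS = {"red": 17, "amber": 27, "green": 22, "blue": 23}


def pin_levels(lit):
    """Map each GPIO pin to the logic level needed for the given lit colours.

    A ULN2003 channel sinks current when its input is high, so a high GPIO
    level switches the matching lamp on and a low level switches it off.
    """
    return {pin: colour in lit for colour, pin in LAMP_PINS.items()}


def drive_lamps(lit, devices):
    """Apply the levels to output devices keyed by pin number.

    Each device only needs on() and off() methods, which is the interface
    gpiozero.DigitalOutputDevice offers on a real Pi.
    """
    for pin, level in pin_levels(lit).items():
        if level:
            devices[pin].on()
        else:
            devices[pin].off()
```

On a Pi with gpiozero installed, the devices dictionary could be built once at start-up with one DigitalOutputDevice per pin and then reused for every update.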

Assembled Pi Light Controller

The Running Software

Node-RED Flow
  • Start-up:
    • Make initial call to TeamCity for a list of projects
    • Define a callback flow for making async requests for project status. When a request is made, the status code is checked: if it is 200, the result is returned. If not, this triggers a refresh of the Project List and a call to the error handler.
  • Every 5 minutes:
    • Refresh the projects to find new TeamCity projects and expel old ones.
  • Every 1 minute:
    • Get the status of each of the projects in the project list. This is done through an async call to the HTTP callback function.
    • As there can be many projects being polled, each of which could trigger a re-evaluation of the lights, some throttling is introduced to group the results: they are delayed if more results arrive within 1 second of each other.
  • The project list is built through a series of calls:
    • GetProjectCount – To determine if the project has sub projects.
    • QuerySubProject is called if there are sub projects. For each of these, an updated query URL is pushed back into the start to recursively work its way down the tree of projects to find all the builds.
    • If there aren’t any subprojects, we can create a filtered list of projects of interest by adding to the project list only the builds or entire projects we care about. The status URL for a project vs a build type is different, so the correct type is added as appropriate for the type of item we want to monitor.
    • This project list and its matching set of URLs is processed every minute and the dictionary updated with the latest status.
  • The Light Selector controls the physical lights via the GPIO pins along with providing a dashboard via node-red-dashboard. The physical lights are always disabled outside working hours.
    • All the projects in the list being polled are of interest, but we decided that some projects/builds are critical and will trigger the Red light, while others are slightly less important and will trigger the Amber light. Only if all the projects are passing will the Green light be illuminated.
  • The HTTP Error Handler and the General Exception Handler log errors and trigger the blue light to flash.
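
The Light Selector rules above can be sketched in a few lines of Python. This is an illustration rather than the Node-RED flow itself. The working hours, the weekday check, the severity labels and the choice to allow red and amber at the same time are all assumptions.

```python
from datetime import datetime, time

WORK_START, WORK_END = time(8, 0), time(18, 0)  # assumed working hours


def select_lights(statuses, had_error=False, now=None):
    """Return the set of lamp colours that should be lit.

    statuses maps a monitored item to (severity, passing), where severity is
    "critical" or "warning" and passing is True when the latest build succeeded.
    """
    now = now or datetime.now()
    in_hours = now.weekday() < 5 and WORK_START <= now.time() < WORK_END
    if not in_hours:
        return set()  # lights stay off outside working hours

    # Collect the severities of everything that is currently failing.
    failing = {severity for severity, passing in statuses.values() if not passing}

    lights = set()
    if "critical" in failing:
        lights.add("red")
    if "warning" in failing:
        lights.add("amber")
    if not failing:
        lights.add("green")  # only when every monitored item passes
    if had_error:
        lights.add("blue")  # the real flow flashes this lamp; here it is just on
    return lights
```

Keeping the decision in a pure function like this makes it easy to feed the same result to both the physical lights and a dashboard.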

Outcome

The team are now always aware of the build status and significantly more responsive to build failures, so the objective I initially set has been met. In a recent retrospective I asked if the team wanted to keep the light, and the response was a unanimous “yes”. It turns out that a simple visual piece of real-time information can be a real benefit in keeping a code base stable. That is a relief, since at Highlight the whole company is based around the idea that providing clear, uncluttered information to people helps them do their jobs better. So a happy ending all round.

* You can read my personal blog, which seems to focus almost entirely on things related to the Raspberry Pi in one way or another.
