APIs: Ready for Prime Time?

Published: 04 October 2019

Author: Richard Thomas

The work we do at Highlight involves collecting information.

A lot of information, from networks all over the place.

In the time it took to write these two sentences, Highlight has collected 3,000,000 metrics from around 100,000 devices across 30,000 locations in 84 countries. That is a diverse population of WiFi access points, WAN routers, network switches, broadband connections and other technologies from a wide range of vendors. In another minute or so, the service will do the same again.

Historically, this has meant talking to each of those devices individually, using protocols like SNMP (the Simple Network Management Protocol). This isn’t a simple task: some of those devices might be next door in London and respond quickly, others in Australia or Hong Kong take a bit longer to reply. The devices might have been switched off. They might have been upgraded and now be running new software which responds differently, or no longer supports the query you're sending. So, every reply must be individually waited for, checked, and processed. 100,000 separate conversations with unpredictable targets across the planet, completed and repeated every few seconds.
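To make the "waited for, checked, and processed" part concrete, here is a minimal sketch in Python. It is not Highlight's collector: the device names, latencies and the `uptime_s` field are invented for illustration, and a simulated sleep stands in for a real SNMP query. What it does show is the shape of the problem: every conversation runs concurrently with its own timeout, so one silent device cannot hold up the rest.

```python
import asyncio

# Hypothetical devices and round-trip times in seconds (assumptions for illustration).
# None models a device that has been switched off and never replies.
SIMULATED_LATENCY = {
    "london-router": 0.01,
    "sydney-switch": 0.05,
    "hongkong-ap": 0.04,
    "powered-off-ap": None,
}


async def query_device(name):
    """Stand-in for a real SNMP query: wait for the simulated latency, then reply."""
    latency = SIMULATED_LATENCY[name]
    # A silent device is modelled as a very long sleep, so the caller's timeout fires first.
    await asyncio.sleep(3600 if latency is None else latency)
    return {"device": name, "uptime_s": 12345}


async def poll_one(name, timeout):
    """Wait for one reply, check it, and reduce it to (name, status, value)."""
    try:
        reply = await asyncio.wait_for(query_device(name), timeout=timeout)
    except asyncio.TimeoutError:
        return name, "timeout", None
    if "uptime_s" not in reply:
        # New firmware may answer differently, so never assume the field is there.
        return name, "malformed", None
    return name, "ok", reply["uptime_s"]


async def poll_all(names, timeout=0.2):
    """Run every conversation concurrently; a slow device only delays itself."""
    results = await asyncio.gather(*(poll_one(n, timeout) for n in names))
    return {name: (status, value) for name, status, value in results}


def run_poll():
    return asyncio.run(poll_all(list(SIMULATED_LATENCY)))
```

Calling `run_poll()` returns one entry per device, with the switched-off access point reported as a timeout rather than stalling the whole cycle.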

[Figure: Highlight to device direct communication]

But this approach has its advantages. It's a distributed model. One device can slow down or fail without affecting the others. The workload of replying to those queries is spread across multiple communications networks and across the processors in all those devices, each of which only needs to deal with a single query - a trivial load which won't interfere with its day job. The software which those processors run is relatively static, since the companies owning the devices have control over when things change. So, while the approach needs careful thought, we can design to work within it and things run smoothly.

This is IT, though, and change is the only constant. Increasingly, management of the devices in a network - its access points, routers, servers and switches - is being centralised. This means letting users manage a large IT estate from a single place: a controller. The controller takes requirements or requests from a user and makes them happen across all the devices.

Want to change a security parameter in each of your 900 wireless access points? Instead of logging in to each device to make the change - a task so error-prone and tedious that it will likely never get completed - you tell the controller what you want and it pushes the update to every device automatically. (Engineers might be forgiven for thinking that this capability is blindingly obvious and is something the router and infrastructure industry should have created 20 years ago, and they’re right.) You can even schedule the change to happen when you’re not in the office - if you are really confident that your change is valid and won’t simultaneously brick every node in your access layer.

[Figure: Highlight to controller to device indirect communication]

So far so good. However, and there is always a however when technology moves on, this approach also means that if you want information on those 900 devices you also have to go via the controller. Suddenly those 900 requests and responses are funnelled through a single network choke point and must be actioned by a single processor - which is likely already busy looking after the care and feeding of those 900 clients. To protect this processor the controller will start throttling requests, discarding those that exceed a limit. You can now check the status of some of your network, but not all of it. Hmm. And if the controller fails, even if it doesn't materially affect the client devices, you now can't talk to a single one of them and are effectively blind.
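The throttling effect is easy to reproduce with a toy model. The sketch below is an assumption-laden illustration, not any vendor's actual behaviour: the `ThrottlingController` class, the limit of 100 requests per one-second window and the reply format are all made up. It simply shows what happens when 900 status queries hit a single rate-limited choke point at once.

```python
class ThrottlingController:
    """Toy controller that serves at most `limit` requests per time window.

    Hypothetical model: real controllers differ in how they count and reject requests.
    """

    def __init__(self, limit, window_s=1.0):
        self.limit = limit
        self.window_s = window_s
        self.window_start = 0.0
        self.served_in_window = 0

    def handle(self, device_id, now):
        """Return a status reply, or None if the request is discarded."""
        if now - self.window_start >= self.window_s:
            # A new window has begun, so the request budget is reset.
            self.window_start = now
            self.served_in_window = 0
        if self.served_in_window >= self.limit:
            return None  # over the limit: the controller protects itself
        self.served_in_window += 1
        return {"device": device_id, "status": "up"}


def poll_via_controller(controller, device_ids, now=0.0):
    """Send one query per device at time `now`; report which were answered."""
    answered, dropped = [], []
    for device_id in device_ids:
        reply = controller.handle(device_id, now)
        (answered if reply is not None else dropped).append(device_id)
    return answered, dropped
```

With `ThrottlingController(limit=100)` and 900 device IDs sent in the same instant, the first 100 are answered and the other 800 are discarded. In this model the collector has to pace its queries over at least nine windows to see the whole estate.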

As engineers we tend to focus on the technical situation here, but it’s also worth looking up from the screen and realising that the business environment around APIs and controllers isn’t exactly helping the situation. SNMP has been around for 30 years and is a universal standard, but these controller-based systems are new, and evolving really, really fast. In the controller world each vendor has their own language, API format, and query set. Collection and monitoring services like Highlight need to be customised for each one, which in turn is hard because the platforms themselves are anything but stable.
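In practice, "customised for each one" tends to mean an adapter per vendor that translates each proprietary payload into one common shape. The following is a minimal sketch of that idea only. The vendor names, field names and units are invented for illustration and do not correspond to any real controller API.

```python
def from_vendor_a(payload):
    # Hypothetical format: flat keys, throughput reported in bits per second.
    return {
        "device": payload["deviceName"],
        "throughput_mbps": payload["bps"] / 1_000_000,
    }


def from_vendor_b(payload):
    # Hypothetical format: nested keys, throughput already in Mbit/s.
    return {
        "device": payload["node"]["id"],
        "throughput_mbps": payload["stats"]["mbps"],
    }


# One adapter per vendor; every new controller platform means another entry here.
ADAPTERS = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalise(vendor, payload):
    """Translate a vendor-specific payload into the common metric shape."""
    try:
        adapter = ADAPTERS[vendor]
    except KeyError:
        raise ValueError(f"no adapter for vendor {vendor!r}") from None
    return adapter(payload)
```

The rest of a monitoring pipeline can then work on the common shape alone, but each adapter still has to be rewritten whenever its vendor changes the format underneath it.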

These vendors are trying to grow business in a rapidly changing market, and they're bolting on functionality as quickly as they can. The controller concept first really appeared with WiFi environments, but more recently software-defined networks (SDNs) have accelerated a gently competitive market into a feeding frenzy. Analysts are predicting that the market for SDNs, and SDWANs in particular, could be worth $30-40 billion in five years' time; the 25 vendors currently in that market know there will be perhaps five of them left by then, and those five will be the ones which made the most noise and offered the most capability earliest. This does not have good implications for software stability.

Clearly, management APIs won't get much attention or development time compared to a high-profile, sexy new piece of user-facing functionality, and when APIs do change it's often in a haphazard fashion. We've seen whole chunks of API functionality simply and literally disappear overnight, without any warning or documentation changes, as a new software version is pushed out. And software changes do just happen, since many of these controllers are based in the cloud: new software is simply pushed to them by the manufacturer, which wants all its users to be on the latest version because otherwise it’s harder to support. And that's how cloud works, of course.
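One defensive habit this encourages is checking every response against the fields you expect, so that a silent overnight change shows up as an alert rather than as a gap in the graphs. Here is a minimal sketch of such a check. The field names in `EXPECTED_FIELDS` are hypothetical, and a real collector would keep one such list per vendor and API version.

```python
# Hypothetical set of fields a collector relies on from one controller endpoint.
EXPECTED_FIELDS = frozenset({"device", "status", "clients", "uptime_s"})


def check_response(response, expected=EXPECTED_FIELDS):
    """Compare a response's top-level keys with the fields we depend on.

    Returns the fields that have vanished and any that have newly appeared.
    """
    present = set(response)
    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
    }


def has_drifted(response, expected=EXPECTED_FIELDS):
    """True if a field we depend on is gone; extra fields alone are tolerated."""
    return bool(check_response(response, expected)["missing"])
```

If yesterday's response carried `clients` and today's does not, `check_response` lists it under `missing`, which is at least a warning the API change itself never gave.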

Change happens, to paraphrase a common saying, and you'd better get used to it - a message that doesn't resonate well with enterprise customers who are used to having control over when changes are made to their infrastructure.

In short, in pursuit of flexibility we're swapping a consistent, distributed, controllable data-gathering method for a proprietary, bottlenecked, unpredictable one, with those responsible for it often busy elsewhere. There are a few bright stars: vendors who seem to have realised there's no point building a large network if you can't manage it properly (repeating the loop which the industry went round in the 1990s). In most cases, though, scaling and stabilising management APIs is simply not a priority. It's going to be an interesting ride.
