Background
Västra Götalandsregionen is the largest organization in Sweden by number of employees. Its IT department provides
services to 55,000 employees working in hospitals, schools and cultural development across the region.
Challenge
In an ever-changing IT environment, monitoring systems are crucial for understanding the overall health of the infrastructure. Equally important is having processes in place to review and maintain these systems. Keeping them up to date is essential and requires a significant investment of time and resources.
Kent Holmblad, Technical Specialist at VGR IT: “It’s a full-time job to add and remove hosts and services just to keep up to speed. Without consistently reviewing and updating your system, false alerts could occur more frequently, resulting in a bad user experience and low overall trust in the monitoring platform’s ability to identify what’s going on.”
The solution? Automation.
Kent said this 2 years ago. It was the start of a journey to automate all configuration management for VGR’s monitoring system, ITRS OP5 Monitor.
VGR IT had been using ITRS OP5 Monitor since 2008 with great success, monitoring well over 4,500 network elements and 50,000 services. But in an ever-changing IT environment, ongoing maintenance takes time, regardless of how good or easy to use the monitoring system is. Opsdis got involved to automate these processes, which VGR IT had previously carried out largely manually.
There was already a good foundation in place. All monitored network elements at VGR IT are managed by Cisco Prime; combined with the ITRS OP5 Monitor REST API, this meant a fully automated platform was within reach.
Solution breakdown
This was the first step toward the tool we now call Mender.
Mender is a Python library that includes the following major components:
- A provider module that communicates with a source system, e.g. Cisco Prime, to read the data that will be used to create the Monitor configuration.
- A module that communicates with ITRS OP5 Monitor to read the existing configuration and to write back new configuration depending on changes in the source system.
- A logic engine that
  - Calculates the relations between hosts in the source system for use in parent / child relations to prevent alert storms in case of outages.
  - Compares the data from the source system and ITRS OP5 Monitor to find changes, enabling efficient and incremental updates.
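The two logic-engine steps can be illustrated with a minimal sketch. It is not Mender's actual code: the `Host` record and the functions `assign_parents` and `diff_hosts` are hypothetical names, and the example assumes the source system can be reduced to a list of links between named hosts. Parents are derived with a breadth-first walk from an assumed root device, and changes are found by comparing the desired hosts with those already in the monitoring system.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical, simplified host record (not Mender's data model)."""
    name: str
    address: str
    parents: tuple[str, ...] = ()


def assign_parents(links, root):
    """Walk the topology breadth-first from `root`.

    A host's parent is the neighbour through which it was first reached,
    i.e. the next hop on a shortest path towards the root.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    parents = {root: ()}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for peer in sorted(neighbours.get(current, ())):
            if peer not in parents:
                parents[peer] = (current,)
                queue.append(peer)
    return parents


def diff_hosts(source, monitored):
    """Compare desired hosts (source) with existing ones (monitored).

    Both arguments are dicts keyed by host name. Only the differences are
    returned, so unchanged hosts never need to be written back.
    """
    to_add = [source[n] for n in sorted(source.keys() - monitored.keys())]
    to_remove = sorted(monitored.keys() - source.keys())
    to_update = [
        source[n]
        for n in sorted(source.keys() & monitored.keys())
        if source[n] != monitored[n]
    ]
    return to_add, to_update, to_remove


# Illustrative data only; the names and addresses are made up.
links = [("core-sw", "dist-sw"), ("dist-sw", "access-sw1"), ("dist-sw", "access-sw2")]
addresses = {
    "core-sw": "10.0.0.1",
    "dist-sw": "10.0.0.2",
    "access-sw1": "10.0.1.1",
    "access-sw2": "10.0.1.2",
}
parents = assign_parents(links, root="core-sw")
source = {name: Host(name, addr, parents[name]) for name, addr in addresses.items()}
monitored = {
    "core-sw": Host("core-sw", "10.0.0.1"),
    "dist-sw": Host("dist-sw", "10.0.0.2", ("core-sw",)),
    "access-sw1": Host("access-sw1", "10.0.1.1"),  # parent not yet configured
    "old-sw": Host("old-sw", "10.0.9.9"),          # no longer in the source
}
to_add, to_update, to_remove = diff_hosts(source, monitored)
print([h.name for h in to_add], [h.name for h in to_update], to_remove)
# prints: ['access-sw2'] ['access-sw1'] ['old-sw']
```

With parents configured this way, an outage of `dist-sw` can be reported as a single problem, with the access switches behind it treated as unreachable instead of each raising its own alert. A real integration would also have to handle redundant paths and multiple parents, which this sketch ignores.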
With the above, a fully automated platform was put in place, ensuring that VGR IT’s monitoring system was up to date in real time. Kent Holmblad added: “Not only have we removed the need for manually updating the configuration, we are now confident that what’s in our monitoring system correlates with the real world. It’s a big step forward for us.”
Today, 3 years later, VGR continues to use Mender with a multitude of systems that provide Monitor with configuration data:
- Cisco ACI – network system for data centers
- Infoblox – a system for DNS, DHCP and IP address management
- Puppet – integration with Puppetdb to configure monitoring for servers and infrastructure software
- AMQ – integration with AMQ management system to configure monitoring for queues and topics
- An in-house telephony inventory system
- An in-house firewall inventory system
In the short term, an integration with the F5 BIG-IQ management system will be added to create Monitor configuration for all BIG-IP equipment.
Jan-Åke Hovenäs, line manager at VGR IT, commented: “The Mender solution that Opsdis helped us with, in combination with ITRS OP5 Monitor, gives us the monitoring we need with a close to zero maintenance cost. It has really improved how we do monitoring here. We can now spend more time improving the way we monitor instead.”
But VGR is not alone. We currently have 11 customers using the Mender solution with a variety of configuration integrations, and we are continuously adding to that list.
“When we were asked by VGR IT to automate the maintenance of their ITRS OP5 Monitor system we were excited. We have a strong conviction that all repetitive tasks done by humans should be automated. Working with VGR IT, we had the potential to provide them with huge time savings and much more.”
Anders Håål, CTO at Opsdis.