Over the last two decades, the evolution of IT has been more like a revolution; in some cases, the transformation has even underpinned real-life uprisings. Today we're acutely aware of advances in technology and the internet, particularly as the global COVID-19 pandemic creates a sweeping reliance on tele-everything and online commerce to sustain companies' livelihoods.
The current surge in online traffic underscores the massive strides made across IT, which keep critical support functions running largely without interruption and quietly affect millions of lives. Behind the scenes, ITOps teams continue working to keep the customer experience resilient and to address incidents before end users notice an issue. The irony of these seamless digital experiences, though, is that their delivery rests on escalating complexity. Providers seeking agility, speed and efficiency are increasingly adopting virtualization, microservices, containerization and cloud-based services. These capabilities often meet the needs they were chosen for, but the trade-offs can be costlier than expected.
IT Complexity Demands Smart Machine Intelligence
The industry has rallied around AIOps, looking to AI and machine learning to cope with the scale of modern complexity. The goal is to evolve from human-driven monitoring and analysis to automated detection and remediation. Because humans cannot keep up with the volume of service alerts that machine-driven operations generate, a gulf has opened between machine speed and the traditional event-correlation approach to troubleshooting.
The issue is clearly, and commonly, illustrated by the war-room posture teams adopt when a service degrades and ITOps is besieged by demands for situation reports, impact assessments and projected resolution timelines. Amid a "sea of red," when the operator has no idea which of the many trouble tickets to work on first or where to focus attention, it's undeniable that event correlation is wholly inadequate for navigating the modern-day minefields of data exhaust. It's like diagnosing an illness solely by taking a patient's temperature.
Applying advanced AI/ML to event data will improve the thermometer, but it won't fundamentally improve the diagnosis. A far more promising way to work through service interruptions is a holistic approach: tracking and connecting behavioral attributes and anomalies across the IT estate instead of relying solely on event data.
Except for the featured image, this story has not been edited by Javelynn and is published from a syndicated feed. Originally published on https://devops.com/are-we-nearing-the-end-of-it-service-outages/.