Over the last two decades, the evolution of IT has been more like a revolution, in some cases even underpinning real-life uprisings. Today we’re acutely aware of advances in technology and the internet, particularly as the global COVID-19 pandemic creates a sweeping reliance on tele-everything and online commerce to sustain companies’ livelihoods.
The current surge in online traffic underscores the massive strides made across IT, ensuring critical support functions largely go uninterrupted—quietly impacting millions of lives. Behind the scenes, ITOps teams continue working to ensure the customer experience is resilient and incidents are addressed even before end users notice an issue. The irony of these seamless digital experiences, though, is that the delivery is built on escalating complexity. Providers seeking agility, speed and efficiency are increasingly adopting virtualization, microservices, containerization and cloud-based services. These capabilities often meet their designated needs, but the trade-offs can be costlier than expected.
IT Complexity Demands Smart Machine Intelligence
The industry has rallied around AIOps, looking to AI and machine learning to scale to modern complexity. The goal is to evolve from human-driven monitoring and analysis to automated detection and remediation. The human inability to keep up with service alerts derived from machine-driven operations has created a gulf between machine speed and the traditional, event-correlation approach to troubleshooting.
The issue is commonly illustrated by the war-room posture teams assume when a service degradation occurs and ITOps is besieged by demands for situation reports, impact assessments and projected resolution timelines. Amid a “sea of red,” when the operator has no idea which of the many trouble tickets to work on first or where to focus attention, it’s undeniable that event correlation is wholly inadequate for navigating the modern-day minefields of data exhaust. It’s like diagnosing an illness solely by taking a patient’s temperature.
Applying advanced AI/ML will improve the thermometer, but it won’t fundamentally improve the patient’s diagnosis. Similarly, it’s much more promising to work through service interruptions by taking a holistic approach, tracking and connecting behavioral attributes and anomalies across the IT estate instead of solely relying on event data.
Behavioral correlation provides a deeper, more contextual understanding of what’s happening at the service level. It starts with dynamic baselines of “normal” activity levels, to which ML is applied to detect and flag anomalies across a broad variety of real-time data. With an aggregated, service-level topology, it’s much easier to assess service health in terms of availability and risk, so that decision-makers can prioritize what’s most pressing and visualize potential business impact.
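To make the idea concrete, here is a minimal Python sketch under stated assumptions. It is not a description of any particular AIOps product. A rolling mean and standard deviation stand in for the “dynamic baseline,” and a simple z-score threshold stands in for the ML model, which in practice would be considerably richer. The component names, the topology, the latency figures and the risk formula (the share of a service’s components currently flagged) are all hypothetical, chosen only to show how per-component anomalies might roll up into a service-level ranking.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Sliding-window baseline for one metric (a stand-in for a learned model)."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.threshold = threshold           # z-score cutoff (assumed value)
        self.min_samples = min_samples       # don't judge until a baseline exists

    def observe(self, value):
        """Return True if value deviates from the baseline, then absorb it."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu  # flat baseline: any change stands out
        self.samples.append(value)
        return anomalous


def service_risk(topology, anomalous_components):
    """Hypothetical risk score: fraction of a service's components flagged."""
    return {
        service: sum(c in anomalous_components for c in components) / len(components)
        for service, components in topology.items()
    }


# Hypothetical service topology and latency samples (ms), for illustration only.
topology = {
    "checkout": ["web-1", "db-1"],
    "search": ["web-1", "index-1"],
}
latency_ms = {
    "web-1": [100, 102, 98, 101, 99, 100, 103, 97, 100, 101],
    "db-1": [20, 21, 19, 20, 22, 20, 19, 21, 20, 95],  # last sample spikes
    "index-1": [50, 51, 49, 50, 52, 50, 49, 51, 50, 50],
}

baselines = {name: RollingBaseline() for name in latency_ms}
anomalous = set()
for name, series in latency_ms.items():
    flags = [baselines[name].observe(v) for v in series]
    if flags[-1]:  # only the most recent sample counts as "current" state
        anomalous.add(name)

risk = service_risk(topology, anomalous)
ranked = sorted(risk, key=risk.get, reverse=True)  # most at-risk service first
```

With this sample data, only `db-1` is flagged, so `checkout` ranks ahead of `search`. The point of the roll-up is that an operator sees one prioritized service rather than a wall of individual alerts.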
Traversing New Realities: Technology as a Front-Line Offense
As the global COVID-19 pandemic continues to transform how people live and work, it’s clear that technology is the most common and often the most important tool in moving forward. By now most of the workforce has straightened out the initial kinks in telework and figured out general best practices. With some sense of stability now in place, leaders are shifting their focus to ensuring business continuity and instituting the necessary infrastructure to sustain new operational models indefinitely.
In organizations where digital transformation is already underway, adopting new tools and protocols will benefit from foundational support. Companies and agencies making strides in harnessing IT to boost operations and outcomes will be well on their way to achieving better visibility and improved efficiency. Implementing behavioral correlation will, in those cases, apply the power of ML to aid in faster root-cause analysis and resolution, and to deliver a better overall customer experience.
That’s not to say that organizations less mature in their digital transformation won’t also benefit from eschewing the ineffective methodologies of event correlation and reactive ITOps. By evolving beyond legacy systems of piecemeal products and services, IT teams instead can leapfrog ahead to sophisticated analytics, data synthesis and comprehensive modeling that monitors for and detects anomalous activity—correlating broader behaviors, not just events. By jettisoning the focus on events and instead incorporating behavioral correlation into IT service, troubleshooting and remediation become fluid.
Amid seismic shifts in the demands on IT services, it’s more crucial than ever to deliver reliable services and capabilities. That means getting to the root causes of problems quickly and resolving them faster—a feat that’s now a reality through the integration of service metrics and IT automation.
The world of IT has changed dramatically over the past 10 years, and especially in the past few months. Traditional processes and tools are inadequate to manage the speed and complexity of the modern IT environment. A new era is arriving, and not a moment too soon.
Except for the featured image, this story has not been edited by Javelynn and is published from a syndicated feed. Originally published on https://devops.com/are-we-nearing-the-end-of-it-service-outages/.