Getting Resilient and AI-Ready to Thrive Under Pressure: Vibhav Chary, FourKites, Inc

Bengaluru, 9th October 2025: In the fast-paced world of SRE, DevSecOps, and platform engineering, where uptime, innovation, and resilience are constantly at odds, few leaders have managed to harmonize them as effectively as Vibhav Chary. With over two decades of experience across Ola, Niyo, CSS Corp, and now FourKites, he has built a career on balancing reliability with innovation, transforming high-pressure outages into lessons in psychological safety, and turning failures into a powerful “anti-pattern library” for future success.

Join Mr. Vibhav Chary, VP Engineering at FourKites, Inc, in conversation with Mr. Marquis Fernandes, who spearheads the India Business at Quantic India. In this conversation, Vibhav shares hard-won insights on building automation-first systems, embracing AI-driven operations, and leading teams with principles that deliver both stability and speed.

You’ve consistently maintained 99.99% infra uptime with minimal vulnerabilities. What’s your philosophy when balancing uptime vs innovation in a rapidly evolving SRE/DevSecOps environment?

Looking at my 20-year journey, I’ve learned that uptime and innovation aren’t opposing forces; they’re synergistic when approached strategically. Here’s how I balance them:

Automation as the Foundation: At Niyo, I successfully managed an entire bank account with read-only AWS console access through GitOps. This wasn’t just about security; it was about creating predictable, automated systems that allow for rapid innovation without compromising stability. When you automate your infrastructure provisioning (like I did with Terraform for Day Zero deployments), you create consistent, repeatable environments where innovation can happen safely.
Observability-Driven Innovation: My approach to maintaining 99.99% uptime while enabling rapid change relies heavily on comprehensive observability. At both Ola and Niyo, I built end-to-end observability platforms that gave us an MTTD of 5 minutes and an MTTR of 30 minutes. This isn’t just monitoring; it’s creating a safety net that lets teams innovate confidently. When you can detect and resolve issues in minutes rather than hours, you can afford to take calculated risks on new technologies.
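For readers less familiar with the two metrics, here is a minimal sketch of how MTTD (mean time to detect) and MTTR (mean time to resolve) might be computed from incident records. The `Incident` fields and the sample timestamps are illustrative assumptions, not data from the interview, and MTTR is assumed here to run from detection to resolution; some teams measure it from the start of the fault instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record layout, chosen for illustration only.
    started: datetime   # when the fault began
    detected: datetime  # when monitoring (or a person) noticed it
    resolved: datetime  # when service was restored


def mean_delta(deltas):
    """Average a non-empty collection of timedeltas."""
    deltas = list(deltas)
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time from fault start to detection."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time from detection to resolution (an assumed definition)."""
    return mean_delta(i.resolved - i.detected for i in incidents)


# Made-up sample incidents.
incidents = [
    Incident(datetime(2025, 1, 1, 10, 0), datetime(2025, 1, 1, 10, 4),
             datetime(2025, 1, 1, 10, 30)),
    Incident(datetime(2025, 1, 2, 9, 0), datetime(2025, 1, 2, 9, 6),
             datetime(2025, 1, 2, 9, 40)),
]

print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

Tracking both numbers per month, rather than as a single lifetime average, makes it easier to see whether an observability investment is actually shortening detection and recovery.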

Agentic workflows and AI-driven infrastructure are gaining traction. How do you see the SRE or DevOps engineer’s role evolving in an AI-native ecosystem?

Based on my experience and current focus at FourKites on AI/Agentic/MCP workflows, I see the SRE/DevOps engineer’s role undergoing a fundamental transformation: not replacement, but elevation to a more strategic position.

From Manual Operators to AI Orchestrators

The traditional “toil reduction” goal of SRE is being accelerated exponentially. Where I previously automated server patching and certificate renewals through AWS Systems Manager and CI/CD pipelines, we’re now moving toward AI agents that can autonomously handle complex operational workflows. At FourKites, I’m working on integrating Model Context Protocol (MCP) with our operational frameworks. This means AI agents can now understand context across our entire infrastructure stack and make intelligent decisions.

From incident management to platform reliability, how do you build psychological safety and rapid decision-making in high-pressure outage or rollback scenarios?

This is where my hardest-earned lessons come from. Over 20 years, I’ve been through countless high-pressure situations, from reducing outages at Ola from 7 per month to almost 1, to managing major incidents across multiple datacenters. Building psychological safety during outages isn’t just about being nice; it’s about creating an environment where people make better technical decisions under extreme pressure.

The Foundation: Normalize the Abnormal

At CSS Corp, when I improved post-mortem documentation from 30% to 100%, the real breakthrough wasn’t better templates; it was changing the conversation from “who caused this?” to “what can we learn?” I established what I call “blameless curiosity” as the default mindset.

You’ve led high-impact teams and solved complex infra puzzles, but what’s the most memorable project that taught you more than success ever did?

Over 20 years, I’ve built what I call my “anti-pattern library”, a collection of “never do this again” lessons that are honestly more valuable than any success story.

What Should NOT Be Done:

Migration Anti-Patterns:

Never migrate databases during business hours “because it should be quick”
Never skip the “rollback test” because “we’re confident this will work”

Team Management Anti-Patterns:

Never assume people understand the “why” behind your technical choices

Infrastructure Anti-Patterns:

Never build monitoring after you build the system: instrument first, optimize later
Don’t treat security as something you add later (my PCI compliance experience taught me this painfully)

Every project failure mode became a team strength. That’s the real education the fast lane provides: not success stories, but a comprehensive understanding of how complex systems fail and how to build teams that learn faster from those failures than competitors do.

From Ola to Niyo to FourKites, what’s one unexpected life lesson the fast lane of tech leadership taught you that no certification ever could?

The 30,000-Foot Problem

You’re absolutely right: certifications teach you the “what” and the “how” at a high level, but they can’t teach you the “why is this specific combination of infrastructure and application behaviour happening right now?” That granular understanding only comes from being in the trenches during critical moments.

My Version of This Lesson

For me, the deepest lesson was: You don’t truly understand a system until you’ve seen it break in unexpected ways and had to rebuild that understanding from first principles while the business is breathing down your neck.

Can you share some leadership habits or principles you’ve embedded in your team that have consistently delivered strong results?

Core Leadership Principles I’ve Embedded:

Accountability Through Safety

Ownership matrices with backup owners for every system
Public commitments in sprint planning for mutual support
Celebrate accountability during failures, not just successes

Reliable Delivery

70% Rule: Add 30% buffer to all estimates automatically
Daily risk surfacing: “What could prevent our commitments?”
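As a quick worked example of the buffer arithmetic, the sketch below pads a raw estimate by 30% and rounds up to whole hours. It assumes the buffer is applied multiplicatively to the raw estimate (raw × 1.30); the interview does not spell out the exact formula, and the function name and rounding choice are hypothetical.

```python
def buffered_estimate(raw_hours: int, buffer_percent: int = 30) -> int:
    """Return raw_hours plus a percentage buffer, rounded up to whole hours.

    Integer arithmetic is used so the result does not depend on
    floating-point rounding.
    """
    scaled = raw_hours * (100 + buffer_percent)
    return -(-scaled // 100)  # ceiling division on integers


print(buffered_estimate(10))  # 10 h raw -> 13 h committed
print(buffered_estimate(8))   # 8 h raw -> 10.4 h, rounded up to 11 h
```

Applying the buffer in one shared helper, rather than leaving it to each engineer's judgment, is one way to make the padding "automatic" in the sense described above.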

Smart Escalation

No progress = escalate immediately
Celebrate early escalation that prevents major issues

Automation by Default

Three-touch rule: Manual three times = automate on fourth
Track manual processes like technical debt
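The three-touch rule and the idea of tracking manual work like technical debt can be sketched as a small counter. This is an illustrative assumption about how such tracking might look, not a description of any tooling used at FourKites; the class name, method names, and task name are all hypothetical.

```python
from collections import Counter


class ManualTaskLog:
    """Counts manual runs per task and flags tasks that are due for automation."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # manual runs allowed before automating
        self.counts = Counter()

    def record(self, task: str) -> bool:
        """Record one manual run.

        Returns True when this run exceeds the threshold, i.e. on the
        fourth run with the default threshold of three.
        """
        self.counts[task] += 1
        return self.counts[task] > self.threshold

    def automation_backlog(self) -> list[str]:
        """Tasks already done manually at least `threshold` times."""
        return sorted(t for t, n in self.counts.items() if n >= self.threshold)


log = ManualTaskLog()
results = [log.record("rotate-certs") for _ in range(4)]
print(results)                   # [False, False, False, True]
print(log.automation_backlog())  # ['rotate-certs']
```

The backlog list plays the role of a debt register: anything that appears in it is a candidate for the next automation sprint.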

Corner Case Thinking

Designated devil’s advocate in technical discussions
“What happens when everything goes wrong simultaneously?”

Security-First Mindset

“How could this be exploited?” in every design review

Measurement-Driven Culture

Every team member owns specific metrics
Monthly trend analysis sessions
Focus on a few critical metrics vs. dashboard sprawl

Vibhav’s journey underscores a powerful truth: resilience, innovation, and leadership in SRE and DevSecOps are not about chasing perfection, but about building systems, teams, and cultures that learn, adapt, and thrive under pressure. From automation-first strategies and AI-driven orchestration to blameless post-mortems and principle-led leadership, his playbook offers lessons that extend far beyond infrastructure. As the industry moves deeper into the AI-native era, his insights remind us that the real edge lies not just in technology, but in how leaders balance reliability with innovation, and failures with growth.
