About Me
12+ years in software engineering. I now build agentic AI solutions at Amazon, where I lead work on large-scale distributed systems, resiliency, and agentic AI.
At Amazon, I work on large-scale distributed systems that need to be both highly resilient and increasingly intelligent. Over the past year, that has meant leading the design and delivery of agentic AI systems: multi-step, tool-using workflows that operate reliably at Amazon scale, where a flaky abstraction isn't a demo problem but a production incident.
I spend most of my time at the intersection of AI research and engineering — reading papers, running experiments, and turning interesting ideas into tools that real people can use. My particular obsession lately has been large-scale agentic AI systems: how to make them reliable, fast, and genuinely useful rather than just impressive in demos.
Before diving deep into AI, I spent over a decade building and scaling backend systems across the stack: distributed databases, event-driven architectures, service meshes, and observability. That engineering foundation shapes how I think about AI: reliability, latency, and failure modes matter just as much as model quality.
On this blog, I write about what I'm learning and building: agentic AI design patterns, cost optimization for LLM systems, lessons from running AI in production, and the engineering details that most AI content skips. I try to write the posts I wish had existed when I was figuring something out.