The AWS Developers Podcast

Episode 214

AWS DevOps Agent: Can Your Pipeline Keep Up with AI?

Jun 24, 26 • 00:46:18

With Tipu Qureshi, Senior Principal Engineer at AWS

About this episode

Tipu Qureshi, Senior Principal Engineer at AWS, joins the show fresh from the AWS Summit NYC 2026 announcements to break down how DevOps Agent is changing the way teams handle operations and release management. After 14 years across EC2, Elastic Load Balancing, AWS Support, and Networking, Tipu moved into the Agentic AI organization to build the DevOps Agent and contribute to Agent Core. We explore how the agent investigates incidents autonomously, integrates with your IDE through Kiro and Claude, and validates code changes in sandboxes before they hit production.

Key takeaways:
• Reactive and proactive: DevOps Agent triggers on alarms and ServiceNow incidents, but Custom Agents now run on schedules to detect anomalies before they become outages.
• Context is king: Customers who integrate their Git repos, metrics, and logs get significantly more accurate root causes. Native GitHub/GitHub Enterprise support plus bring-your-own MCP for custom observability.
• IDE integration: Kiro powers and Claude plugins give on-call engineers the full agentic loop: investigate, root-cause, fix, and validate without leaving the editor.
• Release management: The new readiness review inspects pipeline stages, past deployment failures, and integration tests to catch issues before merge, while sandbox testing validates proposed fixes.
• Multi-cloud support: Native Azure integration via IDC with RBAC, plus bring-your-own MCP and A2A for on-premises and other clouds.
• Custom agents and skills: Bring domain-specific knowledge (SAP HANA failure modes, proprietary tooling) via skills from GitHub repos or the assets API, with MCP tools for full customization.
• A2A bi-directional: DevOps Agent can be engaged by other agents and can reach out to other agents, enabling multi-agent escalation workflows.
• Transparency: Every tool call, skill invocation, and reasoning step is captured in a journal visible to customers via API and the operator console.
• What's next: Deeper integrations, automated mitigation actions with safety policies, time-bound rules for agent escalation, and script execution coming soon.

Links

Here are the links to the tools, technologies, or articles we mentioned in this episode.