AI Triage Tool — an AI-powered diagnostics platform that cuts MTTR by automating Kubernetes log analysis, GitLab pipeline triage, runbook generation, and incident intelligence at enterprise scale.
Enterprise Kubernetes and GitLab pipeline environments generate millions of log lines and pipeline failure events. Manual triage takes hours. AI Triage changes that equation.
On-call engineers receive hundreds of alerts daily. Without intelligent prioritisation and root cause analysis, critical issues get buried in noise at 3 AM.
Diagnosing failures across Primary and Secondary clusters and GitLab pipelines requires navigating multiple kubectl contexts, Splunk dashboards, and Confluence wikis — costing 45+ minutes per incident.
Runbooks are scattered across wikis and chat threads. AI Triage uses Claude AI to synthesise context from live pods, GitLab pipeline jobs, APM traces, and source code, generating actionable runbooks in seconds.
A unified platform connecting four user personas, three integration methods, and five data sources through a single AI-powered engine.
Zero-trust OAuth2 via Microsoft Entra ID. Every request is session-validated with 2-hour rolling cookies and bearer token introspection.
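A minimal sketch of what a 2-hour rolling session could look like, assuming "rolling" means each validated request pushes the expiry forward by the full window. The `Session` type and `touch` function are hypothetical names for illustration, not the tool's actual code, and bearer token introspection is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# From the text: sessions use 2-hour rolling cookies.
SESSION_TTL = timedelta(hours=2)


@dataclass
class Session:
    """Hypothetical server-side session record."""
    user: str
    expires_at: datetime


def touch(session: Session, now: datetime) -> bool:
    """Validate the session; if still live, roll its expiry forward.

    Returns False once the window has lapsed, in which case the
    caller would be expected to force re-authentication.
    """
    if now >= session.expires_at:
        return False
    session.expires_at = now + SESSION_TTL
    return True
```

Under this reading, an active user is never logged out mid-incident, while an idle session lapses two hours after its last request.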
Machine-to-machine REST API. External applications authenticate with Entra ID OAuth Bearer tokens, then submit log payloads for AI analysis.
POST /api/analyze
Authorization: Bearer <entra_token>
Content-Type: application/json

{
  "logs": "OOMKilled: container exceeded...",
  "context": {
    "cluster": "cluster-stg",
    "namespace": "ou01-dev",
    "pod": "api-server-7d9f"
  }
}
{
  "analysis": "Pod restarted due to OOM...",
  "runbook": "1. Check mem limits\n2. ...",
  "severity": "high",
  "conversation_id": "conv_abc123"
}
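The request and response above can be sketched from the client side as follows. This is an illustration only: the base URL is a placeholder (the source gives just the path `/api/analyze`), the helper names are hypothetical, and acquiring the Entra ID token is assumed to happen elsewhere. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

# Placeholder host; only the /api/analyze path comes from the source.
BASE_URL = "https://ai-triage.example.com"

# Fields shown in the example response above.
EXPECTED_FIELDS = {"analysis", "runbook", "severity", "conversation_id"}


def build_analyze_request(token: str, logs: str, context: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/analyze request."""
    body = json.dumps({"logs": logs, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/analyze",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def parse_analyze_response(raw: bytes) -> dict:
    """Decode the JSON body and check that the example's fields are present."""
    payload = json.loads(raw)
    missing = EXPECTED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"response missing fields: {sorted(missing)}")
    return payload
```

To actually submit the request, a caller could pass the result to `urllib.request.urlopen(req)` and feed the response bytes to `parse_analyze_response`.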
GitLab pipeline failures automatically trigger log collection, AI analysis, and multi-channel runbook dispatch — no human pager required.
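One way the trigger described above could be wired up is sketched below, assuming the platform receives GitLab pipeline webhook events. The field names (`object_kind`, `object_attributes.status`, `builds`) follow GitLab's pipeline event payload; the `collect_logs`, `analyze`, and `dispatch` callables are hypothetical stand-ins for the platform's own steps.

```python
from typing import Callable


def failed_jobs(event: dict) -> list:
    """Return the failed jobs in a pipeline webhook event.

    Anything that is not a failed pipeline event yields an empty list.
    """
    if event.get("object_kind") != "pipeline":
        return []
    if event.get("object_attributes", {}).get("status") != "failed":
        return []
    return [job for job in event.get("builds", []) if job.get("status") == "failed"]


def triage_pipeline_event(
    event: dict,
    collect_logs: Callable[[int], str],
    analyze: Callable[[str], dict],
    dispatch: Callable[[str, dict], None],
) -> int:
    """Run collect -> analyze -> dispatch per failed job; return jobs handled."""
    jobs = failed_jobs(event)
    for job in jobs:
        logs = collect_logs(job["id"])   # hypothetical: fetch the job's log
        result = analyze(logs)           # hypothetical: AI analysis step
        dispatch(job["name"], result)    # hypothetical: send runbook to channels
    return len(jobs)
```

Passing the three steps in as callables keeps the sketch testable without a GitLab instance or an AI backend.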
A single authentication layer spans Primary and Secondary Kubernetes clusters with namespace-aware RBAC across all OUs — OU-01 through OU-04.
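As a rough illustration of a namespace-aware check, the sketch below assumes namespaces carry an OU prefix such as `ou01-` (as in the `ou01-dev` example request) and that each user is granted a set of OUs. Both the naming convention and the function names are assumptions, not documented behaviour of the tool.

```python
import re
from typing import Optional, Set

# From the text: OU-01 through OU-04.
VALID_OUS = {"OU-01", "OU-02", "OU-03", "OU-04"}

# Assumed convention: namespace names start with "ouNN-".
_NAMESPACE_PREFIX = re.compile(r"^ou(\d{2})-")


def ou_for_namespace(namespace: str) -> Optional[str]:
    """Map a namespace like 'ou01-dev' to 'OU-01', or None if it matches no known OU."""
    match = _NAMESPACE_PREFIX.match(namespace)
    if match is None:
        return None
    ou = f"OU-{match.group(1)}"
    return ou if ou in VALID_OUS else None


def can_access(user_ous: Set[str], namespace: str) -> bool:
    """Allow access only when the namespace's OU is one the user has been granted."""
    ou = ou_for_namespace(namespace)
    return ou is not None and ou in user_ous
```

Namespaces that do not map to a known OU are denied by default, which fits the zero-trust posture the page describes.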
"AI Triage Tool represents the next evolution of DevOps intelligence — where AI doesn't just assist engineers, it autonomously triages, diagnoses, and prescribes remediation at machine speed."