AI-powered IT support automation with Amazon Bedrock & Agentic Workflows
Queueup transformed its IT support operations with an AI-powered, agentic automation platform built on AWS. By combining Amazon Bedrock with semantic search and durable workflows, the organization now resolves 42% of tickets automatically and has reduced average handling time by 88%, all without increasing headcount.
The Challenge
As Queueup’s customer base expanded, its L1 support team faced increasing operational pressure and scalability constraints.
Key challenges included:
- Manual triage overload: Over 60% of L1 support time was spent on ticket classification and routing, equivalent to more than 12 full-time employees annually
- Slow resolution times: Common issues required 4+ hours on average to resolve
- Fragmented knowledge: Critical solutions were scattered across Confluence, Slack, email, and individual engineers’ expertise
- Minimal knowledge reuse: Less than 5% of historical resolutions were leveraged for new tickets
- High turnover (40% annually): Driven by burnout from repetitive manual tasks
- Scalability limitations: Daily volume of 80–120 tickets could not grow without proportional headcount growth
- Inconsistent resolution quality: Outcomes varied depending on engineer availability and experience
Queueup required an intelligent automation solution capable of learning from historical resolutions, delivering consistent high-quality responses, and scaling efficiently without linear cost increases.
The Solution
CloudNation implemented an enterprise-grade agentic AI platform on AWS that autonomously processes and resolves IT support tickets through a structured six-step workflow.
Core architecture
The solution combines fully managed AWS services with durable workflow orchestration:
- Amazon Bedrock (Claude 3.5 Sonnet): Multi-step reasoning engine for structured problem extraction and solution validation, using EU inference profiles to meet compliance requirements
- Amazon Titan Text Embeddings V1: 1,536-dimensional embeddings enabling cost-efficient semantic search
- Amazon OpenSearch Service (k-NN): Vector database supporting semantic similarity search across 10,000+ historical resolutions
- Temporal.io: Deterministic workflow orchestration with built-in human-in-the-loop capabilities
- AWS ECS Fargate: Serverless container platform running four microservices (Jira webhook, Jira worker, Confluence worker, data wrangling service)
- Amazon Aurora PostgreSQL (Multi-AZ): Persistence layer for workflow state management
- Secure VPC Architecture: Private subnets with VPC endpoints for Bedrock, S3, Secrets Manager, and Textract, eliminating NAT Gateway costs
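The pairing of Titan embeddings with OpenSearch k-NN hinges on the index declaring a vector field of matching dimensionality. The sketch below shows what such an index mapping and query might look like; the index and field names (`ticket-resolutions` style naming, `embedding`, `resolution_text`) are illustrative assumptions, not the production schema.

```python
# Sketch of an OpenSearch k-NN index mapping sized for Amazon Titan
# Text Embeddings V1. Field names are illustrative assumptions.

TITAN_V1_DIM = 1536  # Titan Text Embeddings V1 output dimension

index_body = {
    "settings": {"index": {"knn": True}},  # enable the k-NN plugin
    "mappings": {
        "properties": {
            "ticket_id": {"type": "keyword"},
            "resolution_text": {"type": "text"},
            "embedding": {
                "type": "knn_vector",
                "dimension": TITAN_V1_DIM,
            },
        }
    },
}

def knn_query(query_vector, k=5):
    """Build a k-NN query body retrieving the k most similar
    historical resolutions for a given query embedding."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": query_vector, "k": k}}},
    }
```

The same field combination also supports OpenSearch's full-text queries, which is part of why a single managed service was preferred over a dedicated vector database.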
Infrastructure was deployed using layered Terraform modules (000–030) and GitOps CI/CD via GitHub Actions with AWS OIDC.
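A GitOps pipeline with AWS OIDC removes long-lived deployment credentials entirely. A minimal GitHub Actions job along these lines could look as follows; the role ARN, region, and step details are placeholders, not the actual pipeline configuration.

```yaml
# Illustrative GitHub Actions deploy job using OIDC for short-lived
# AWS credentials. Role ARN and region are placeholder assumptions.
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for the OIDC token exchange
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-deploy
          aws-region: eu-west-1
      - run: terraform init && terraform apply -auto-approve
```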
The six-step agentic workflow
Each ticket is processed autonomously through a structured reasoning pipeline:
1. Preprocess: Consolidates ticket description, comments, attachments, and metadata into a unified context
2. Extract problem: Claude analyzes the context and extracts the core technical issue using structured reasoning
3. Search knowledge base: RAG-powered semantic search across historical resolutions using Titan embeddings and OpenSearch
4. Evaluate solution: Multi-factor confidence scoring assessing relevance, similarity, and contextual alignment
5. Post to Jira: Automatically generates a formatted resolution comment with citations to source documentation
6. Transition status: Routes the ticket based on confidence score:
   - >70% confidence → automatically resolve
   - 40–70% confidence → provide AI-assisted suggestion
   - <40% confidence → escalate to human support
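The scoring and routing steps above can be sketched in a few lines. The thresholds are taken from the case study; the weighted blend of factors and all function names are illustrative assumptions, not the production implementation.

```python
# Sketch of steps 4 and 6: multi-factor confidence scoring and
# threshold-based routing. Thresholds match the case study
# (>70% resolve, 40–70% suggest, <40% escalate); the factor
# weights are hypothetical, not the production values.

RESOLVE_THRESHOLD = 0.70
SUGGEST_THRESHOLD = 0.40

def confidence_score(similarity, relevance, alignment,
                     weights=(0.4, 0.35, 0.25)):
    """Blend retrieval similarity, relevance, and contextual
    alignment into a single confidence score in [0, 1]."""
    return sum(w * s for w, s in zip(weights, (similarity, relevance, alignment)))

def route_ticket(confidence):
    """Map a confidence score to the corresponding Jira transition."""
    if confidence > RESOLVE_THRESHOLD:
        return "resolve"    # post resolution and close automatically
    if confidence >= SUGGEST_THRESHOLD:
        return "suggest"    # attach an AI-assisted suggestion for an engineer
    return "escalate"       # hand off to human support untouched
```

Keeping the routing deterministic and outside the LLM call is what lets the thresholds be tuned (and audited) independently of the model.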
Results within 90 days
The implementation exceeded all predefined KPIs and delivered measurable business value shortly after going live.
Operational impact
- 42% fully automated ticket resolution (target: 40%)
- 31% AI-assisted recommendations
- 88% reduction in handling time (4.2 hours → 28 minutes)
- 87% knowledge reuse rate (up from <5%)
Business impact
- Avoided additional L1 support hires despite growing ticket volume
- Increased first-contact resolution rate
- Expanded support capacity without increasing headcount
Technical performance
- Zero hallucination incidents in production following implementation of multi-factor confidence scoring and conservative thresholds
- System handles 3× peak business-hour load with auto-scaling
- 15 CloudWatch alarms monitoring latency, error rates, and cost
Total cost of ownership (3-year analysis)
A comprehensive three-year TCO analysis compared AWS infrastructure costs against realized business value.
Cost optimization strategies
- VPC Endpoints: 50–80% reduction in data transfer costs; S3 Gateway Endpoint is free
- ECS right-sizing: 40% compute savings through iterative load testing
- Bedrock serverless model: $350–$750 per month savings compared to self-managed SageMaker endpoints
- Scale-to-Zero services: Idle workloads automatically scale to zero, eliminating unnecessary spend
The serverless-first architecture ensures costs scale proportionally with usage.
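That proportionality can be made concrete with a back-of-the-envelope cost model: variable spend tracks ticket volume linearly on top of a small fixed baseline. Every figure below is a hypothetical illustration, not the case study's actual cost data.

```python
# Toy model of serverless-first cost behavior: spend grows linearly
# with ticket volume, with no step change for headcount or capacity.
# All dollar figures are hypothetical illustrations.

def monthly_cost(tickets_per_day,
                 llm_cost_per_ticket=0.03,      # hypothetical Bedrock spend per ticket
                 compute_cost_per_ticket=0.01,  # hypothetical Fargate spend per ticket
                 fixed_cost=120.0):             # hypothetical baseline (Aurora, OpenSearch)
    """Monthly cost as fixed baseline plus per-ticket variable spend."""
    variable = tickets_per_day * 30 * (llm_cost_per_ticket + compute_cost_per_ticket)
    return fixed_cost + variable
```

Under this model, doubling ticket volume roughly doubles only the variable portion of the bill, whereas a headcount-based model would double total cost.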
Lessons learned
Strategic technology decisions
- Bedrock vs. Self-Hosted LLMs: Eliminated ML infrastructure management overhead while maintaining compliance and cost flexibility
- OpenSearch vs. Qdrant: Reduced operational complexity with a managed service, IAM integration, and combined full-text and vector search
- ECS Fargate vs. EKS: Approximately 50% cost savings for burst-heavy workloads due to pay-per-task-second pricing
Critical success factors
- Conservative confidence thresholds (initially 80%, gradually lowered to 70%)
- Human-in-the-loop design to maintain quality and build internal trust
- Comprehensive observability and monitoring from day one
Best practices demonstrated
- Responsible AI with full audit trails via Temporal
- Zero static credentials (IAM roles + OIDC)
- Private networking and least-privilege IAM policies
- 100% Infrastructure as Code (Terraform)
- Extensive automated testing (100+ unit tests)
Recommendations for future implementations
- Begin with a pilot focused on low-risk ticket categories
- Implement cost monitoring and alerting early; LLM context expansion can significantly impact spend
- Combine AI autonomy with structured human oversight