AI-powered IT support automation with Amazon Bedrock & Agentic Workflows

Queueup transformed its IT support operations with an AI-powered, agentic automation platform built on AWS. By combining Amazon Bedrock with semantic search and durable workflows, the organization now resolves 42% of tickets automatically and has reduced average handling time by 88%, all without increasing headcount.


The Challenge

As Queueup’s customer base expanded, its L1 support team faced increasing operational pressure and scalability constraints.

 

Key challenges included:

  • Manual triage overload: Over 60% of L1 support time was spent on ticket classification and routing, equivalent to more than 12 full-time employees annually
  • Slow resolution times: Common issues required 4+ hours on average to resolve
  • Fragmented knowledge: Critical solutions were scattered across Confluence, Slack, email, and individual engineers’ expertise
  • Minimal knowledge reuse: Less than 5% of historical resolutions were leveraged for new tickets
  • High turnover (40% annually): Driven by burnout from repetitive manual tasks
  • Scalability limitations: a daily volume of 80–120 tickets that could not grow without proportional headcount increases
  • Inconsistent resolution quality: Outcomes varied depending on engineer availability and experience

Queueup required an intelligent automation solution capable of learning from historical resolutions, delivering consistent high-quality responses, and scaling efficiently without linear cost increases.

The Solution

CloudNation implemented an enterprise-grade agentic AI platform on AWS that autonomously processes and resolves IT support tickets through a structured six-step workflow.

 

Core architecture

The solution combines fully managed AWS services with durable workflow orchestration:

  • Amazon Bedrock (Claude 3.5 Sonnet)
    Multi-step reasoning engine for structured problem extraction and solution validation, using EU inference profiles to meet compliance requirements
  • Amazon Titan Text Embeddings V1
    1,536-dimensional embeddings enabling cost-efficient semantic search
  • Amazon OpenSearch Service (k-NN)
    Vector database supporting semantic similarity search across 10,000+ historical resolutions
  • Temporal.io
    Deterministic workflow orchestration with built-in human-in-the-loop capabilities
  • Amazon ECS on AWS Fargate
    Serverless container platform running four microservices (Jira webhook, Jira worker, Confluence worker, data wrangling service)
  • Amazon Aurora PostgreSQL (Multi-AZ)
    Persistence layer for workflow state management
  • Secure VPC Architecture
    Private subnets with VPC endpoints for Bedrock, S3, Secrets Manager, and Textract, eliminating NAT Gateway costs

Infrastructure was deployed using layered Terraform modules (000–030) and GitOps CI/CD via GitHub Actions with AWS OIDC.
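The retrieval layer described above can be sketched in a few lines. The snippet below is a minimal illustration rather than the production code: the index field names (`embedding`, `ticket_id`, `resolution_text`) and the AWS Region are assumptions, while the model ID is the public `amazon.titan-embed-text-v1` identifier.

```python
import json


def embed_text(text: str, region: str = "eu-central-1") -> list[float]:
    """Generate a 1,536-dimensional Titan V1 embedding via Bedrock."""
    import boto3  # deferred import; requires configured AWS credentials
    bedrock = boto3.client("bedrock-runtime", region_name=region)
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]


def build_knn_query(vector: list[float], k: int = 5) -> dict:
    """Build an OpenSearch k-NN query over indexed historical resolutions."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": vector, "k": k}}},
        "_source": ["ticket_id", "resolution_text"],  # assumed field names
    }
```

The resulting query dict would be passed to an `opensearch-py` client's `search()` call against the resolutions index.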

 

 

The six-step agentic workflow

Each ticket is processed autonomously through a structured reasoning pipeline:

  1. Preprocess
    Consolidates ticket description, comments, attachments, and metadata into a unified context
  2. Extract problem
    Claude analyzes the context and extracts the core technical issue using structured reasoning
  3. Search knowledge base
    RAG-powered semantic search across historical resolutions using Titan embeddings and OpenSearch
  4. Evaluate solution
    Multi-factor confidence scoring assessing relevance, similarity, and contextual alignment
  5. Post to Jira
    Automatically generates a formatted resolution comment with citations to source documentation
  6. Transition status
    • ≥70% confidence → automatically resolve
    • 40–70% confidence → provide AI-assisted suggestion
    • <40% confidence → escalate to human support
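The routing decision in step 6 reduces to a simple threshold function. The sketch below assumes the confidence score is normalized to 0.0–1.0; the function and enum names are illustrative, not taken from the production codebase.

```python
from enum import Enum


class Action(Enum):
    RESOLVE = "resolve"    # post the solution and close the ticket
    SUGGEST = "suggest"    # post an AI-assisted suggestion, leave open
    ESCALATE = "escalate"  # hand off to human support


def route_ticket(confidence: float,
                 resolve_threshold: float = 0.70,
                 suggest_threshold: float = 0.40) -> Action:
    """Map a multi-factor confidence score (0.0-1.0) to a workflow action.

    Thresholds mirror the case study: >=70% auto-resolve, 40-70%
    AI-assisted suggestion, <40% escalation to a human engineer.
    """
    if confidence >= resolve_threshold:
        return Action.RESOLVE
    if confidence >= suggest_threshold:
        return Action.SUGGEST
    return Action.ESCALATE
```

Keeping this decision in a pure function makes it trivially unit-testable and deterministic, which fits Temporal's requirement that workflow code be deterministic.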

 

 

Results within 90 days

The implementation exceeded all predefined KPIs and delivered measurable business value shortly after going live.

 

Operational impact

  • 42% fully automated ticket resolution (target: 40%)
  • 31% AI-assisted recommendations
  • 88% reduction in handling time (4.2 hours → 28 minutes)
  • 87% knowledge reuse rate (up from <5%)

Business impact

  • Avoided additional L1 support hires despite growing ticket volume
  • Increased first-contact resolution rate
  • Expanded support capacity without increasing headcount

Technical performance

  • Zero hallucination incidents in production following implementation of multi-factor confidence scoring and conservative thresholds
  • System handles 3× peak business-hour load with auto-scaling
  • 15 CloudWatch alarms monitoring latency, error rates, and cost

Total cost of ownership (3-year analysis)

A comprehensive three-year TCO analysis compared AWS infrastructure costs against realized business value.

 

Cost optimization strategies

  • VPC Endpoints: 50–80% reduction in data transfer costs; S3 Gateway Endpoint is free
  • ECS right-sizing: 40% compute savings through iterative load testing
  • Bedrock serverless model: $350–$750 per month savings compared to self-managed SageMaker endpoints
  • Scale-to-Zero services: Idle workloads automatically scale to zero, eliminating unnecessary spend

The serverless-first architecture ensures costs scale proportionally with usage.

 

 

Lessons learned

 

Strategic technology decisions

  • Bedrock vs. Self-Hosted LLMs
    Eliminated ML infrastructure management overhead while maintaining compliance and cost flexibility
  • OpenSearch vs. Qdrant
    Reduced operational complexity with managed service, IAM integration, and combined full-text and vector search
  • ECS Fargate vs. EKS
    Approximately 50% cost savings for burst-heavy workloads due to pay-per-task-second pricing
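The Fargate-vs-EKS trade-off comes down to simple arithmetic: with per-task-second billing, a bursty workload pays only for the hours it actually runs, whereas an EKS node pool tends to stay provisioned around the clock. The sketch below uses illustrative placeholder rates, not current AWS prices, to show how the comparison works.

```python
# Illustrative Fargate on-demand rates -- check current AWS pricing
VCPU_PER_HOUR = 0.04    # USD per vCPU-hour (placeholder)
GB_PER_HOUR = 0.004     # USD per GB-hour (placeholder)


def fargate_monthly_cost(vcpu: float, memory_gb: float,
                         hours_per_day: float, days: int = 30) -> float:
    """Monthly cost of one Fargate task billed only while it runs."""
    hourly = vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR
    return hourly * hours_per_day * days


# A bursty worker active ~6 h/day vs. the same capacity provisioned 24/7
burst = fargate_monthly_cost(vcpu=1, memory_gb=2, hours_per_day=6)
always_on = fargate_monthly_cost(vcpu=1, memory_gb=2, hours_per_day=24)
print(f"burst-only spend is {burst / always_on:.0%} of always-on")  # 25%
```

The exact savings depend on real prices and task sizing, but the shape of the argument holds: the shorter the daily burst window, the larger the gap versus always-on capacity.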

Critical success factors

  • Conservative confidence thresholds (initially 80%, gradually lowered to 70%)
  • Human-in-the-loop design to maintain quality and build internal trust
  • Comprehensive observability and monitoring from day one

Best practices demonstrated

  • Responsible AI with full audit trails via Temporal
  • Zero static credentials (IAM roles + OIDC)
  • Private networking and least-privilege IAM policies
  • 100% Infrastructure as Code (Terraform)
  • Extensive automated testing (100+ unit tests)

 

Recommendations for future implementations

  • Begin with a pilot focused on low-risk ticket categories
  • Implement cost monitoring and alerting early; LLM context expansion can significantly impact spend
  • Combine AI autonomy with structured human oversight

 


Explore what AI automation could do for your support team

Schedule an AI automation assessment
