# Building an AI Call Center: Architecture
## The Pitch
What if you could spin up a voice agent that answers phone calls, has a real conversation with the caller, and knows when to escalate? Not a press-1-for-sales IVR — an actual AI that listens, reasons, uses tools, and speaks back naturally.
That's what I set out to build. The result is a platform that wires together Amazon Connect, Amazon Bedrock, and a management dashboard — all deployed with CDK and TypeScript.
This is Part 1 of a three-part series. Here I'll cover the architecture and infrastructure design. Part 2 dives into the conversation engine, and Part 3 walks through the dashboard.
## High-Level Architecture
The platform splits into four CDK stacks per stage:
| Stack | Region | What it does |
|---|---|---|
| CertificateStack | us-east-1 | DNS zone, NS delegation, ACM wildcard cert |
| AiStack | eu-central-1 | DynamoDB, API Gateway, 4 Lambdas, EventBridge, Cognito |
| ConnectStack | eu-central-1 | Connect instance, Lex bot, contact flow, queue |
| DashboardStack | eu-central-1 | CloudFront + Lambda SSR for the management UI |
The stacks have deliberate dependency ordering: ConnectStack depends on AiStack (it needs the AI Lambda ARN for the contact flow), and DashboardStack depends on both AiStack (for the API URL and Cognito config) and CertificateStack (for the certificate and zone).
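That ordering can be made explicit in the CDK app entry point. A minimal sketch follows; the stack names match the table above, but everything else (account ID, props, the use of bare `cdk.Stack` instead of the real typed stacks) is illustrative:

```typescript
import * as cdk from 'aws-cdk-lib';

// Illustrative wiring only; the real stacks take typed props and pass
// concrete outputs (Lambda ARNs, API URLs, certificates) between each other.
const app = new cdk.App();
const env = { account: '123456789012', region: 'eu-central-1' };

const certificateStack = new cdk.Stack(app, 'CertificateStack', {
  env: { ...env, region: 'us-east-1' }, // CloudFront certs must live in us-east-1
  crossRegionReferences: true,
});
const aiStack = new cdk.Stack(app, 'AiStack', { env });

const connectStack = new cdk.Stack(app, 'ConnectStack', { env });
connectStack.addDependency(aiStack); // contact flow references the AI Lambda ARN

const dashboardStack = new cdk.Stack(app, 'DashboardStack', {
  env,
  crossRegionReferences: true,
});
dashboardStack.addDependency(aiStack); // API URL + Cognito config
dashboardStack.addDependency(certificateStack); // certificate + hosted zone
```

In practice, passing one stack's construct into another's props makes CDK add these dependencies implicitly; the explicit `addDependency` calls just make the ordering visible.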
### The Call Path
When someone dials in, the call follows this path:
- PSTN → Amazon Connect — Connect handles the telephony (SIP, media, queueing)
- Connect → AI Conversation Lambda — the contact flow invokes a Lambda on each caller utterance
- Lambda → Amazon Bedrock — the Lambda sends conversation history to Claude for a response
- Lambda → EventBridge — each turn publishes an event for downstream consumers
- Bedrock → Lambda → Connect — the response text goes back to Connect, which speaks it via Polly
On disconnect, Connect fires a native EventBridge event that triggers the Analytics Lambda to finalize the call record.
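Per turn, the conversation Lambda essentially assembles a Bedrock Converse API request from the stored history plus the caller's new utterance. A pure-function sketch of that request-building step (the model ID and system prompt are assumptions, not the platform's actual configuration):

```typescript
// One stored conversation turn; the real record shape may differ.
type Turn = { role: 'user' | 'assistant'; text: string };

// Build the input for @aws-sdk/client-bedrock-runtime's ConverseCommand.
// Keeping this pure makes the Lambda handler thin and easy to unit-test.
export function buildConverseInput(history: Turn[], utterance: string) {
  const messages = [...history, { role: 'user' as const, text: utterance }].map(
    (t) => ({ role: t.role, content: [{ text: t.text }] }),
  );
  return {
    modelId: 'anthropic.claude-3-5-sonnet-20240620-v1:0', // assumed model
    system: [{ text: 'You are a polite call-center voice agent. Keep replies brief.' }],
    messages, // alternating user/assistant turns ending with the new utterance
  };
}
```

The handler then sends this via `ConverseCommand`, writes both turns back to the conversation record, and returns the reply text to Connect.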
## Why a Single DynamoDB Table
I chose a single-table design because the access patterns are well-defined and the entities are related. Here's the key structure:
| pk | sk | Entity |
|---|---|---|
| `TENANT#<id>` | `AGENT#<id>` | Agent configuration |
| `TENANT#<id>` | `CONFIG` | Tenant settings |
| `TENANT#<id>` | `WEBHOOK#<id>` | Webhook endpoint |
| `CALL#<contactId>` | `CONVERSATION` | Conversation history |
| `CALL#<contactId>` | `OUTCOME#<agentId>` | Call outcome |
| `PHONE#<number>` | `MAPPING` | Phone number mapping |
A GSI (`gsi1pk` / `gsi1sk`) supports time-ordered queries for call history:
| gsi1pk | gsi1sk | Use case |
|---|---|---|
| `TENANT#<id>` | `OUTCOMETIME#<iso>` | "Show me all calls for this tenant, newest first" |
The beauty of this design is that every Lambda handler shares the same table and the same key-building functions. No cross-table joins, no fan-out reads.
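The call-history access pattern maps onto a single Query against the GSI. A sketch of the DocumentClient-style query input (the index and attribute names follow the table above; the helper itself is illustrative):

```typescript
// Build the Query input for "all calls for this tenant, newest first".
// DocumentClient-style values (plain strings, no attribute-type wrappers).
export function callHistoryQuery(tableName: string, tenantId: string, limit = 25) {
  return {
    TableName: tableName,
    IndexName: 'gsi1', // assumed index name
    KeyConditionExpression: 'gsi1pk = :pk AND begins_with(gsi1sk, :prefix)',
    ExpressionAttributeValues: {
      ':pk': `TENANT#${tenantId}`,
      ':prefix': 'OUTCOMETIME#',
    },
    // ISO-8601 timestamps sort lexicographically, so descending = newest first.
    ScanIndexForward: false,
    Limit: limit,
  };
}
```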
### Key Builder Pattern
Early on, every Lambda was littered with string concatenation:
```typescript
const key = {
  pk: `${TENANT_PREFIX}${tenantId}`,
  sk: `${AGENT_PREFIX}${agentId}`,
};
```

This got repetitive fast. I extracted key builders into the shared package:

```typescript
import { tenantPk, agentSk } from '@dawalnut/telephony-shared';

const key = { pk: tenantPk(tenantId), sk: agentSk(agentId) };
```

Eight functions, zero magic. They just concatenate a prefix with an ID. But they eliminate an entire class of typo bugs and make the DynamoDB operations scannable at a glance.
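The builders themselves are one-liners. A hypothetical sketch covering the key patterns from the table above; the real `@dawalnut/telephony-shared` package may name or group them differently:

```typescript
// Hypothetical key builders; names and grouping are assumptions based on the
// table layout, not the actual shared-package source.
const TENANT_PREFIX = 'TENANT#';
const AGENT_PREFIX = 'AGENT#';
const CALL_PREFIX = 'CALL#';
const PHONE_PREFIX = 'PHONE#';

export const tenantPk = (tenantId: string) => `${TENANT_PREFIX}${tenantId}`;
export const agentSk = (agentId: string) => `${AGENT_PREFIX}${agentId}`;
export const webhookSk = (webhookId: string) => `WEBHOOK#${webhookId}`;
export const callPk = (contactId: string) => `${CALL_PREFIX}${contactId}`;
export const outcomeSk = (agentId: string) => `OUTCOME#${agentId}`;
export const phonePk = (number: string) => `${PHONE_PREFIX}${number}`;
```

Because every handler builds keys through the same functions, a prefix change is a one-line edit instead of a grep across every Lambda.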
## Event-Driven Side Effects
The platform uses EventBridge as its nervous system. The AI Lambda publishes two event types:
- `telephony.agent.turn` — fired after each conversation turn
- `telephony.agent.outcome` — fired when the agent resolves or escalates
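Publishing one of these is a single `PutEvents` entry. A sketch of the entry shape, with `telephony.agent.turn` as the detail-type and `dawalnut.telephony` as the source; the bus name and detail fields are assumptions:

```typescript
// Build a PutEvents entry for one conversation turn
// (shape matches @aws-sdk/client-eventbridge's PutEventsRequestEntry).
export function turnEvent(contactId: string, tenantId: string, transcript: string) {
  return {
    EventBusName: 'default', // assumed; a dedicated bus would also work
    Source: 'dawalnut.telephony',
    DetailType: 'telephony.agent.turn',
    Detail: JSON.stringify({
      contactId,
      tenantId,
      transcript,
      at: new Date().toISOString(),
    }),
  };
}
```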
Three rules route these events:
- Disconnect rule — listens for Connect's native `DISCONNECTED` event → Analytics Lambda
- Outcome rule — listens for `telephony.agent.outcome` → Analytics Lambda
- Webhook rule — listens for all `dawalnut.telephony` events → Webhook Delivery Lambda
The Webhook Delivery Lambda loads the tenant's webhook configuration from DynamoDB and dispatches to each endpoint with HMAC, Bearer, or API key authentication. An SQS dead-letter queue catches delivery failures for retry.
Every side effect is asynchronous and decoupled from the call path. The caller never waits for analytics or webhooks.
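For the HMAC option, the delivery Lambda signs the serialized payload with the tenant's shared secret before dispatching. A sketch using Node's built-in crypto; the header name and `sha256=` prefix are assumptions, not the platform's actual wire format:

```typescript
import { createHmac } from 'node:crypto';

// HMAC-SHA256 over the raw request body, hex-encoded.
export function signPayload(secret: string, body: string): string {
  return createHmac('sha256', secret).update(body).digest('hex');
}

// Headers for one webhook delivery; the receiver recomputes the HMAC over
// the body it received and compares.
export function webhookHeaders(secret: string, body: string): Record<string, string> {
  return {
    'Content-Type': 'application/json',
    'X-Telephony-Signature': `sha256=${signPayload(secret, body)}`, // assumed header name
  };
}
```

Signing the exact bytes on the wire (rather than a re-serialized object) is what lets the receiver verify the signature without worrying about key ordering in JSON.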
## Config Validation at Synth Time
Following the same pattern as the portfolio infrastructure, all CDK context values pass through Zod schemas before any resources are created:
```typescript
const stageConfigSchema = z.object({
  accountId: z.string().regex(/^\d{12}$/),
  region: z.string().min(1),
  domainName: z.string().min(1),
  rootZoneId: z.string().min(1),
  delegationRoleArn: z.string().min(1),
  connectInstanceAlias: z.string().min(1),
});
```

If someone misconfigures a stage, `cdk synth` fails immediately with a clear Zod error instead of deploying a half-broken stack.
## What's Next
In Part 2, I'll walk through the AI conversation engine — how the Lambda manages multi-turn history, handles tool use, and stays within guardrails.