
A Tour of the Telephony AI Platform: From Agent Creation to Live Calls

aws · ai · typescript · developer-experience

After building the architecture, the conversation engine, and the dashboard, it's time to actually use the thing. This post is a practical tour of the deployed platform — what you can do, how it works end-to-end, and what makes it interesting for developers building voice AI products.

Creating Your First Agent

Everything starts with an agent. An agent defines how the AI behaves on a call: what it says, how it sounds, and what tools it can use.

The agent form has a few sections worth highlighting:

System Prompt

This is the core personality. You write natural language instructions and the model (Claude Sonnet 4 by default) follows them on every turn. A simple example:

You are a friendly customer support agent for a coffee subscription service.
Help customers check their order status, update their delivery preferences,
or cancel their subscription. Always confirm actions before executing them.

Voice Configuration

The platform uses Amazon Polly for speech synthesis. You pick from 16+ voices across three engines:

| Engine | Tradeoff |
| --- | --- |
| Generative | Most natural, slight latency |
| Neural | Good balance of quality and speed |
| Standard | Fastest, less natural |

Each voice supports specific languages. The form shows 12 language options from en-US to zh-CN, and the voice preview component tells you which engine/language combinations are available for your selected voice.
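That compatibility check can be sketched as a simple lookup. The voice and language entries below are illustrative (Joanna and Zhiyu are real Polly voices, but the full inventory lives in the preview component, not here):

```typescript
// Hypothetical sketch of the engine/voice compatibility check the
// voice preview component performs. The data here is illustrative,
// not the platform's actual voice inventory.
type Engine = "generative" | "neural" | "standard";

interface VoiceInfo {
  languages: string[];
  engines: Engine[];
}

const VOICES: Record<string, VoiceInfo> = {
  Joanna: { languages: ["en-US"], engines: ["generative", "neural", "standard"] },
  Zhiyu: { languages: ["zh-CN"], engines: ["neural", "standard"] },
};

// True only if the selected voice supports both the language and engine.
function isSupported(voice: string, language: string, engine: Engine): boolean {
  const info = VOICES[voice];
  return !!info && info.languages.includes(language) && info.engines.includes(engine);
}
```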

Tool Definitions

This is where it gets interesting for developers. Tools are defined as JSON and passed to Bedrock's Converse API:

[
  {
    "name": "lookup_order",
    "description": "Look up a customer order by order ID or email address",
    "inputSchema": {
      "type": "object",
      "properties": {
        "orderId": { "type": "string" },
        "email": { "type": "string" }
      },
      "required": ["orderId"]
    }
  }
]

When the AI decides it needs to call a tool during conversation, it returns structured parameters. The Lambda handler executes the tool and feeds results back into the next turn — all within the 8-second Connect timeout.
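The dispatch step can be sketched like this. The `toolUse`/`toolResult` block shapes follow Bedrock's Converse API; the `lookup_order` handler is a hypothetical stand-in, and the real handler would be async (sync here for brevity):

```typescript
// Sketch of tool dispatch between Converse turns. The content-block
// shapes mirror the Converse API; the handler body is hypothetical.
interface ToolUse {
  toolUseId: string;
  name: string;
  input: Record<string, unknown>;
}

type ToolHandler = (input: Record<string, unknown>) => unknown;

const handlers: Record<string, ToolHandler> = {
  // Hypothetical implementation standing in for a real order lookup.
  lookup_order: (input) => ({ orderId: input.orderId, status: "shipped" }),
};

// Execute the requested tool and wrap the result as a toolResult
// content block to feed back into the next Converse turn.
function executeTool(use: ToolUse) {
  const handler = handlers[use.name];
  const result = handler ? handler(use.input) : { error: `unknown tool: ${use.name}` };
  return {
    toolResult: {
      toolUseId: use.toolUseId,
      content: [{ json: result }],
    },
  };
}
```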

Guardrails

Three layers of safety:

  • Turn limit — conversations auto-complete after N turns (default 50). Prevents runaway calls.
  • Bedrock Guardrails — optional content filtering via a Guardrail ID. Blocks harmful content before it reaches the caller.
  • Disclaimers — custom strings appended to the system prompt. Useful for compliance ("This call may be recorded").
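The prompt-side layers (turn limit and disclaimers) can be sketched as two small helpers — a sketch with illustrative names, not the platform's actual code; the Bedrock Guardrail layer is just an ID passed to the Converse call:

```typescript
// Sketch of the prompt-side guardrails: appending disclaimers to the
// system prompt and enforcing the turn limit. Names are illustrative.
interface GuardrailConfig {
  maxTurns: number;       // default 50 per the platform
  disclaimers: string[];  // appended verbatim to the system prompt
}

function buildSystemPrompt(base: string, cfg: GuardrailConfig): string {
  return cfg.disclaimers.length
    ? `${base}\n\n${cfg.disclaimers.join("\n")}`
    : base;
}

// Conversations auto-complete once the turn count reaches the limit.
function shouldAutoComplete(turnCount: number, cfg: GuardrailConfig): boolean {
  return turnCount >= cfg.maxTurns;
}
```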

Testing Before Going Live

Before assigning an agent to a phone number, you can test it through the dashboard's chat interface. Hit "Test" on any agent and you get a multi-turn conversation window.

This hits the same Bedrock model with the same system prompt, tools, and guardrails — the only difference is text instead of voice. You see the full response including any tool invocations, turn count, and whether the conversation is marked complete.

It's surprisingly useful for iterating on system prompts. You can test edge cases (caller is angry, asks something off-topic, tries to break the guardrails) without burning phone minutes.

Claiming Phone Numbers

Once your agent is ready, assign it a phone number. The platform searches Amazon Connect's available number inventory by country code. Pick a number, assign an agent, and the system:

  1. Claims the number through Connect
  2. Creates a contact flow pointing to your AI conversation Lambda
  3. Stores the phone-to-agent mapping in DynamoDB
  4. Associates the Lambda and Lex bot with the Connect instance

From that point, any call to that number goes through the AI conversation loop.

The Call Flow

When someone calls, the sequence is:

Caller → Connect → Contact Flow → Greeting Lambda → Polly speaks
                                 → Lex captures speech
                                 → AI Turn Lambda → Bedrock (Claude)
                                 → Polly speaks response
                                 → Loop until complete
                                 → Disconnect + record outcome

The contact flow is generated programmatically — no clicking through the Connect visual editor. Each action gets a deterministic UUID so the flow is reproducible across deployments.
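One way to derive those deterministic action IDs is to hash a stable seed (flow name plus action name) and format the digest as a UUID. This is a sketch of the idea, not necessarily the project's exact scheme (UUIDv5 would be another natural choice):

```typescript
import { createHash } from "node:crypto";

// Derive a reproducible UUID-shaped ID from a stable seed so that
// regenerating the contact flow yields identical action IDs across
// deployments. A sketch; the actual implementation may use UUIDv5.
function deterministicId(seed: string): string {
  const h = createHash("sha256").update(seed).digest("hex");
  return [
    h.slice(0, 8),
    h.slice(8, 12),
    h.slice(12, 16),
    h.slice(16, 20),
    h.slice(20, 32),
  ].join("-");
}
```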

A key design choice: conversation history lives in DynamoDB, not in the contact flow. Each Lambda invocation loads the full message history, appends the new utterance, calls Bedrock, and saves the updated state. This means the AI has full context of everything said so far, even across multiple turns.
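The load/append/call/save cycle looks roughly like this, with an in-memory Map standing in for DynamoDB and a stubbed model call (the real handler uses GetItem/PutItem and Bedrock):

```typescript
// Sketch of the per-invocation state cycle. A Map stands in for the
// DynamoDB table, and `reply` stands in for the Bedrock call.
interface Message { role: "user" | "assistant"; text: string }

const store = new Map<string, Message[]>();

function handleTurn(
  callId: string,
  utterance: string,
  reply: (history: Message[]) => string,
): string {
  // 1. Load the full message history for this call.
  const history = store.get(callId) ?? [];
  // 2. Append the caller's new utterance.
  history.push({ role: "user", text: utterance });
  // 3. Call the model with full context (stubbed here).
  const response = reply(history);
  history.push({ role: "assistant", text: response });
  // 4. Save updated state before returning to Connect.
  store.set(callId, history);
  return response;
}
```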

Outbound Calls

The platform isn't limited to inbound. You can initiate outbound calls through the API:

curl -X POST https://api.telephony-dev.dawalnut.com/calls/outbound \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "targetNumber": "+14155550100",
    "agentId": "01HXYZ...",
    "sourceNumber": "+14155550000",
    "context": { "campaignId": "spring-2026" }
  }'

The context object gets passed into the conversation as contact attributes. Your system prompt can reference it — for example, "You are calling about the spring 2026 promotion."
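One way the context might be surfaced to the model is to flatten the attributes into a block appended to the system prompt. The attribute-passing mechanism itself is Connect contact attributes; this rendering helper is purely illustrative:

```typescript
// Illustrative helper: render outbound-call context attributes into
// the system prompt so instructions can reference them.
function withContext(systemPrompt: string, context: Record<string, string>): string {
  const entries = Object.entries(context);
  if (entries.length === 0) return systemPrompt;
  const lines = entries.map(([k, v]) => `${k}: ${v}`).join("\n");
  return `${systemPrompt}\n\nCall context:\n${lines}`;
}
```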

Call History and Transcripts

Every call is tracked. The calls page shows a filterable table:

  • Date range — find calls from a specific period
  • Agent — filter by which AI handled the call
  • Outcome — appointment booked, escalated, resolved, abandoned, etc.
  • Sentiment — positive, negative, neutral, mixed (from Contact Lens)
  • Caller number — search for a specific caller

Click any call to see the full detail: metadata, real-time transcript with per-segment sentiment, audio recording (via presigned S3 URL), and the AI's outcome assessment including any tool results.

The transcript comes from Contact Lens, not from our own system. It runs independently and provides sentiment analysis per turn — useful for spotting where conversations go wrong.

Analytics

The analytics dashboard aggregates call data across configurable time windows (7, 30, 90 days or custom). Four summary cards at the top:

  • Total calls in the period
  • Average duration across all calls
  • Most common outcome (what's happening most?)
  • Active agents (how many agents fielded calls?)

Below that, four charts:

  • Outcome distribution — donut chart showing resolved vs. escalated vs. abandoned, etc.
  • Sentiment distribution — how callers feel about the interaction
  • Calls by day — volume trends over time
  • Agent usage — which agents are handling the most calls

Everything is filterable by agent, so you can compare performance across different configurations.
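The aggregation behind the outcome donut chart, with its optional agent filter, reduces to a count-by-key over call records. A sketch with an assumed record shape:

```typescript
// Sketch of the outcome-distribution aggregation with an optional
// agent filter. The CallRecord shape is an assumption.
interface CallRecord { agentId: string; outcome: string; durationSec: number }

function outcomeDistribution(
  calls: CallRecord[],
  agentId?: string,
): Record<string, number> {
  const filtered = agentId ? calls.filter((c) => c.agentId === agentId) : calls;
  const counts: Record<string, number> = {};
  for (const c of filtered) counts[c.outcome] = (counts[c.outcome] ?? 0) + 1;
  return counts;
}
```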

Webhooks for Integration

The platform publishes two event types:

  • telephony.agent.turn — fired after each conversation turn (useful for real-time monitoring)
  • telephony.agent.outcome — fired when a call completes (useful for CRM updates, ticket creation)

Webhook configuration supports three auth schemes:

| Auth Type | How It Works |
| --- | --- |
| HMAC-SHA256 | Signs the payload with a shared secret. You verify the signature server-side. |
| Bearer Token | Sends a token in the Authorization header. Simple but effective. |
| Custom API Key | Sends a key in a custom header. Flexible for existing API gateways. |
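Server-side verification of the HMAC-SHA256 scheme can look like this. The hex encoding is an assumption — check it (and the signature header name) against the platform's actual delivery format:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of HMAC-SHA256 webhook verification. Hex digest encoding is
// an assumption about the delivery format.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Constant-time comparison avoids leaking signature bytes via timing.
function verify(payload: string, secret: string, signature: string): boolean {
  const expected = Buffer.from(sign(payload, secret), "hex");
  const actual = Buffer.from(signature, "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```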

Each webhook can filter by event type, so you can have one endpoint for real-time turns and another for completed outcomes.

Failed deliveries are retried 3 times by EventBridge, then land in an SQS dead-letter queue. You won't lose events.

API Keys for Programmatic Access

Beyond the dashboard, everything is accessible via API. The settings page lets you generate API keys for programmatic access — the key is shown once, stored as a SHA-256 hash, and can be revoked at any time. Only the last 4 characters are displayed after creation.
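The show-once key lifecycle — generate, store only the hash plus the last 4 characters, verify by re-hashing — can be sketched as follows. Field names and the `tk_` prefix are illustrative, not the platform's actual format:

```typescript
import { createHash, randomBytes } from "node:crypto";

// Sketch of the show-once API key lifecycle. The plaintext key is
// returned exactly once; only its SHA-256 hash and last 4 characters
// are persisted. Field names and key prefix are assumptions.
function createApiKey() {
  const key = `tk_${randomBytes(24).toString("base64url")}`; // shown once
  return {
    key,
    record: {
      hash: createHash("sha256").update(key).digest("hex"),
      last4: key.slice(-4), // all the dashboard displays afterward
    },
  };
}

// Verify an incoming key by hashing it and comparing to the stored hash.
function verifyApiKey(candidate: string, storedHash: string): boolean {
  return createHash("sha256").update(candidate).digest("hex") === storedHash;
}
```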

What Makes This Stack Interesting

A few architectural choices worth noting:

Contact flows as code. The entire Connect IVR flow is generated by a TypeScript function. No manual configuration, no drift between environments, fully reproducible.

Single-table DynamoDB. Agents, phone mappings, conversations, outcomes, settings, and API keys all live in one table with composite keys and a GSI. Tenant isolation is enforced at the query level through JWT claims.

Event-driven analytics. No polling, no cron jobs. Connect publishes disconnect events, Contact Lens publishes analysis events, and the AI Lambda publishes outcome events — all through EventBridge. The analytics Lambda only runs when something happens.

Multi-turn AI within IVR constraints. Amazon Connect gives you 8 seconds per Lambda invocation. The conversation engine loads history from DynamoDB, calls Bedrock, saves state, and returns — all within that window. The Lex passthrough bot handles speech-to-text without adding custom intents.


The full source is in the telephony project — four CDK stacks, four Lambda handlers, and a TanStack Start dashboard. If you're building voice AI on AWS, I hope this tour gives you a sense of what's possible with Connect + Bedrock + a bit of infrastructure-as-code.
