This guide explains how the Assessor module automatically tests and evaluates your AI Employees. The Assessor generates realistic test conversations based on your agent's configuration, executes them across voice and chat channels, and produces performance scores to validate agent behavior before going live.
How the Assessor works
The Assessor uses a client-server architecture to run automated tests at scale:
Scenario generation: The Assessor reads your agent's Intent Type Map (ITM) and Agent Main Instruction (AMI) to generate realistic test conversations that cover each configured intent.
Test execution: The Assessor role-plays as a customer and conducts conversations with your AI Employee through phone calls or web chat sessions.
Performance scoring: After each conversation, the Assessor evaluates whether the agent completed the required steps and reached the expected Call-To-Action (CTA). Each intent receives a score from 0 to 100.
Reporting: Results are saved as assessment reports that detail which intents passed, which failed, and why.
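The scoring step above can be sketched in Python. This is an illustrative model only: the 60/40 weighting between step completion and CTA achievement, and the function and parameter names, are assumptions, not the Assessor's actual formula.

```python
def score_intent(required_steps, completed_steps, cta_reached):
    """Score one intent from 0 to 100 (illustrative weighting, not the real formula)."""
    if not required_steps:
        return 100 if cta_reached else 0
    # Fraction of required conversation steps the agent actually completed.
    step_ratio = len(set(completed_steps) & set(required_steps)) / len(required_steps)
    # Hypothetical split: 60% for step completion, 40% for reaching the CTA.
    return round(60 * step_ratio + 40 * (1 if cta_reached else 0))

# e.g. 2 of 3 steps completed and the CTA reached:
print(score_intent(["greet", "collect_date", "confirm"], ["greet", "collect_date"], True))  # 80
```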
Architecture
The Assessor consists of two components:
Assessor Client: Generates test scenarios from your agent's configuration, conducts conversations, and evaluates results.
Assessor Server: Manages phone number pooling, test queue orchestration, and resource allocation. The server coordinates testing resources so multiple assessments can run in parallel without bottlenecks.
This separation allows the system to scale efficiently: the server handles resource logistics while the client focuses on test quality.
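The server's phone number pooling can be pictured as a blocking queue: a test acquires a number, runs, and returns it, so parallel assessments never share a line. This is a minimal sketch; the class and method names are illustrative, not the Assessor Server's API.

```python
import queue

class PhoneNumberPool:
    """Minimal sketch of server-side number pooling (names are illustrative)."""

    def __init__(self, numbers):
        self._free = queue.Queue()
        for n in numbers:
            self._free.put(n)

    def acquire(self, timeout=30):
        # Blocks until a number is free, so concurrent tests never collide on a line.
        return self._free.get(timeout=timeout)

    def release(self, number):
        # Return the number to the pool for the next queued test.
        self._free.put(number)

pool = PhoneNumberPool(["+15550100", "+15550101"])
number = pool.acquire()
# ... place the voice test call to the AI Employee here ...
pool.release(number)
```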
Supported channels
The Assessor can test AI Employees across multiple channels:
| Channel | Description |
| --- | --- |
| Voice (phone) | Places phone calls to your AI Employee using pooled phone numbers managed by the Assessor Server. |
| Web chat | Conducts text-based conversations through the AssessorChatFlow, testing chat agent responses and conversation logic. |
Test scenarios
The Assessor automatically generates test scenarios from two sources:
Intent Type Map (ITM): Defines the intents your agent handles (e.g., booking an appointment, requesting a quote). The Assessor creates test conversations that exercise each intent.
Agent Main Instruction (AMI): Provides additional context about your agent's behavior, tone, and procedures. The Assessor uses this to generate realistic customer personas and conversation topics.
Tests can be configured to cover both working-hours and non-working-hours scenarios by setting the run_both_wh attribute.
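The effect of run_both_wh can be illustrated as a simple scenario expansion: each intent yields one working-hours scenario, plus an after-hours variant when the flag is set. The function name and scenario fields below are assumptions for illustration; only the run_both_wh attribute comes from this guide.

```python
def expand_scenarios(intents, run_both_wh=False):
    """Expand each intent into one or two test scenarios (illustrative sketch)."""
    scenarios = []
    for intent in intents:
        scenarios.append({"intent": intent, "working_hours": True})
        if run_both_wh:
            # Add an after-hours variant so out-of-hours behavior is also tested.
            scenarios.append({"intent": intent, "working_hours": False})
    return scenarios

# One intent, both time windows -> two test scenarios.
print(expand_scenarios(["book_appointment"], run_both_wh=True))
```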
Assessment results
Each test run produces a report containing:
Intent scores: A 0β100 score for each tested intent based on step completion and CTA achievement.
Conversation transcripts: Full records of each test conversation.
QA analysis: Detailed breakdown of failed intents explaining what went wrong and which steps were missed.
Assessment reports are saved in the agent's Knowledge Base with the label report.
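A report with the contents listed above might be modeled like this. The field and method names are hypothetical; only the report contents (intent scores, transcripts, QA analysis) and the Knowledge Base label come from this guide, and treating any score below 100 as failing is an assumption.

```python
from dataclasses import dataclass

@dataclass
class AssessmentReport:
    """Illustrative shape of a saved assessment report (field names are assumptions)."""
    intent_scores: dict   # intent name -> 0-100 score
    transcripts: list     # one full transcript per test conversation
    qa_analysis: dict     # failed intent -> explanation of missed steps
    kb_label: str = "report"  # label used when saving to the Knowledge Base

    def failed_intents(self, passing_score=100):
        # Passing threshold is an assumption; adjust to your own criteria.
        return [i for i, s in self.intent_scores.items() if s < passing_score]

report = AssessmentReport(
    intent_scores={"book_appointment": 80, "request_quote": 100},
    transcripts=[],
    qa_analysis={"book_appointment": "missed the confirmation step"},
)
print(report.failed_intents())  # ['book_appointment']
```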
Run an assessment
Navigate to your AI Employee's configuration in the Builder.
Configure the intents you want to test in the Intent Type Map.
Trigger an assessment run.
Monitor the progress as the Assessor executes conversations across your configured channels.
Review the assessment report to identify any intents that need improvement.
NOTE
The Assessor can be configured to run automatically when an agent's configuration is updated by setting run_on_update to true.
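A settings fragment combining the two documented flags might look like the following. The surrounding structure and key nesting are hypothetical; only run_on_update and run_both_wh are named in this guide.

```python
# Hypothetical agent settings fragment; only the two flag names are documented.
agent_settings = {
    "assessor": {
        "run_on_update": True,   # re-run the assessment whenever the config changes
        "run_both_wh": False,    # also test non-working-hours scenarios when True
    }
}
```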
