Scoring Rubric
A rubric is the set of criteria a call is scored against. Omniflow ships a built-in rubric that covers most use cases; you can also build custom rubrics per scenario or per team.
The built-in rubric
| Criterion | What it measures | 1-5 scale |
|---|---|---|
| Empathy | Did the agent acknowledge feelings before solving? | 1 = ignored, 5 = textbook acknowledgment |
| Resolution | Did the customer’s problem get solved? | 1 = not solved, 5 = solved cleanly |
| Compliance | Were required scripts, disclosures, or policies followed? | 1 = missed, 5 = nailed |
| Efficiency | Time-to-resolution given the scenario’s complexity. | 1 = wandered, 5 = direct without rushing |
Each criterion is scored 1-5 by the AI grader; the overall score is a weighted average.
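As a rough illustration of how a weighted average combines the four scores, consider the sketch below. The weights are hypothetical (this page does not state the built-in rubric's actual weights), and `overall_score` is an illustrative helper, not an Omniflow API.

```python
# Hypothetical weights, for illustration only. The built-in rubric's real
# weights are not given in this section.
WEIGHTS = {"empathy": 0.25, "resolution": 0.35, "compliance": 0.25, "efficiency": 0.15}


def overall_score(scores: dict, weights: dict) -> float:
    """Return the weighted average of 1-5 criterion scores."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    # Dividing by the total weight keeps the result on the 1-5 scale
    # even if the weights do not sum to exactly 1.
    return sum(scores[name] * weights[name] for name in weights) / total_weight


scores = {"empathy": 4, "resolution": 5, "compliance": 3, "efficiency": 4}
print(round(overall_score(scores, WEIGHTS), 2))
```

With these assumed weights, a strong Resolution score pulls the overall score above the plain mean of the four criteria (4.0), because Resolution carries the largest weight.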
The built-in rubric is also what runs against production calls in QA Analytics. The same scores you tune in training are the ones you’ll see on real customer conversations.
Custom rubrics
Build a custom rubric when the built-in one doesn’t fit. For example, a financial-services team might need a disclosure_quoted_verbatim criterion, and a sales team might need next_step_committed.
Custom criteria fields
| Field | What it does |
|---|---|
| Name | Short label shown on dashboards. |
| Description | What “good” looks like — the AI grader uses this. |
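| Weight | A value from 0 to 1. The weights of all criteria in a rubric must sum to 1. |
| Mandatory minimum | Optional. If the criterion’s score falls below this value, the attempt fails regardless of the overall average. |
| Anchors | A short narrative description for each score from 1 to 5, so graders calibrate consistently. |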
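The field constraints above can be checked before a rubric is saved. The following is a minimal sketch, not Omniflow's data model: the `Criterion` class, its attribute names, and `validate_rubric` are all hypothetical and only mirror the fields in the table.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Criterion:
    """Hypothetical in-memory shape of one custom criterion."""
    name: str
    description: str
    weight: float
    mandatory_min: Optional[int] = None  # optional 1-5 floor
    anchors: Dict[int, str] = field(default_factory=dict)  # score -> narrative


def validate_rubric(criteria: List[Criterion]) -> List[str]:
    """Return a list of problems; an empty list means no issues were found."""
    problems = []
    for c in criteria:
        if not 0 <= c.weight <= 1:
            problems.append(f"{c.name}: weight {c.weight} is outside 0-1")
        if c.mandatory_min is not None and not 1 <= c.mandatory_min <= 5:
            problems.append(f"{c.name}: mandatory minimum must be between 1 and 5")
        if any(score not in range(1, 6) for score in c.anchors):
            problems.append(f"{c.name}: anchors must be keyed by scores 1-5")
    total = sum(c.weight for c in criteria)
    # Compare with a tolerance, since decimal weights are not exact in floating point.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"weights sum to {total:.2f}, expected 1.00")
    return problems


draft = [
    Criterion("Empathy", "Acknowledges feelings before solving.", 0.6),
    Criterion("Resolution", "Solves the customer's problem.", 0.3, mandatory_min=3),
]
print(validate_rubric(draft))
```

Here the draft is flagged because its weights sum to 0.90 rather than 1.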
Example: sales discovery rubric
| Criterion | Weight | Mandatory min |
|---|---|---|
| Asked a discovery question in the first 60s | 0.20 | 3 |
| Mirrored the customer’s stated goal | 0.20 | — |
| Quantified the customer’s problem | 0.15 | — |
| Proposed a relevant solution | 0.20 | 3 |
| Committed a next step | 0.25 | 4 |
The “Committed a next step” criterion has a mandatory minimum of 4: if the call ends without a clear next step, the attempt fails regardless of how well everything else went.
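To make the interaction between the weighted average and the mandatory minimums concrete, here is a small sketch using the weights and minimums from the table above. The `grade` function and the sample scores are hypothetical. The sketch only reports which mandatory minimums were missed; it does not model any pass mark for the overall average, since none is stated here.

```python
# (criterion, weight, mandatory minimum or None), copied from the example table.
RUBRIC = [
    ("Asked a discovery question in the first 60s", 0.20, 3),
    ("Mirrored the customer's stated goal", 0.20, None),
    ("Quantified the customer's problem", 0.15, None),
    ("Proposed a relevant solution", 0.20, 3),
    ("Committed a next step", 0.25, 4),
]


def grade(scores):
    """Return (overall weighted average, list of criteria below their minimum)."""
    overall = sum(scores[name] * weight for name, weight, _ in RUBRIC)
    missed = [
        name
        for name, _, minimum in RUBRIC
        if minimum is not None and scores[name] < minimum
    ]
    return overall, missed


# A hypothetical attempt: excellent everywhere except the next step.
attempt = {name: 5 for name, _, _ in RUBRIC}
attempt["Committed a next step"] = 3

overall, missed = grade(attempt)
print(round(overall, 2), missed)
```

The overall average works out to 4.5, which looks strong, but the attempt still fails because “Committed a next step” scored 3, below its mandatory minimum of 4.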
Calibration
Run the same call past the AI grader, a senior coach, and a junior coach. Compare scores. Where they diverge, refine the criterion description until the AI’s interpretation matches the senior coach’s.
Don’t lock a rubric until it’s calibrated. A miscalibrated rubric grades the wrong things, and trainees lose trust in the score quickly.
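One way to make the comparison step systematic is to compute, per criterion, how far each grader sits from the senior coach and flag the large gaps. The sketch below assumes scores have been collected by hand into a dictionary; the grader labels, the sample scores, the `gaps_vs_reference` helper, and the tolerance of 1 point are all illustrative assumptions.

```python
def gaps_vs_reference(scores, reference="senior_coach", tolerance=1):
    """Return {criterion: {grader: gap}} where |gap| exceeds the tolerance.

    A gap is the grader's score minus the reference grader's score.
    """
    ref = scores[reference]
    flagged = {}
    for grader, graded in scores.items():
        if grader == reference:
            continue
        for criterion, value in graded.items():
            gap = value - ref[criterion]
            if abs(gap) > tolerance:
                flagged.setdefault(criterion, {})[grader] = gap
    return flagged


# Hypothetical scores for one call, graded three ways.
call_scores = {
    "ai_grader": {"Empathy": 4, "Resolution": 5, "Compliance": 2, "Efficiency": 4},
    "senior_coach": {"Empathy": 4, "Resolution": 5, "Compliance": 4, "Efficiency": 3},
    "junior_coach": {"Empathy": 3, "Resolution": 5, "Compliance": 4, "Efficiency": 3},
}

print(gaps_vs_reference(call_scores))
```

In this made-up example only Compliance is flagged, with the AI grader two points below the senior coach, so the Compliance description is the one to refine before re-running the comparison.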
Weighting tradeoffs
- Lots of low-weight criteria → noisy scores, hard to act on.
- A few high-weight criteria → clearer signal, but easier to game.
- Mandatory minimums → catch catastrophic failures even when the average looks fine.
Most production rubrics end up with 4-6 criteria and 1-2 mandatory minimums.
Related
| If you want to… | Go to |
|---|---|
| Apply the rubric to production calls | Scorecards |
| Define custom KPIs in QA | Custom KPIs |
| Coach a low-scoring attempt | Reviews & Coaching |