scorecards · call quality · coaching · playbook

Building a Call Scorecard Your Team Will Actually Use

APX Intelligence · 4 min read


Almost every operations team has a call scorecard sitting in a Google Sheet somewhere. Almost none of them use it.

The reason is always the same. The scorecard was built by a manager who wanted to capture everything: 40 criteria, four-point scales, weighting columns, a comments field. It takes 25 minutes to score one call. Nobody has time for that, so it gets used once a quarter for the QA review and ignored the rest of the time.

A good scorecard is the opposite. Tight enough to score a call in the time it takes to read the transcript. Sharp enough that two managers grading the same call land within a few points of each other. Specific enough that AI can grade it the same way a human would.

This is how to build one.

Start With the Outcome

Before you write a single criterion, answer one question: what is this call supposed to accomplish?

A discovery call's outcome is information collected and a next step booked. A renewal call's outcome is the renewal landing. An inbound CX call's outcome is the customer leaving the call calmer than they arrived. A compliance call's outcome is every required disclosure delivered, with verification.

If you can't articulate the outcome in one sentence, the scorecard will be incoherent. Start there.

Five Criteria, Not Forty

The temptation is to capture every dimension of a great call. Resist it. The best scorecards have five criteria, weighted. That's it.

For a sales discovery call:

  1. Discovery quality (30%): did the rep uncover the customer's actual problem, budget, and timeline?
  2. Active listening (15%): did the rep follow up on what the customer said, or just run their script?
  3. Objection handling (20%): when pushback came, did the rep address it or deflect it?
  4. Next-step setting (25%): did the call end with a specific, time-bound, mutually agreed next step?
  5. Call hygiene (10%): pace, professionalism, recording disclosure, time management.

Five criteria, weighted, totaling 100. A rep can read this and know what's being scored. An AI agent can grade against it consistently. A manager can coach off it without reading 40 rows.
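The rubric above can be sketched as plain data plus one scoring function. The criterion names and weights come straight from the list; the per-criterion grades in the example are hypothetical inputs (1.0 = yes, 0.5 = partial, 0.0 = no), not real call data.

```python
# The five-criterion discovery scorecard as data. Weights total 100.
CRITERIA = {
    "discovery_quality": 30,
    "active_listening": 15,
    "objection_handling": 20,
    "next_step_setting": 25,
    "call_hygiene": 10,
}

def score_call(grades: dict[str, float]) -> float:
    """Combine per-criterion grades (0.0-1.0) into a 0-100 call score."""
    assert sum(CRITERIA.values()) == 100, "weights must total 100"
    return sum(CRITERIA[name] * grades.get(name, 0.0) for name in CRITERIA)

# Hypothetical grades for one call: strong discovery, partial listening.
example = {
    "discovery_quality": 1.0,
    "active_listening": 0.5,
    "objection_handling": 1.0,
    "next_step_setting": 0.5,
    "call_hygiene": 1.0,
}
print(score_call(example))  # 30 + 7.5 + 20 + 12.5 + 10 = 80.0
```

Because the whole rubric fits in one small dict, changing a weight during the quarterly review is a one-line edit, which is exactly the "tight, not comprehensive" property the article argues for.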

Make Criteria Observable

The single biggest mistake in scorecards is criteria that aren't observable from the call itself.

Bad: "Built rapport with the customer." (Subjective, ungradable.)

Good: "Asked at least one open-ended discovery question in the first 90 seconds." (Observable, scoreable, AI-checkable.)

Bad: "Showed empathy."

Good: "When the customer expressed a concern, acknowledged it before responding (e.g., 'I hear you on that' or 'that makes sense')."

The trick is to write criteria that someone, human or AI, could mark yes / no / partial just from reading the transcript. If the criterion requires you to "feel" the call, it's not a criterion. It's vibes.
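Here is what "observable from the transcript" looks like in practice, a minimal sketch of the open-ended-question criterion from above. The turn format (speaker, start time in seconds, text) and the list of open-ended markers are illustrative assumptions, not a fixed spec.

```python
# Phrases that typically open an open-ended discovery question.
# Illustrative list; a real grader would be more careful than this.
OPEN_ENDED_STARTERS = ("what", "how", "why", "tell me", "walk me through")

def asked_open_question_early(turns, window_s=90) -> bool:
    """True if the rep asked an open-ended question in the first 90 seconds.

    `turns` is a list of (speaker, start_seconds, text) tuples.
    """
    for speaker, start_s, text in turns:
        if start_s > window_s:
            break  # transcript is time-ordered; past the window, stop
        if speaker == "rep" and text.lower().startswith(OPEN_ENDED_STARTERS):
            return True
    return False

turns = [
    ("rep", 3, "Hi, this call is recorded, is that okay?"),
    ("customer", 8, "Sure."),
    ("rep", 12, "What prompted you to look at us now?"),
]
print(asked_open_question_early(turns))  # True
```

The point isn't this exact heuristic. It's that the criterion is concrete enough to be expressed as a check at all, which "built rapport" never could be.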

Build for the Industry, Not the Generic Rubric

A compliance call at an insurance brokerage has criteria a sales call at a SaaS company will never have: "Stated the company name and recording disclosure within first 15 seconds." "Confirmed the customer's date of birth before discussing policy details." "Read the renewal terms verbatim."

A great scorecard is opinionated about your industry. Generic call rubrics are useless because they're trying to grade every call ever made. Your scorecard exists to grade your calls, in your industry, against your standards. Lean into the specificity.

Set the Floor, Not the Target

When you wire the scorecard up to alerting, the question isn't "what's a great call?" It's "what's a call I need to know about right now?"

That's the floor. For most teams, the floor is somewhere between 60 and 70 out of 100. Above the floor, calls go in the dashboard. Below the floor, the right manager gets pinged.

The floor is what makes the scorecard operationally useful. Without it, you have a scoring system. With it, you have an early warning system.
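The floor is just a threshold plus routing. A minimal sketch, where the alert and dashboard calls are placeholders for whatever your stack actually uses (Slack, email, a queue):

```python
FLOOR = 65  # most teams land between 60 and 70 out of 100

def route_call(call_id: str, score: float) -> str:
    """Below the floor, ping the manager; at or above it, log quietly."""
    if score < FLOOR:
        # alert_manager(call_id, score)  # hypothetical: Slack ping, page, etc.
        return "alert"
    # append_to_dashboard(call_id, score)  # hypothetical dashboard write
    return "dashboard"

print(route_call("call-123", 72))  # dashboard
print(route_call("call-456", 58))  # alert
```

Everything interesting lives in the one constant: tune `FLOOR` until the alerts are rare enough that managers still read them.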

Iterate

The first version of your scorecard will be wrong. That's fine. It's why you build it tight, not comprehensive. After two weeks of grading, you'll see which criteria are firing the wrong way, which weights are off, and which criteria are duplicates of each other. Update.

A scorecard that's set in stone is a scorecard that's drifting from reality. The good ones get updated quarterly, lightly. The great ones get updated whenever the script changes.

The 30-Minute Test

Here's the test for whether your scorecard is in good shape: can a new manager learn it in 30 minutes and grade a call accurately on their first try?

If yes, you have a scorecard that will scale. If no, you have a Google Sheet nobody uses.

Build for the 30-minute test. Everything else follows.

Build agents that listen for you

APX Intelligence runs real-time call analysis on every conversation. Sales coaching, compliance, customer service, all in one platform.