You're Scoring Candidates With AI. Do You Know What It's Actually Measuring?

Gershon Goren

- Last Updated: May 21, 2026

There's a question worth asking about any AI system before asking about its accuracy: what is it actually measuring?

In AI video interview scoring, that question has a more uncomfortable answer than most buyers expect. The category has grown quickly, and the promise is compelling. Instead of hiring panels comparing subjective impressions across hours of recorded responses, AI scores every candidate automatically against a consistent standard. In theory, that's a more rigorous and more equitable process.

In practice, many of these systems evaluate tone of voice, facial expressions, and pace alongside or instead of what candidates actually say. The model produces a score, the hiring team accepts it, and nobody can explain what it was based on or whether it has anything to do with who will actually succeed in the role.

The gap between what AI hiring tools say they're measuring and what they actually measure is a critical blind spot in HR right now, and one worth examining before it becomes a larger problem.

The Evidence

Several major platforms quietly removed facial analysis features after regulatory pressure and public scrutiny, an acknowledgment that criteria advertised as objective were neither reliable nor fair. But removing the most visible problem didn't fix the underlying one. Any system that evaluates how a candidate presents rather than what they communicate is still measuring the wrong thing and doing so in ways that are difficult to detect and harder to defend.

The assumption behind these signals, that tone, pace, and eye contact correlate with competence or fit, is not well supported by evidence. The evidence that measuring them introduces systematic bias is considerably stronger. And as regulatory frameworks catch up to the practice, that distinction is moving from an ethical concern to a legal one.

The Accountability Gap

Consider what happens when a hiring decision gets challenged, whether by a candidate asking why they weren't selected, an internal audit, or a regulatory inquiry. In every case, the question is how the decision was made.

"The AI scored them lower" is not a satisfying answer to any of those. It can't be traced back to specific, job-relevant criteria. It can't be explained to the candidate. It won't hold up to compliance scrutiny. When you can't say "this candidate scored lower because their response to the structured communication question demonstrated X rather than Y," you don't have a data-driven process. You have a process that produces a number at the end, which is different.
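An explanation like that is only possible if each score is stored together with the rubric level it was assigned and the part of the answer that justified it. The Python sketch below assumes such records exist; the `ScoredCriterion` and `explain` names and the example content are hypothetical, not taken from any product. It renders a score as documented criteria plus evidence, and it refuses outright when any criterion lacks either one, which is exactly the "the AI scored them lower" case.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCriterion:
    competency: str   # competency from the approved rubric
    level: str        # level awarded, e.g. "weak", "adequate", "good" (assumed labels)
    descriptor: str   # the approved rubric text for that level
    evidence: str     # what the candidate said that matched the descriptor


def explain(candidate: str, scores: list[ScoredCriterion]) -> str:
    """Render a score as documented criteria plus evidence, or refuse if any link is missing."""
    undocumented = [
        s.competency for s in scores if not (s.descriptor.strip() and s.evidence.strip())
    ]
    if undocumented:
        # Without a rubric descriptor and cited evidence, the only justification left is "the AI said so".
        raise ValueError(f"cannot explain {candidate}: undocumented criteria {undocumented}")
    lines = [f"Assessment of {candidate}:"]
    for s in scores:
        lines.append(
            f"- {s.competency}: rated '{s.level}' because the response matched "
            f"\"{s.descriptor}\" (evidence: \"{s.evidence}\")"
        )
    return "\n".join(lines)


# Hypothetical example: one criterion from a structured communication question.
report = explain(
    "Candidate A",
    [
        ScoredCriterion(
            "Structured communication",
            "adequate",
            "States the problem and the outcome but not the reasoning that connects them.",
            "We had a delay, so I emailed the client and it was resolved.",
        )
    ],
)
```

The refusal is the design choice that matters: a system built this way cannot emit a number it has no documented basis for.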

The irony is that organizations adopting opaque scoring tools often do so to reduce bias and improve consistency. Those are the right goals. A system that can't be interrogated, explained, or audited doesn't achieve them. It makes the problem harder to see.

This matters beyond HR. As AI takes on more decision-making roles across enterprise functions, the question of what any given system is actually optimizing for becomes critical. Video interview scoring is a useful test case precisely because the stakes are concrete and the failure modes are instructive.

Structured Scoring

There is a meaningful difference between AI that scores video interviews and AI that scores them correctly. The distinction is architectural.

A structured approach starts with job requirements. What competencies does this role demand? What does a strong response to each assessment question actually look like? From those answers, you build explicit rubrics: criteria that describe what good, adequate, and weak responses look like for each competency being evaluated. Those rubrics are defined and approved by the hiring team before any candidate records a response.
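To make that concrete, here is a minimal Python sketch of a rubric as data, assuming three rating levels (weak, adequate, good) and a single sign-off field. The class names, fields, and example role are hypothetical and not drawn from any particular product. What it illustrates is that every level of every criterion is written down explicitly, and the rubric records who approved it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed rating levels, mirroring the good / adequate / weak descriptions in the text.
LEVELS = ("weak", "adequate", "good")


@dataclass
class Criterion:
    competency: str               # competency taken from the role's job requirements
    question: str                 # assessment question this criterion is applied to
    descriptors: dict[str, str]   # level name -> what a response at that level contains

    def __post_init__(self) -> None:
        # Reject criteria that leave any level undefined, so no rating rests on unstated judgment.
        missing = [lvl for lvl in LEVELS if not self.descriptors.get(lvl)]
        if missing:
            raise ValueError(f"{self.competency}: missing descriptors for {missing}")


@dataclass
class Rubric:
    role: str
    criteria: list[Criterion] = field(default_factory=list)
    approved_by: str | None = None  # hiring-team sign-off, recorded before any response is scored

    def approve(self, reviewer: str) -> None:
        if not self.criteria:
            raise ValueError("cannot approve a rubric with no criteria")
        self.approved_by = reviewer


# Hypothetical role and criterion.
rubric = Rubric(
    role="Customer Support Lead",
    criteria=[
        Criterion(
            competency="De-escalation",
            question="Describe a time you calmed an upset customer.",
            descriptors={
                "weak": "Describes the situation but not the candidate's own actions.",
                "adequate": "Describes concrete actions taken to calm the customer.",
                "good": "Describes concrete actions, the outcome, and what they would repeat.",
            },
        )
    ],
)
rubric.approve("hiring-panel")
```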

When responses come in, the system evaluates what candidates actually said against those pre-defined criteria. Not how they sounded. Not how they looked. What they communicated. Criterion-level scores roll up to a ranked view that hiring teams can trust because they can trace exactly what produced it.
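One way the roll-up could work is sketched below in Python, assuming hypothetical point values per level and a weighted sum with weights set by the hiring team; neither is prescribed by the approach itself. Two properties carry the traceability argument: every candidate must be scored on exactly the same criteria, and each ranked row keeps its per-criterion breakdown and evidence next to the total.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical point values for each rubric level.
LEVEL_POINTS = {"weak": 1, "adequate": 2, "good": 3}


@dataclass(frozen=True)
class CriterionScore:
    competency: str
    level: str      # one of LEVEL_POINTS
    evidence: str   # the part of the candidate's answer that justified this level


def rank(candidates: dict[str, list[CriterionScore]], weights: dict[str, float]):
    """Rank candidates by a weighted sum of criterion-level points, highest first.

    Each row is (name, total, breakdown), where breakdown maps a competency to its
    (level, evidence), so any position in the ranking traces back to specific criteria.
    """
    rows = []
    for name, scores in candidates.items():
        # A consistent standard: every candidate is scored on exactly the rubric's criteria.
        if {s.competency for s in scores} != set(weights):
            raise ValueError(f"{name} was not scored on exactly the rubric's criteria")
        total = sum(weights[s.competency] * LEVEL_POINTS[s.level] for s in scores)
        breakdown = {s.competency: (s.level, s.evidence) for s in scores}
        rows.append((name, total, breakdown))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical weights and scores.
weights = {"Problem solving": 1.0, "Communication": 2.0}
ranking = rank(
    {
        "Candidate A": [
            CriterionScore("Problem solving", "good", "Broke the outage into three checks."),
            CriterionScore("Communication", "adequate", "Summarized the fix for the customer."),
        ],
        "Candidate B": [
            CriterionScore("Problem solving", "adequate", "Restarted the service first."),
            CriterionScore("Communication", "good", "Explained the cause and the next steps."),
        ],
    },
    weights,
)
```

Here Candidate B ranks first (8.0 versus 7.0), and the breakdown shows that the difference comes from the more heavily weighted communication criterion. A weighted sum is only one possible roll-up; the part that matters is that the evidence is not discarded once a number exists.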

This architecture has a secondary benefit that matters for teams managing high-volume pipelines: candidate response summaries. Instead of requiring reviewers to watch every recording, a well-generated summary captures the substance of what a candidate communicated: a factual record that can be evaluated consistently against the same criteria for every candidate. That's where real efficiency gains live, and they don't come at the expense of fairness or auditability.

None of this works unless the scoring criteria can be reviewed, edited, and approved before deployment. The AI should provide a starting point, identifying relevant competencies and drafting rubric criteria from the job description, but humans need to own the standard. If a reviewer can't look at a rubric and say, "yes, this is what we are evaluating and why," that rubric should not be deployed.
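A deployment gate along those lines might look like the Python sketch below. It assumes each drafted criterion carries a stated rationale ("and why") and a named approver; the `DraftCriterion` and `deployment_blockers` names are hypothetical. In this sketch, AI-drafted criteria start with neither, so they stay blocked until a person supplies both.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DraftCriterion:
    competency: str
    description: str                # what a strong response demonstrates
    rationale: str = ""             # why this competency matters for the role
    approved_by: str | None = None  # named human reviewer


def deployment_blockers(criteria: list[DraftCriterion]) -> list[str]:
    """Return the reasons a rubric may not be deployed yet; an empty list means it is ready."""
    if not criteria:
        return ["rubric has no criteria"]
    blockers = []
    for c in criteria:
        # "This is what we are evaluating and why": the why must be written down.
        if not c.rationale.strip():
            blockers.append(f"'{c.competency}' has no stated reason for being evaluated")
        # Every criterion needs explicit human sign-off, however it was drafted.
        if not c.approved_by:
            blockers.append(f"'{c.competency}' has not been approved by a reviewer")
    return blockers


# Hypothetical drafts: the first came straight from an AI draft, the second was reviewed.
drafts = [
    DraftCriterion("Stakeholder communication", "Explains trade-offs in plain terms."),
    DraftCriterion(
        "Prioritization",
        "Orders tasks by impact and urgency.",
        rationale="The role triages incoming requests daily.",
        approved_by="hiring-panel",
    ),
]
blockers = deployment_blockers(drafts)
```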

Questions Worth Asking

For leaders evaluating these tools, whether for their own organization's hiring or as part of a vendor assessment, four questions cut through the noise quickly.

What specifically is being scored?

Ask for an explicit list of evaluation criteria. If the answer includes anything other than the content of candidate responses, ask for the validation data behind those criteria.

Is the scoring tied to job requirements?

Generic rubrics applied uniformly across roles are not a structured evaluation. Legitimate structured scoring starts from the specific competencies required for the specific role.

Can the criteria be reviewed and modified before scoring begins?

If the rubrics are fixed and opaque, the hiring team is not in control of its own evaluation standard.

Can any score be explained to a candidate or a regulator?

This is the accountability test. If the answer requires "the AI said so" rather than pointing to documented criteria and how a candidate performed against them, the process is not defensible.

Good systems answer these questions clearly. The ones that don't are telling you something important about the choices they made.
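The first question can be turned into a simple screening step once a vendor has disclosed its criteria. The Python sketch below assumes that disclosure can be mapped to "criterion name → input signal"; the signal labels and example criteria are hypothetical, and a real vendor will use its own vocabulary. Anything not derived from the content of a candidate's answer is flagged as a criterion to request validation data for.

```python
# Hypothetical signal labels; map the vendor's own terms onto these before running the check.
CONTENT_SIGNALS = {"transcript", "response_content"}


def needs_validation_data(criteria: dict[str, str]) -> list[str]:
    """Return, sorted, the criteria computed from something other than what the candidate said.

    `criteria` maps each disclosed criterion name to the signal it is derived from.
    """
    return sorted(name for name, signal in criteria.items() if signal not in CONTENT_SIGNALS)


# Hypothetical vendor disclosure.
disclosed = {
    "problem_solving": "transcript",
    "enthusiasm": "vocal_tone",
    "engagement": "eye_contact",
}
flagged = needs_validation_data(disclosed)
```

For this example, `flagged` is `["engagement", "enthusiasm"]`: the two criteria built on how the candidate presented rather than what they said.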

Why This Matters Now

The EU AI Act's requirements for high-risk AI systems are scheduled to take effect in August 2026. AI used in recruitment and candidate selection, including video interview scoring, falls into this category. Explainability, human oversight, and auditability are moving from best practices to legal requirements.

But the argument isn't just regulatory. It's practical. When hiring teams can see exactly how a score was produced, they use it. When they can't explain it, they don't, and the efficiency gains the tool promised evaporate. The systems that will last in your hiring tech stack are the ones that can stand behind their decisions.

Getting that right doesn't require avoiding AI in interviews. It requires being precise about what you're asking it to measure, and honest about why.