Methodology · open document

How we calculate your AI exposure score.

This is the full method. The data sources, the formula, the worked example, and the honest list of what the analysis can't see. We publish it because trust requires it.

VERSION: v2.4 · revised May 2026
LAST REVIEWED: May 1, 2026
NEXT REVIEW: August 2026
OWNERSHIP: Research, Rolespan
SECTION 01

Why we publish this.

The AI Exposure Score is the most important number in your Rolespan report. It shapes the curriculum we recommend, the gaps we highlight, and the verdict we open with. If we publish a number that material to your career thinking, we owe you the math behind it.

Most “AI career assessment” products treat their scoring as proprietary magic. We think that's a tell. A score that can't be checked is a score that can't be trusted.

Our principle

If we wouldn't be comfortable defending the methodology to a skeptical reader of the Financial Times, we shouldn't be using it.

This page is written for that skeptical reader. It's longer than a typical product page should be. If you want the short version, the analysis itself contains a “How we got there” summary inside every report.

SECTION 02

What the score measures.

The AI Exposure Score is a directional estimate of how much of your role's task mix falls within current AI capability. It runs from 0 to 100, where:

  • 0 means none of your day-to-day tasks can plausibly be performed by current AI systems.
  • 100 means all of them can.
  • 25 to 55 is where most knowledge workers fall in 2026.

The score is a relative read on your task mix versus current capability, not a prediction of when, or whether, you'll lose your job. It's closer in spirit to a credit score than to a weather forecast: a useful proxy for risk that informs decisions, not a literal prediction of events.

What the score is

A weighted measure of how much of your work falls inside the current AI capability envelope, calibrated against task-level benchmarks and your CV's task frequency.

The score is most useful when you read it alongside the task breakdown — high exposure on tasks that produce 60% of your value carries different meaning than high exposure on tasks that produce 10%. The score collapses that into a single number for comparison; the rest of the report explains what's underneath.

SECTION 03

How we calculate it.

The calculation has four steps. The math is deliberately simple — we'd rather a method you can audit than a black box.

Step 1 — Extract task categories from your CV

We use a language model to read your CV and extract 12–18 task categories that describe what you actually do. A task category is something like “drafting performance reports” or “negotiating with stakeholders”, not a job title or a tool name. The same person's CV typically yields 14 ± 2 categories. We test this extraction continuously against a hand-labeled benchmark of 400 CVs across 30 role families.

Step 2 — Weight each task by frequency

Some tasks dominate your time; others are occasional. We weight each task by an estimate of its share of your working hours, based on signals in the CV (how often the task appears, whether it's described as a primary responsibility, role seniority, team size). Weights sum to 1.0 across all tasks. The weighting model is described in [4].
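
The normalization at the end of this step can be sketched in a few lines of Python. This is a minimal illustration, assuming the CV signals have already been reduced to one non-negative raw estimate per task; the function name `normalize_weights` and the example values are hypothetical, and the actual weighting model is the one described in [4].

```python
def normalize_weights(raw_signals: dict[str, float]) -> dict[str, float]:
    """Scale non-negative raw time-share estimates so the weights sum to 1.0."""
    if any(value < 0 for value in raw_signals.values()):
        raise ValueError("raw signals must be non-negative")
    total = sum(raw_signals.values())
    if total <= 0:
        raise ValueError("at least one task needs a positive signal")
    return {task: value / total for task, value in raw_signals.items()}


# Hypothetical raw signals for three tasks (illustrative values only).
weights = normalize_weights(
    {"drafting copy": 3.0, "reporting": 1.0, "negotiation": 4.0}
)
print(weights["negotiation"])  # 0.5
```

Whatever the raw signals look like, the only property the later steps rely on is that the resulting weights are non-negative and sum to 1.0.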

Step 3 — Score each task against AI capability

Each task category carries a capability score from 0.0 to 1.0, reflecting how well current AI systems perform that category of work. This is the highest-uncertainty number in the analysis. We derive it from three layered sources:

  • Published task-exposure research, primarily Eloundou et al. [1] and the OECD AI & Occupations framework [2]
  • Public benchmark performance (HELM, BIG-Bench, MMLU, SWE-bench) where it maps cleanly to a task category
  • Internal capability tests — a panel of working professionals reviews AI output on a standardized prompt set for each task category and rates it on a 5-point scale, refreshed quarterly

Step 4 — Aggregate and adjust

The raw score is the weighted sum of (frequency × capability) across all tasks, then multiplied by 100. We then apply two small adjustments:

  • Seniority adjustment (typically −3 to −8 points): senior roles tend to spend a higher share of their time on judgment, negotiation, and people work — categories whose capability scores we may underestimate at the task level
  • Role-baseline normalization (typically ±2 points): each role family has a small calibration constant from our hand-labeled benchmark, to correct for systematic over- or under-extraction of specific task types
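
Exposure Score · Formula

$$S_{\text{raw}} = \Big(\sum_i f_i \cdot c_i\Big) \cdot 100 \qquad \text{(sum across task categories } i\text{)}$$

$$S_{\text{final}} = S_{\text{raw}} + A_{\text{seniority}} + A_{\text{role}} \qquad \text{(apply adjustments)}$$

where
  • $f_i$ = frequency weight of task category $i$ ($\sum_i f_i = 1$)
  • $c_i$ = capability score of task category $i$ (0.0 to 1.0)
  • $A_{\text{seniority}}$ = role-seniority adjustment (−8 to 0)
  • $A_{\text{role}}$ = role-family baseline adjustment (−2 to +2)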
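
For readers who prefer code to notation, the formula can be written as a short Python function. This is a sketch under stated assumptions, not our production code: the function name `exposure_score`, the input validation, and the three-task example are all hypothetical; only the arithmetic and the adjustment ranges come from the definitions above.

```python
from typing import Iterable


def exposure_score(
    tasks: Iterable[tuple[float, float]],
    a_seniority: float = 0.0,
    a_role: float = 0.0,
) -> tuple[float, float]:
    """Return (S_raw, S_final) for a list of (f_i, c_i) pairs."""
    pairs = list(tasks)
    if abs(sum(f for f, _ in pairs) - 1.0) > 1e-6:
        raise ValueError("frequency weights must sum to 1")
    if any(not 0.0 <= c <= 1.0 for _, c in pairs):
        raise ValueError("capability scores must lie in [0, 1]")
    if not -8.0 <= a_seniority <= 0.0:
        raise ValueError("seniority adjustment must lie in [-8, 0]")
    if not -2.0 <= a_role <= 2.0:
        raise ValueError("role adjustment must lie in [-2, +2]")
    # Weighted sum of frequency x capability, scaled to 0-100.
    s_raw = 100.0 * sum(f * c for f, c in pairs)
    return s_raw, s_raw + a_seniority + a_role


# Hypothetical three-task role (illustrative values, not from a real report).
s_raw, s_final = exposure_score(
    [(0.5, 0.8), (0.3, 0.4), (0.2, 0.1)], a_seniority=-3.0, a_role=1.0
)
print(round(s_raw, 1), round(s_final, 1))  # 54.0 52.0
```
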
SECTION 04

Data sources.

The capability scores in step 3 are the single source of uncertainty most likely to date the analysis. We list the major data sources here and rate how heavily each one weighs into the current capability score table.

| Source | Updated | Weight |
|---|---|---|
| Eloundou et al. "GPTs are GPTs" task-exposure dataset [1] | 2023, supplemented 2025 | 0.30 |
| OECD AI & Occupations framework [2] | 2024 | 0.20 |
| Internal capability panel (rolling quarterly tests) | Q2 2026 | 0.25 |
| Public benchmarks (HELM, BIG-Bench, SWE-bench, MMLU) [3] | Continuously | 0.15 |
| Practitioner interviews (n=180 senior ICs & managers across 14 roles) | Jan–Apr 2026 | 0.10 |

Weights are not arbitrary. Higher weights go to sources where the link between published evidence and a specific task category is most direct. The internal capability panel and practitioner interviews are weighted heavily despite being our own data because they are the only sources that age in months rather than years — the difference matters when frontier AI capability shifts faster than academic publishing cycles.

The table is republished here each quarter when we re-score capability. Past versions are kept in the changelog at the bottom of this page.
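
One way to read the weight column, sketched in Python, is to treat each task category's capability score as a weighted average of per-source estimates. That reading is an assumption about how the weights combine, not something this page specifies. In particular, renormalizing over the sources that cover a given task is a guess at how gaps are handled (public benchmarks, for instance, only apply where they map cleanly to a task category). The function name, the source keys, and the example estimates are hypothetical.

```python
# Source weights from the table above.
SOURCE_WEIGHTS = {
    "eloundou": 0.30,
    "oecd": 0.20,
    "internal_panel": 0.25,
    "public_benchmarks": 0.15,
    "practitioner_interviews": 0.10,
}


def blend_capability(estimates: dict[str, float]) -> float:
    """Weighted average of per-source estimates (0.0 to 1.0) for one task category.

    Assumption: sources with no estimate for this task are skipped and the
    remaining weights are renormalized to sum to 1.
    """
    if not estimates:
        raise ValueError("need at least one source estimate")
    unknown = set(estimates) - set(SOURCE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown sources: {sorted(unknown)}")
    covered_weight = sum(SOURCE_WEIGHTS[source] for source in estimates)
    weighted_sum = sum(SOURCE_WEIGHTS[s] * value for s, value in estimates.items())
    return weighted_sum / covered_weight


# Hypothetical estimates for one task category covered by three sources.
blended = blend_capability({"eloundou": 0.8, "oecd": 0.7, "internal_panel": 0.9})
print(round(blended, 3))  # 0.807
```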

SECTION 05

A worked example.

Below is the full calculation for the sample report on Sarah Chen — a Senior Marketing Manager whose Exposure Score came out to 38 / 100. The numbers below are the actual values used to generate her report.

Sarah Chen · Senior Marketing Manager · B2B SaaS · 7 yrs

14 task categories extracted from CV · weighted by frequency · scored against current AI capability

| Task category | Freq (fᵢ) | AI cap (cᵢ) | fᵢ · cᵢ |
|---|---|---|---|
| High exposure · AI does this well | | | |
| Drafting blog posts & campaign copy | 0.120 | 0.85 | 0.102 |
| Campaign & creative briefs | 0.080 | 0.80 | 0.064 |
| HubSpot performance reporting | 0.060 | 0.75 | 0.045 |
| Exec updates from dashboards | 0.040 | 0.80 | 0.032 |
| Competitor content research | 0.040 | 0.70 | 0.028 |
| Subtotal · high | 0.34 | | 0.271 |
| Medium exposure · AI assists, you drive | | | |
| Audience segmentation | 0.060 | 0.55 | 0.033 |
| A/B test design | 0.050 | 0.50 | 0.025 |
| Marketing-mix planning | 0.040 | 0.45 | 0.018 |
| Briefing agencies & freelancers | 0.050 | 0.40 | 0.020 |
| Subtotal · medium | 0.20 | | 0.096 |
| Low exposure · your moat | | | |
| Cross-functional stakeholder negotiation | 0.150 | 0.10 | 0.015 |
| Brand judgment calls | 0.080 | 0.05 | 0.004 |
| Hiring & developing team of 4 | 0.100 | 0.05 | 0.005 |
| Translating business goals to strategy | 0.080 | 0.10 | 0.008 |
| Crisis & reputation response | 0.050 | 0.05 | 0.003 |
| Subtotal · low | 0.46 | | 0.035 |
| Raw exposure score | 1.00 | | 0.402 × 100 = 40.2 |

Raw score: 40.2. Seniority adjustment for “senior” level: −1.6. Role-baseline normalization for the marketing-manager family: −0.6. Final score: 40.2 − 1.6 − 0.6 = 38.0.

38 / 100

A few things worth noting about this worked example:

  • The low-exposure tasks dominate Sarah's time (0.46 of total frequency) but contribute relatively little to the score because their capability values are low. This is the structural reason senior roles tend to score lower.
  • The high-exposure cluster contributes 67% of the raw score from only 34% of her time. This is the textbook “AI will eat the deliverables” pattern.
  • If Sarah's role shifted toward execution (more copy, less negotiation), her score could rise 12–18 points without anything changing about her capability — just her task mix.
  • The two adjustments together are small (−2.2 points). In most reports, adjustments fall in the −3 to +1 range. They are deliberately conservative.
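
The arithmetic in this example can be re-run with a short Python script. It is a sketch for checking the numbers, not the report generator itself; the variable names are ours, and the task values are copied from the table. One detail it surfaces: summing the unrounded products gives 40.15 rather than 40.2, because the table rounds each product to three decimals before adding. Either way the final score rounds to 38.

```python
# (task category, frequency f_i, capability c_i, exposure band) from the table.
SARAH_TASKS = [
    ("Drafting blog posts & campaign copy", 0.12, 0.85, "high"),
    ("Campaign & creative briefs", 0.08, 0.80, "high"),
    ("HubSpot performance reporting", 0.06, 0.75, "high"),
    ("Exec updates from dashboards", 0.04, 0.80, "high"),
    ("Competitor content research", 0.04, 0.70, "high"),
    ("Audience segmentation", 0.06, 0.55, "medium"),
    ("A/B test design", 0.05, 0.50, "medium"),
    ("Marketing-mix planning", 0.04, 0.45, "medium"),
    ("Briefing agencies & freelancers", 0.05, 0.40, "medium"),
    ("Cross-functional stakeholder negotiation", 0.15, 0.10, "low"),
    ("Brand judgment calls", 0.08, 0.05, "low"),
    ("Hiring & developing team of 4", 0.10, 0.05, "low"),
    ("Translating business goals to strategy", 0.08, 0.10, "low"),
    ("Crisis & reputation response", 0.05, 0.05, "low"),
]

raw_fraction = sum(f * c for _, f, c, _ in SARAH_TASKS)
raw_score = 100 * raw_fraction
final_score = raw_score - 1.6 - 0.6  # seniority and role-baseline adjustments

high_contribution = sum(f * c for _, f, c, band in SARAH_TASKS if band == "high")
high_share_of_score = high_contribution / raw_fraction
high_share_of_time = sum(f for _, f, _, band in SARAH_TASKS if band == "high")

print(f"raw {raw_score:.2f}, final {final_score:.2f}")  # raw 40.15, final 37.95
print(f"high-exposure tasks: {high_share_of_score:.0%} of score, "
      f"{high_share_of_time:.0%} of time")  # 67% of score, 34% of time
```
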
SECTION 06

Choosing your top 5 skill gaps.

Skill gaps are not the same as exposure scores. The score asks “how much of your work is AI-exposed?”; skill gaps ask “which specific skills would most reduce your exposure and increase your leverage?”

For each user we score the 30 skills in the Rolespan curriculum against three criteria:

  1. Exposure reduction. How much would learning this skill lower the user's effective exposure on their highest-frequency high-exposure tasks?
  2. Compounding leverage. Does the skill enable other skills? Briefing AI well, for example, is a foundation skill: once learned, it makes prompt-library work, workflow building, and brand-voice prompting all faster to learn.
  3. Role relevance. A weight derived from the user's role family: marketing managers get more weight on Cluster 2 (output) and Cluster 5 (leadership), engineers get more weight on Cluster 3 (workflows).

The top 5 are then re-ordered by highest-marginal-impact-first, not by aggregate score. This matters: if the top scoring skill depends on a foundation skill the user lacks, we surface the foundation first. The result is a list that compounds when followed in order.
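
The foundation-first re-ordering can be sketched in Python as follows. Several things here are assumptions rather than our documented method: the three criteria are combined as an equal-weight sum, only direct prerequisites are considered, and the `Skill` fields, the `top_gaps` function, and all example scores are hypothetical. The skill names are borrowed from the example in criterion 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    exposure_reduction: float
    compounding_leverage: float
    role_relevance: float
    requires: tuple[str, ...] = ()  # foundation skills this one depends on

    @property
    def aggregate(self) -> float:
        # Assumption: equal-weight sum of the three criteria.
        return self.exposure_reduction + self.compounding_leverage + self.role_relevance


def top_gaps(skills: list[Skill], already_has: set[str], limit: int = 5) -> list[str]:
    """Rank missing skills by aggregate score, surfacing missing foundations first."""
    by_name = {skill.name: skill for skill in skills}
    ranked = sorted(
        (skill for skill in skills if skill.name not in already_has),
        key=lambda skill: skill.aggregate,
        reverse=True,
    )
    ordered: list[str] = []
    for skill in ranked:
        # Place any missing direct prerequisite ahead of the skill itself.
        for name in (*skill.requires, skill.name):
            if name in by_name and name not in already_has and name not in ordered:
                ordered.append(name)
        if len(ordered) >= limit:
            break
    return ordered[:limit]


# Hypothetical scores for three skills.
skills = [
    Skill("Brand-voice prompting", 0.9, 0.4, 0.8, requires=("Briefing AI well",)),
    Skill("Workflow building", 0.7, 0.6, 0.7),
    Skill("Briefing AI well", 0.5, 0.9, 0.5),
]
print(top_gaps(skills, already_has=set()))
# ['Briefing AI well', 'Brand-voice prompting', 'Workflow building']
```

In this example the foundation skill has the lowest aggregate score but is listed first, because the top-scoring skill depends on it.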

A deliberate choice

We surface five gaps, not ten or twenty. A short list a user might act on is more valuable than a long list a user will read once and forget.

SECTION 07

Matching the curriculum.

The 90-video library is structured as 30 skills × 3 levels (Foundations, Applied, Lead). For each user, we select between 9 and 14 videos based on three rules:

  • Every top-5 skill gap gets at least the Applied-level video. Foundations is added if the user shows no prior signal of fluency; Lead is added if the user is senior and the skill carries leadership weight.
  • Two foundation videos are always included from Cluster 1, regardless of role. Briefing AI well and recognizing how LLMs fail are pre-requisites for the rest of the library.
  • No more than 3 videos per phase from a single cluster. The curriculum is balanced across thinking partnership, output, workflows, data, and leadership. Heavy specialization is available through the full library; the recommended curriculum is intentionally broad.

The output is the user's personal curriculum — typically 10–13 videos across the three phases, sequenced for narrative arc (Orient → Apply → Lead) rather than pure topic clustering.
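
The three selection rules can be sketched in Python as below. This is an illustration under assumptions, not the production selector: it assumes each level maps onto one phase (Foundations to Orient, Applied to Apply, Lead to Lead), it resolves conflicts by letting earlier picks win when the per-cluster cap is hit, and it does not enforce the 9 to 14 video range, since this page does not say how extra videos are chosen. The `Gap` fields, the function name, and the example inputs are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

LEVELS = ("Foundations", "Applied", "Lead")
# Cluster 1 foundation videos that are always included (rule 2).
ALWAYS_INCLUDED = ("Briefing AI well", "Recognizing how LLMs fail")


@dataclass(frozen=True)
class Gap:
    skill: str
    cluster: int
    prior_fluency: bool      # does the CV show prior signal of fluency?
    leadership_weight: bool  # does the skill carry leadership weight?


def select_videos(
    gaps: list[Gap], is_senior: bool, cap: int = 3
) -> list[tuple[str, int, str]]:
    """Return (skill, cluster, level) picks, ordered Foundations, Applied, Lead."""
    picks = [(name, 1, "Foundations") for name in ALWAYS_INCLUDED]
    for gap in gaps:
        picks.append((gap.skill, gap.cluster, "Applied"))  # rule 1
        if not gap.prior_fluency:
            picks.append((gap.skill, gap.cluster, "Foundations"))
        if is_senior and gap.leadership_weight:
            picks.append((gap.skill, gap.cluster, "Lead"))

    selected: list[tuple[str, int, str]] = []
    seen: set[tuple[str, str]] = set()
    per_cluster_level: Counter = Counter()
    for skill, cluster, level in picks:
        # Rule 3 (assumed tie-break): skip duplicates and picks over the cap.
        if (skill, level) in seen or per_cluster_level[(cluster, level)] >= cap:
            continue
        seen.add((skill, level))
        per_cluster_level[(cluster, level)] += 1
        selected.append((skill, cluster, level))
    # Stable sort keeps pick order within each level.
    selected.sort(key=lambda video: LEVELS.index(video[2]))
    return selected


# Hypothetical user: senior, two gaps.
gaps = [
    Gap("Briefing AI well", 1, prior_fluency=False, leadership_weight=False),
    Gap("Brand-voice prompting", 2, prior_fluency=True, leadership_weight=True),
]
videos = select_videos(gaps, is_senior=True)
print(len(videos))  # 5
```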

SECTION 08

What the score does not measure.

The Exposure Score is a measure of task-level capability overlap. It is not a measure of:

  • Job loss probability. Whether you actually lose your job depends on your company's adoption rate, your manager's preferences, labor market dynamics, and many other factors the score does not see.
  • Timing. A high score does not mean the change happens next quarter. Many high-capability tasks remain human-done for years due to trust, regulation, customer preference, or organizational inertia.
  • Salary impact. Some roles see compensation rise as AI augments their leverage. Others see it compress. The score does not predict which.
  • Your individual employability. A 60-point score means something very different for a top-quartile engineer than for a bottom-quartile one. The score is computed from your CV's task mix, not your skill level within those tasks.
  • Industry-specific dynamics. Two marketing managers with identical CVs may face very different real-world AI adoption pressure if one works in regulated healthcare and the other works in growth-stage B2B SaaS.
  • The relative quality of human vs. AI output. The capability score reflects whether AI can do the task. Whether it does the task as well as you is a separate question we don't answer here.
SECTION 09

Where the methodology can be wrong.

We track six known edge cases where the score is less reliable than usual. If your situation matches one of these, treat the score as a wider range rather than a point estimate.

  • Sparse or vague CVs. CVs under 350 words or filled with generic phrases (“results-driven leader,” “passionate about innovation”) yield task-extraction error rates around 18%, versus 6% for detailed CVs. We flag this in the report when detected.
  • Role transitions. If your current role differs materially from your most recent CV entries — for example, you just moved from IC to manager — the score reflects the past, not the present. Re-upload with an updated CV.
  • Hybrid roles. CVs that span two role families (a product-marketing manager, a sales engineer, a designer-developer) suffer higher baseline error because our role-family adjustments assume a primary cluster. Hybrid roles typically score within ±5 of their “true” value.
  • Highly specialized work. Subfields where general AI capability does not predict capability on the specific work — for example, legal practice in a small jurisdiction, niche scientific research, regulated trading roles — are scored more cautiously and tend to score 4–8 points lower than the user might intuitively expect.
  • Recent capability shifts. If a major capability shift occurred within the past 60 days that affects your role's task mix, the score may lag. We refresh capability scores quarterly; mid-quarter, expect scores to under-react to brand-new capabilities.
  • Frontier roles. Roles that involve building AI systems — ML engineers, alignment researchers, AI product managers — are scored using the general framework, but their task mix is changing fast enough that scores age quickly. Re-analyze every 60 days.
SECTION 10

Update cadence.

The methodology has two layers that update on different schedules.

Quarterly: capability scores

Every quarter, our internal capability panel re-tests each of the 30 task-category-to-skill maps against a standardized prompt set on current frontier models. Capability scores shift up or down within a typical range of ±0.05 per quarter. Sharp moves (greater than 0.15) trigger a methodology version bump and a user-facing notice.
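
The sharp-move rule amounts to a simple threshold check between two quarterly score tables. A minimal Python sketch, assuming scores are stored as a mapping from task category to capability; the function name, category names, and values are hypothetical, and only the 0.15 threshold comes from the text above.

```python
SHARP_MOVE_THRESHOLD = 0.15  # moves larger than this trigger a version bump


def sharp_moves(
    previous: dict[str, float],
    current: dict[str, float],
    threshold: float = SHARP_MOVE_THRESHOLD,
) -> dict[str, float]:
    """Return {category: change} for categories that moved by more than threshold."""
    moves = {}
    for category in previous.keys() & current.keys():
        delta = current[category] - previous[category]
        if abs(delta) > threshold:
            moves[category] = round(delta, 2)
    return moves


# Hypothetical capability scores for two consecutive quarters.
last_quarter = {"code generation": 0.60, "brand judgment": 0.05}
this_quarter = {"code generation": 0.80, "brand judgment": 0.07}
flagged = sharp_moves(last_quarter, this_quarter)
print(flagged)  # {'code generation': 0.2}
```

Categories that exist in only one of the two tables are ignored here; how newly added categories are treated is not covered by this sketch.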

Annual: weights and adjustments

Frequency weights, seniority adjustments, and role-baseline normalizations are reviewed annually against a refreshed hand-labeled CV benchmark. The benchmark expands by 50–100 CVs per year. We publish the diff in this document's changelog.

v2.4 · May 2026
Capability refresh. Code-generation tasks +0.08; data-analysis tasks +0.05; image-generation in marketing +0.12. Capability source weights rebalanced to give Eloundou +0.05, internal panel −0.05.
v2.3 · Feb 2026
Quarterly capability refresh. Customer-success drafting tasks +0.06. Added new task category: "AI vendor evaluation" (capability 0.20).
v2.2 · Nov 2025
Annual review. Expanded hand-labeled benchmark to 400 CVs (+75). Seniority adjustment range widened from −6 to −8.
v2.1 · Aug 2025
Quarterly capability refresh. Engineering tasks +0.10 to reflect IDE-integrated coding tools reaching production maturity.
SECTION 11

References.

The following are the primary external references behind our capability scoring. Internal panel methodology and benchmark protocols are available on request.

  1. Eloundou, T., Manning, S., Mishkin, P., & Rock, D. "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models." OPENAI WORKING PAPER · 2023 · UPDATED 2025
  2. OECD. "Artificial Intelligence and the Labour Market: Occupational Exposure Framework." OECD EMPLOYMENT OUTLOOK · 2024
  3. Stanford CRFM. "Holistic Evaluation of Language Models (HELM)" & SWE-bench task suites. UPDATED CONTINUOUSLY · USED FOR CAPABILITY CALIBRATION
  4. Rolespan Research. "CV-to-task-weight estimation: model card & benchmark protocol." INTERNAL · v2.4 · MAY 2026 · AVAILABLE ON REQUEST
  5. Acemoglu, D. & Restrepo, P. "Automation and New Tasks: How Technology Displaces and Reinstates Labor." JOURNAL OF ECONOMIC PERSPECTIVES · 2019 · CONTEXTUAL
  6. U.S. Department of Labor, Employment and Training Administration. O*NET task descriptions, used as a structural reference for task-category granularity. CONTINUOUSLY UPDATED · ONETONLINE.ORG
  7. Rolespan Practitioner Panel. "Quarterly AI capability ratings, Q2 2026." N = 180 · 14 ROLE FAMILIES · INTERNAL · METHOD AVAILABLE
SECTION 12

Contribute, or contest.

If you think we're wrong about something on this page, we want to know. Methodology improves when it gets challenged — and we'd rather hear “your capability score for X is too low” than have you silently distrust the analysis.

We particularly welcome:

  • Practitioners in any role who think our capability scores misrepresent their day-to-day reality
  • Researchers studying AI's labor-market impact who'd like to compare notes on methodology
  • Journalists or analysts writing about this space who want to understand the underlying math
  • Anyone who got a Rolespan report and felt the score was clearly wrong — with one or two sentences on why

Reach the research team directly.

Send your note through our contact form. It lands in the research queue, and someone reads it within two business days.
