QA & Tester Compliance Pathway

How to plan, run, and document accessibility testing in a way that survives a regulator audit. Automated tools are part of it. They are not most of it.

Last reviewed: 2026-04-07

Building a test plan against EN 301 549 Annex C

If your test plan doesn't reference the standard, your evidence is decoration. Annex C is what the regulator will expect to see.

What the law says

Annex C of EN 301 549 is the normative test spec that goes with the standard. For each functional requirement in the main body, it defines what counts as a passing test. When a market surveillance authority asks how you verified clause 9.1.4.3 contrast, the answer they want to hear is 'we followed the Annex C test step', not 'we ran axe-core'. Web content sits under Chapter 9 (mapping to WCAG 2.1 AA, with WCAG 2.2 close behind). Non-web software sits under Chapter 11. Documentation sits under Chapter 12. Annex C covers them all. EAA Article 13 says operators have to document the conformity assessment, and the documentation has to show what got tested and how.

What it means in practice

Build your test plan as a spreadsheet, one row per Annex C test step. Columns: clause, test description, whether it applies to your product, test method (manual, automated, both), tester name, date, result, and where to find the evidence (screenshot, log, recording).

Not every clause applies to every product. A web app probably doesn't need any of the Chapter 8 hardware tests, or the Chapter 10 non-web document ones. Mark them N/A with a one-line reason. The regulator wants to see you actually considered each clause, not that you quietly skipped half the standard.

For each clause that does apply, decide whether the test is fully manual, fully automated, or both. Roughly 30% of WCAG can be checked by a script. Contrast, missing alt text, missing labels — axe-core catches those. Whether the alt text actually means anything, or whether the focus order makes sense — no automated tool will tell you. Be honest about the split, and never mark a clause as 'tested' when the only evidence behind it is an axe-core report.

Keep the spreadsheet in version control alongside your code. When something in the codebase changes, you re-run the test plan against the affected pages, not the whole site. That's what Annex C tests are designed for — they're reproducible on purpose.
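The spreadsheet's integrity rule (every row is either tested with evidence, or N/A with a reason) can itself be scripted. A minimal sketch, with illustrative clause rows (nothing like the full Annex C list) and a column layout of our own design:

```python
# Illustrative rows only: a real plan has one row per Annex C test step.
# Columns: clause, description, applies, method, tester, date, result, evidence
PLAN = [
    ("9.1.4.3", "Contrast (minimum)", True,  "both",   "asha", "2026-04-01", "pass", "axe-report-14.json"),
    ("9.2.1.1", "Keyboard",           True,  "manual", "asha", "2026-04-02", "fail", "recording-07.mp4"),
    ("8.2.1.1", "Speech volume",      False, "n/a: web app, no hardware", "", "", "n/a", ""),
]

def check_plan(rows):
    """Flag rows that would not survive an audit: N/A without a reason,
    or applicable clauses missing tester, date, or evidence."""
    problems = []
    for clause, _desc, applies, method, tester, date, result, evidence in rows:
        if not applies and not method.startswith("n/a:"):
            problems.append(f"{clause}: marked N/A without a reason")
        if applies and (not tester or not date or not evidence):
            problems.append(f"{clause}: applicable but missing tester/date/evidence")
    return problems

print(check_plan(PLAN))  # [] when every row is either tested or N/A with a reason
```

Run it in CI alongside the plan file and a half-filled spreadsheet never makes it into a release branch.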

Common pitfalls

  • Treating an axe-core report as a complete audit. It's a starting line, not a finish line.
  • Skipping clauses with no explanation. A regulator looking at your conformity assessment wants to see deliberate decisions, not blank spaces.
  • Running the test plan once at the end of a release cycle and never again. Accessibility regresses fast, and nobody hears it happen.

How to verify it

The test plan is verified by actually running it. Every applicable clause needs a date, a tester, and a result. The percentage of clauses passing is your headline conformance number. Anything that fails or partially fails goes into the remediation backlog. The Self-Assessment Pipeline on this site walks through this in a guided wizard and exports the result as a Word document. If you don't have a custom test management tool, it's the fastest way to produce documentation a regulator will actually accept. The Compliance Report Builder packages the final deliverable into a format you can hand straight to a market surveillance authority or staple to an Article 13 statement.
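The headline number and the remediation backlog fall straight out of the results column. An illustrative sketch, assuming a simple clause-to-result mapping of our own design:

```python
def conformance_summary(results):
    """results: {clause: 'pass' | 'fail' | 'partial' | 'n/a'} (illustrative shape).

    Returns (percentage of applicable clauses passing, clauses for the backlog).
    """
    applicable = {c: r for c, r in results.items() if r != "n/a"}
    passing = [c for c, r in applicable.items() if r == "pass"]
    backlog = sorted(c for c, r in applicable.items() if r in ("fail", "partial"))
    pct = 100 * len(passing) / len(applicable) if applicable else 0.0
    return round(pct, 1), backlog

pct, backlog = conformance_summary({
    "9.1.1.1": "pass", "9.1.4.3": "pass", "9.2.1.1": "fail",
    "9.2.4.7": "partial", "8.2.1.1": "n/a",
})
print(pct, backlog)  # 50.0 ['9.2.1.1', '9.2.4.7']
```

Note that N/A clauses are excluded from the denominator: marking half the standard N/A without reasons would inflate the number, which is exactly why each N/A needs its one-line justification.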

Automated testing — what it catches and what it misses

Automated testing is fast, cheap and necessary. It's also incomplete in ways that genuinely matter.

What the law says

EN 301 549 doesn't actually require automated testing. It requires that you can demonstrate conformance with the relevant clauses. Automated tools are good evidence for some clauses and useless for others. EAA Article 13's conformity assessment is method-agnostic — you have to show the result, not justify the technique. In practice, no test plan should lean on automation for clauses that automation can't verify. Mixing the two well is the difference between a defensible audit and a paper exercise nobody believes.

What it means in practice

Run an automated tool against every page of your application. Wire it into CI. Run it on every pull request. Fail the build on new violations. axe-core, Pa11y and Lighthouse all do this job competently — pick one and integrate it.

Automation does well at the obvious stuff. Missing alt text. Missing form labels. Contrast failures (with caveats around overlapping elements). Missing language attributes. Duplicate IDs. ARIA misuse. Missing landmarks. That's the easy 30% of WCAG.

Automation misses everything that requires actual judgement. Whether the alt text means anything. Whether the heading structure makes sense as a page outline. Whether the focus order is sensible. Whether keyboard interactions work in your custom widgets. Whether the page reads coherently to a screen reader. Whether your error messages are actually helpful. Whether the product is usable for someone with cognitive disabilities. None of that has a script.

Think of the split this way: automation catches your regressions, manual testing catches your new failures. Once the automated suite is clean, that's when manual testing earns its keep. Skip the manual layer and you'll ship something that passes axe-core and fails real users.
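Failing the build on new violations means diffing each report against a recorded baseline. A sketch of that diff step, assuming axe-core's JSON results shape (a `violations` array of rule objects, each listing the affected `nodes`); treat it as an illustration, not a drop-in CI gate:

```python
def new_violations(baseline, report):
    """Return (rule id, selector) pairs present in `report` but not `baseline`.

    Both arguments follow axe-core's results shape:
    {"violations": [{"id": ..., "nodes": [{"target": ["css selector"]}]}]}
    """
    def keys(result):
        return {
            (v["id"], tuple(node["target"]))
            for v in result.get("violations", [])
            for node in v.get("nodes", [])
        }
    return sorted(keys(report) - keys(baseline))

baseline = {"violations": [{"id": "image-alt", "nodes": [{"target": ["img.logo"]}]}]}
report = {"violations": [
    {"id": "image-alt", "nodes": [{"target": ["img.logo"]}]},
    {"id": "label",     "nodes": [{"target": ["#email"]}]},
]}
regressions = new_violations(baseline, report)
print(regressions)  # [('label', ('#email',))]
# In CI, exit non-zero whenever regressions is non-empty.
```

Keying on rule id plus selector means a pre-existing violation doesn't block unrelated pull requests, while anything newly introduced does.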

Common pitfalls

  • CI failing on axe-core violations and the team assuming that means the site is accessible. It means the easy bugs are caught. The hard ones still go out the door.
  • Running an automated scan on a page once, fixing what came up, and never running it again. Accessibility regresses on every deploy.
  • Trusting automated checks for alt text quality. The tool can confirm alt text exists. It cannot tell you whether the alt text is actually any good.

How to verify it

Baseline first: run axe-core against every page in your test plan and write down the violations. Fix all of them. Then set CI to fail on any new violation introduced by a pull request. That gives you regression coverage for the bits of WCAG a script can actually verify. Then run the manual tests from your Annex C plan on top. Those two things together give you the full conformance picture. Track both — the automated runs and the manual results — in your test management tool, or in the Self-Assessment Pipeline. For a quick triage of a site you don't know yet, the Headings Pro and Alt Text Pro tools here let you scan multiple URLs at once. Useful for benchmarking competitors, auditing an acquired company's product, or hunting down the worst pages on something you've just inherited.

Screen reader testing

Screen reader testing is the only way to actually know whether your product is usable for blind and low-vision users. It's also the test most teams quietly skip.

What the law says

WCAG success criterion 4.1.2 (Name, Role, Value) requires that the name, role and state of every UI component be exposed to assistive technology. The only way to verify that is to use assistive technology. EN 301 549 Chapter 9 brings the same requirement forward as clause 9.4.1.2, and no automated tool can tell you whether the user experience matches the spec — only a real screen reader can do that. Annex C of EN 301 549 expects these tests to be reproducible; in practice, NVDA, JAWS and VoiceOver are the common reference screen readers for running them.

What it means in practice

Pick two screen readers and actually learn them. The usual pairs are NVDA + Firefox or NVDA + Chrome on Windows (NVDA is free, no excuse), and VoiceOver + Safari on macOS (built in, no install). For mobile, VoiceOver + Safari on iOS, TalkBack + Chrome on Android. JAWS is dominant in enterprise environments but expensive — test with it if your customers use it, otherwise NVDA is the better place to start.

Learn the basic commands. Start and stop the screen reader. Navigate by heading. By landmark. By form field. Read the next item. Read continuously. You should be able to operate the screen reader without your hands leaving the keyboard. Twenty minutes of practice and you'll be functional.

For each test: turn the screen reader on, actually close your eyes (looking at the screen defeats the whole exercise), and try to get through a core user flow on keyboard alone, listening to whatever the screen reader tells you. If you can't, write down what failed. Was the element unreachable? Unlabelled? Mislabelled? Out of order? Just confusing?

Test the same flow on at least two different screen reader / browser combinations. Some bugs are specific to one pairing — JAWS handles some ARIA constructs differently from NVDA, VoiceOver has its own quirks. A bug that only shows up in one combination is still a bug. It's just a different priority than one that breaks every reader.

Common pitfalls

  • Watching the screen while running the screen reader. Your brain fills in the gaps and you miss the bugs.
  • Testing with NVDA + Chrome only and assuming every other screen reader behaves the same way. They don't.
  • Treating a passing test on the developer's local machine as proof for production. Real users have different versions, different verbosity settings, different muscle memory.

How to verify it

Run the core flows: sign up, log in, search, the main thing your product does, checkout. For each, write down where you got stuck. The output of a screen reader test is a numbered list of bugs with reproduction steps and recordings. The recordings really matter — the developer fixing the bug has probably never used a screen reader before and needs to hear exactly what the user heard. The Screen Reader Simulator on this site isn't a replacement for testing with a real screen reader, but it makes a good first pass. Paste in your HTML and it'll give you a text rendering of roughly what a screen reader would say. Catches the obvious stuff before you bother booting up NVDA.
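To make 'roughly what a screen reader would say' concrete, here is a deliberately crude first-pass lineariser. It is a toy, not how the simulator works, and nowhere near a real screen reader: it only announces headings, links and images, but it shows why an unlabelled image is immediately audible in this kind of output:

```python
from html.parser import HTMLParser

class RoughReader(HTMLParser):
    """Toy linearisation: announce headings, links and images only."""
    def __init__(self):
        super().__init__()
        self.out, self._prefix = [], None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self._prefix = f"heading level {tag[1]}: "
        elif tag == "a":
            self._prefix = "link: "
        elif tag == "img":
            alt = attrs.get("alt")
            if alt is None:
                self.out.append("image: (no alt attribute!)")
            elif alt:  # alt="" marks a decorative image; real readers skip it
                self.out.append(f"image: {alt}")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append((self._prefix or "") + text)
            self._prefix = None

reader = RoughReader()
reader.feed('<h1>Checkout</h1><img src="x.png"><a href="/pay">Pay now</a>')
print(reader.out)
# ['heading level 1: Checkout', 'image: (no alt attribute!)', 'link: Pay now']
```

Even this toy surfaces the two cheapest bugs (missing alt, unlabelled links) before you boot up NVDA; everything beyond that still needs the real thing.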

Keyboard testing methodology

Keyboard testing is the cheapest, fastest, highest-yield manual accessibility test going. Run it on every page of every release.

What the law says

Four WCAG criteria do the work. 2.1.1 (Keyboard) says all functionality has to be operable from a keyboard. 2.1.2 (No Keyboard Trap) says focus has to be able to move away from any element it lands on. 2.4.3 (Focus Order) says the order has to make sense. 2.4.7 (Focus Visible) says you need a visible indicator. EN 301 549 brings all four forward. The Annex C test methods spell it out: tab through the page, watch where focus lands, try to operate every control, try to leave every control. It's a manual test by design.

What it means in practice

Unplug the mouse. Tape over the trackpad. Make it physically impossible to cheat. Now try to use your product.

The test is straightforward. Start at the top of the page. Press Tab. Watch where focus lands. Write down anything that should be focusable but isn't reached. Anything that gets focus but has no visible indicator. Any time the focus order stops matching the visual reading order. Any time focus disappears entirely (that's a keyboard trap).

At every focusable element, try to operate it. Enter, Space, arrow keys, Escape. Buttons should fire on both Enter and Space. Links should fire on Enter only. Custom widgets should follow whatever the WAI-ARIA Authoring Practices Guide says for that pattern. Anything that responds to mouse but not keyboard is a 2.1.1 fail.

At every modal, drawer or popup: focus has to move into the new layer when it opens, has to stay trapped inside while it's open, and has to land back on the trigger when it closes. If any of those isn't happening, the modal is broken.

Once you're practised, the whole test takes ten to fifteen minutes per page. It catches more accessibility bugs per minute than anything else you can do manually.
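One slice of this test can be pre-screened in code: elements wired for mouse but unreachable by keyboard. A heuristic sketch that flags non-native elements carrying an inline `onclick` with no `tabindex`. Real pages usually attach handlers in script, so this only catches the inline cases and never replaces actually tabbing through the page:

```python
from html.parser import HTMLParser

class MouseOnlyScan(HTMLParser):
    """Flag non-interactive elements with click handlers but no keyboard path."""
    NATIVE = {"a", "button", "input", "select", "textarea"}  # focusable by default

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "onclick" in attrs and tag not in self.NATIVE and "tabindex" not in attrs:
            self.suspects.append(tag)  # likely a WCAG 2.1.1 candidate

scan = MouseOnlyScan()
scan.feed('<button onclick="go()">Go</button><div onclick="go()">Go</div>')
print(scan.suspects)  # ['div'] is mouse-operable but keyboard-unreachable
```

A `div` that survives this filter still needs the manual check: even with `tabindex="0"`, it won't respond to Enter or Space unless someone wired a keydown handler too.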

Common pitfalls

  • Testing the page once and assuming everything's fine. Keyboard regressions ship in nearly every release.
  • Testing in development only. A page that's keyboard-accessible locally can break in production thanks to CSP, third-party scripts, or A/B test variants.
  • Skipping modal and dropdown tests because 'they work for mouse users'. Modals and dropdowns are exactly where keyboard bugs hide.

How to verify it

The output of a keyboard test is a numbered list of failures with screen recordings. For each one, document the page, the element, what should have happened, what actually happened, and the steps to reproduce it. The Focus Order Visualiser on this site overlays the actual tab order on a screenshot of any page. Two things it's good for: verifying fixes after remediation, and making focus order legible to developers who never keyboard-test their own work.
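The rule behind tab order is mechanical enough to sketch: positive `tabindex` values come first in ascending order, then `tabindex="0"` and natively focusable elements in DOM order, and negative values are skipped entirely. A simplified model of the HTML spec's sequential focus navigation (real browsers layer visibility, inertness and shadow DOM on top):

```python
def tab_order(elements):
    """elements: [(name, tabindex)] in DOM order; tabindex None = natively focusable.

    Returns the order focus visits them. Simplified: ignores visibility,
    disabled state, and shadow DOM.
    """
    positive = sorted(
        (ti, i, name) for i, (name, ti) in enumerate(elements) if ti and ti > 0
    )
    zero = [name for name, ti in elements if ti is None or ti == 0]
    return [name for _, _, name in positive] + zero

dom = [("search", 2), ("logo-link", None), ("skip-link", 1), ("hidden", -1), ("cta", 0)]
print(tab_order(dom))  # ['skip-link', 'search', 'logo-link', 'cta']
```

The example is also the standard argument against positive `tabindex`: `search` jumps ahead of everything in the DOM before it, which is exactly how visual order and focus order drift apart.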

Documenting findings and severity

A bug report that says 'screen reader broken' helps nobody. Clause references, reproduction steps, and honest severity are what actually get bugs fixed.

What the law says

EAA Article 13 says conformity assessments have to be documented. Annex C of EN 301 549 says the tests have to be reproducible. Both mean your test outputs need enough structure that someone else — an auditor, a regulator, a new tester on the team — can verify what you found.

What it means in practice

Every bug report needs the same things: a one-line summary, the clause it fails (something like 'WCAG 2.4.7 / EN 301 549 9.2.4.7'), the page URL or screen, the assistive technology used (browser plus screen reader version, or 'keyboard only'), repro steps, expected behaviour, actual behaviour, a severity, and a recording or screenshot.

Severity has to be calibrated against user impact, not against how annoying the fix is for the developer. A four-tier scale works fine:

  • Critical: blocks a user from completing a core flow (sign-up, checkout, login). Ship-stopper.
  • High: significantly degrades the experience, but a determined user can work around it.
  • Medium: cosmetic accessibility issue, or a minor barrier on a non-core flow.
  • Low: nit, polish, future-work suggestion.

The whole point of calibration is that the developer and the PM are going to sort by severity to decide what to fix this sprint. Flag everything 'High' and you've made the prioritisation meaningless. Be honest. A missing alt on a decorative icon is Low. A keyboard trap in checkout is Critical.

Link every bug to its clause. That's what makes the conformity assessment reusable: when the bug is fixed, the clause flips from fail to pass and there's a paper trail. When a regulator asks for your evidence on clause 2.4.3, you can pull the test result, the bug, the fix commit, and the re-test in one chain.
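The report fields and the severity scale map naturally onto a small record type. An illustrative sketch; the field names and example bugs are ours, not a required schema:

```python
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

@dataclass
class A11yBug:
    summary: str
    clause: str       # e.g. "WCAG 2.4.7 / EN 301 549 9.2.4.7"
    page: str
    environment: str  # browser + AT version, or "keyboard only"
    severity: Severity
    evidence: str     # recording or screenshot path

backlog = [
    A11yBug("No visible focus on nav links", "WCAG 2.4.7 / EN 301 549 9.2.4.7",
            "/", "keyboard only", Severity.MEDIUM, "recordings/nav-focus.mp4"),
    A11yBug("Keyboard trap in card form", "WCAG 2.1.2 / EN 301 549 9.2.1.2",
            "/checkout", "keyboard only", Severity.CRITICAL, "recordings/trap.mp4"),
]

# What the PM actually does with the list: sort by user impact, fix from the top.
backlog.sort(key=lambda b: b.severity, reverse=True)
print([b.summary for b in backlog])
# ['Keyboard trap in card form', 'No visible focus on nav links']
```

Because every record carries a clause, the same list can be regrouped by clause to flip test-plan rows from fail to pass as fixes land.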

Common pitfalls

  • 'Severity: high' on every single bug. The label stops meaning anything and developers stop sorting by it.
  • Bug reports with no recording attached. A keyboard trap is so much easier to fix when the developer can watch it happen.
  • Bug reports that don't reference any clause. Without one, the bug looks like a UX preference instead of a compliance issue, and it gets prioritised accordingly.

How to verify it

Decent test for your bug reports: hand one to a developer who's never used a screen reader. Can they reproduce the bug from the steps alone? If not, your report needs more detail. If yes, your documentation is doing its job. The Compliance Report Builder pulls all your test results, severity ratings, and clause references into one audit-grade document. Hand it to leadership, hand it to a customer, hand it to a market surveillance authority — same document works for all of them.

Important Legal Disclaimer

This tool is a self-assessment aid only and does not constitute legal advice or a formally certified compliance assessment. Outputs — including reports, scores, checklists, and accessibility statements — are for internal use and should be reviewed by a qualified legal representative or independent accessibility auditor before being relied upon for regulatory, procurement, or public-disclosure purposes. All assessment risk lies with the internal assessor. accessibilityref, its developers, and staff accept zero liability for losses arising from use of or reliance on these outputs. Always verify against official sources: the W3C WCAG 2.2 Recommendation, the European Accessibility Act (Directive 2019/882), and your national enforcement authority.