How Legible works

Every score is produced by a deterministic, open pipeline. This page explains each stage and every check — including what we can’t claim.

What we can’t claim

Legible does not have access to any real ATS system. Scores are based on documented parser behaviours, not live ATS APIs.
A high Legible score does not guarantee your resume will pass a specific employer’s ATS configuration.
Legible analyses machine-readability, not content quality, keyword relevance, or cultural fit.
DOCX files skip layout analysis (DOCX is a flow format with no page coordinates). Scores for DOCX are based on section and field checks only.

The 5-stage pipeline

1. Dual extraction (L3)
PDFs are run through two independent engines: PyMuPDF (fitz) for layout-aware block extraction, and pdfminer.six for reading-order plain text. Both outputs are kept so downstream steps can cross-check them. DOCX files are read with python-docx by traversing paragraphs and tables.
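A minimal sketch of this stage, assuming the public APIs of PyMuPDF, pdfminer.six, and python-docx. The function names and the returned dictionary shape are hypothetical and only illustrate the idea of keeping both outputs side by side.

```python
from pathlib import Path


def extract_pdf(path: str) -> dict:
    """Run both PDF engines and keep both outputs for later cross-checks."""
    import fitz  # PyMuPDF
    from pdfminer.high_level import extract_text

    blocks = []
    with fitz.open(path) as doc:
        for page_no, page in enumerate(doc):
            # Each block is (x0, y0, x1, y1, text, block_no, block_type).
            for x0, y0, x1, y1, text, _no, block_type in page.get_text("blocks"):
                blocks.append({
                    "page": page_no,
                    "bbox": (x0, y0, x1, y1),
                    "text": text,
                    "is_image": block_type == 1,
                })
    return {"layout_blocks": blocks, "plain_text": extract_text(path)}


def extract_docx(path: str) -> dict:
    """Read paragraphs, then table cells (simplification: not in document order)."""
    import docx  # python-docx

    document = docx.Document(path)
    paragraphs = [p.text for p in document.paragraphs]
    cells = [
        cell.text
        for table in document.tables
        for row in table.rows
        for cell in row.cells
    ]
    # DOCX is a flow format, so there are no layout blocks to return.
    return {"layout_blocks": None, "plain_text": "\n".join(paragraphs + cells)}


def extract(path: str) -> dict:
    """Dispatch on file extension (hypothetical entry point)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return extract_pdf(path)
    if suffix == ".docx":
        return extract_docx(path)
    raise ValueError(f"unsupported file type: {suffix or 'none'}")
```

The third-party imports sit inside the functions so that each engine is only required when its file type is actually processed.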
2. Layout analysis (L4 – L7)
The layout map from PyMuPDF is analysed for multi-column regions (L4), inline image objects (L5), table structures (L6), and header/footer bands (L7). Each detection produces a structured annotation used by the scorer.
3. Section & field extraction (L8 – L9)
Paragraph text is segmented into canonical sections by matching heading patterns against a curated regex library (L8). Named entities — email, phone, URLs, degree, school — are then extracted from each section using rule-based NER (L9).
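Heading-based segmentation can be sketched as follows. The patterns are a small hypothetical stand-in for the curated regex library, and the rule that text before the first heading belongs to Contact is an assumption made for this example.

```python
import re

# Hypothetical heading patterns; a real library would carry many more variants.
SECTION_PATTERNS = {
    "Summary": r"(professional\s+)?summary|profile|objective",
    "Experience": r"(work\s+|professional\s+)?experience|employment(\s+history)?",
    "Education": r"education|academic\s+background",
    "Skills": r"(technical\s+)?skills",
    "Projects": r"projects",
    "Certifications": r"certifications?|licen[cs]es",
    "Awards": r"awards|honou?rs",
}
COMPILED = {
    name: re.compile(rf"^\s*(?:{pattern})\s*:?\s*$", re.IGNORECASE)
    for name, pattern in SECTION_PATTERNS.items()
}


def segment(lines):
    """Group non-empty lines under the most recent matching heading."""
    sections = {"Contact": []}  # assumption: text before any heading is contact info
    current = "Contact"
    for line in lines:
        heading = next((n for n, rx in COMPILED.items() if rx.match(line)), None)
        if heading:
            current = heading
            sections.setdefault(current, [])
        elif line.strip():
            sections[current].append(line.strip())
    return sections


def missing_sections(sections):
    """List canonical sections that are absent or empty."""
    expected = ["Contact", *SECTION_PATTERNS]
    return [name for name in expected if not sections.get(name)]
```

For example, a resume with only Contact, Experience, Education, and Skills content would report Summary, Projects, Certifications, and Awards as missing.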
4. ATS scoring (strict + lenient) (L10 – L11)
Two scorers run independently. Strict (L10) simulates an older parser: it penalises multi-column layouts, images, small fonts, and missing sections. Lenient (L11) simulates a modern NLP-based ATS that can tolerate some layout noise. The gap between the two scores reveals parser-dependent risk.
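To illustrate how two scorers can share the same detections but weigh them differently, here is a small sketch. The issue names and every point value are hypothetical; they are not Legible's real weights.

```python
# Hypothetical penalty tables, for illustration only.
STRICT_PENALTIES = {"multi_column": 20, "image": 10, "small_font": 10, "missing_section": 15}
LENIENT_PENALTIES = {"multi_column": 5, "image": 3, "small_font": 5, "missing_section": 10}


def score(issues, penalties):
    """Start from 100 and subtract one penalty per detected issue, floored at 0."""
    lost = sum(penalties.get(issue, 0) for issue in issues)
    return max(0, 100 - lost)


def score_both(issues):
    """Run both scorers on the same issue list and report the gap."""
    strict = score(issues, STRICT_PENALTIES)
    lenient = score(issues, LENIENT_PENALTIES)
    return {"strict": strict, "lenient": lenient, "gap": lenient - strict}
```

With these example weights, a two-column resume with one image scores 70 strict and 92 lenient, a gap of 22 points that depends entirely on which kind of parser reads the file.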
5. AI recommendations (L12)
A ranking pass sorts all score deductions by points_lost. The top five are expanded into concrete fix instructions by a small LLM prompt (or a rule-based fallback when no API key is configured). Each recommendation includes an estimated points_gain if the fix is applied.
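The ranking pass and the rule-based fallback might look like the sketch below. The `Deduction` class, the fallback texts, and the assumption that a fix recovers exactly the points it cost are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deduction:
    check_id: str
    points_lost: float


# Hypothetical fallback instructions used when no API key is configured.
FALLBACK_FIXES = {
    "multi_column": "Rebuild the layout as a single column.",
    "image": "Remove the image or move it out of the top of the page.",
    "small_font": "Raise body text to at least 8 pt.",
}


def recommend(deductions, limit=5):
    """Sort deductions by points_lost (largest first) and expand the top few."""
    ranked = sorted(deductions, key=lambda d: d.points_lost, reverse=True)
    return [
        {
            "check_id": d.check_id,
            "fix": FALLBACK_FIXES.get(d.check_id, "Review this check in the report."),
            # Assumption: applying the fix recovers exactly the points lost.
            "points_gain": d.points_lost,
        }
        for d in ranked[:limit]
    ]
```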

Individual checks

Each check below is directly linked from the diagnostic report. Anchor URLs follow the pattern /legible/methodology#check-{id}.

Contact information

Field extraction (L9)

Legible looks for a valid email address, phone number, and at least one professional URL (LinkedIn or GitHub) in the top third of the document. An ATS that fails to parse contact details can reject or de-prioritise the resume before a human ever sees it.

Limitation: We cannot verify that the phone number or email address is reachable, only that it is machine-readable.
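A rule-based version of this check can be sketched with regular expressions. The patterns below are simplified assumptions, and "top third" is approximated here by line count rather than by page coordinates.

```python
import re

# Simplified, hypothetical patterns; real-world formats vary more widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
PROFILE_RE = re.compile(
    r"(?:https?://)?(?:www\.)?(?:linkedin\.com/in|github\.com)/[\w-]+",
    re.IGNORECASE,
)


def find_contact_fields(lines):
    """Search only the top third of the lines (at least one line)."""
    cutoff = max(1, -(-len(lines) // 3))  # ceiling division
    head = "\n".join(lines[:cutoff])

    def first(rx):
        match = rx.search(head)
        return match.group(0) if match else None

    return {
        "email": first(EMAIL_RE),
        "phone": first(PHONE_RE),
        "profile_url": first(PROFILE_RE),
    }
```

A field that appears only further down the document comes back as `None`, which is what would trigger the flag in this sketch.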

Required sections

Section segmentation (L8)

The pipeline identifies heading lines and groups paragraphs under one of eight canonical resume sections: Contact, Summary, Experience, Education, Skills, Projects, Certifications, and Awards. A missing section is flagged because most ATS parsers use these labels as structured field targets.

Limitation: Section-heading recognition relies on typography heuristics. Highly stylised headings may be missed.

Layout complexity

Column & table detection (L4, L6)

Multi-column layouts, decorative tables, and text-in-image regions confuse ATS parsers that read left-to-right, top-to-bottom. Legible maps the page into a grid and counts columns, table regions, and image blocks.

Limitation: Legible scores layout objectively. A two-column resume is flagged regardless of visual quality, because many ATS parsers read it incorrectly.
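One simple way to count columns from block bounding boxes is to merge their horizontal extents and count the gaps. This is a swapped-in simplification rather than the grid mapping described above, and the gap threshold is a hypothetical parameter.

```python
def count_columns(blocks, page_width, min_gap_ratio=0.05):
    """Estimate the column count from text-block bounding boxes.

    blocks: iterable of (x0, y0, x1, y1) tuples.
    Horizontal extents are merged left to right; a gap wider than
    min_gap_ratio * page_width starts a new column.
    """
    intervals = sorted((x0, x1) for x0, _y0, x1, _y1 in blocks)
    if not intervals:
        return 0
    columns = 1
    right_edge = intervals[0][1]
    for x0, x1 in intervals[1:]:
        if x0 - right_edge > min_gap_ratio * page_width:
            columns += 1
            right_edge = x1
        else:
            right_edge = max(right_edge, x1)
    return columns
```

A full-width block such as a name banner bridges the gap and hides the columns beneath it, so a more careful version would run this per vertical band of the page.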

Font legibility

Layout analysis (L3)

Extremely small fonts (< 8 pt) cause OCR-based ATS to misread characters. Legible measures the minimum body font size across all text blocks.

Limitation: Font size information is only available for text-layer PDFs. Scanned-image PDFs always receive a warning here.
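The measurement can be sketched over the nested structure that PyMuPDF's `page.get_text("dict")` returns (blocks, then lines, then spans with a `size` field). The function names and the status values are hypothetical.

```python
def min_body_font_size(page_dict):
    """Smallest font size among non-empty text spans, or None if there is no text."""
    sizes = [
        span["size"]
        for block in page_dict.get("blocks", [])
        if block.get("type") == 0  # 0 = text block, 1 = image block
        for line in block.get("lines", [])
        for span in line.get("spans", [])
        if span.get("text", "").strip()
    ]
    return min(sizes) if sizes else None


def font_check(page_dicts, threshold_pt=8.0):
    """Fail below the threshold; warn when no page has a text layer."""
    sizes = [s for s in map(min_body_font_size, page_dicts) if s is not None]
    if not sizes:
        return {"status": "warning", "reason": "no text layer found"}
    smallest = min(sizes)
    status = "fail" if smallest < threshold_pt else "pass"
    return {"status": status, "min_size": smallest}
```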

Graphics and images

Image detection (L5)

Photographs, logos, and decorative graphics are invisible to text-based ATS parsers. Any image that appears in the top quarter of the page (where a headshot typically sits) is flagged as a potential ATS blocker.

Limitation: Legible detects image objects in the PDF stream. It cannot determine whether an image contains meaningful text.
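As an illustration, the top-quarter rule can be applied to the image blocks in a PyMuPDF `page.get_text("dict")` result, where image blocks have `type` 1 and coordinates start at the top-left corner of the page. The function name and the `top_fraction` parameter are hypothetical.

```python
def flag_top_images(page_dict, top_fraction=0.25):
    """Return bounding boxes of image blocks whose top edge lies in the top part of the page."""
    limit = page_dict["height"] * top_fraction
    return [
        tuple(block["bbox"])
        for block in page_dict.get("blocks", [])
        if block.get("type") == 1 and block["bbox"][1] < limit
    ]
```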

Headers and footers

Header / footer detection (L7)

Content placed in PDF header or footer regions is often stripped by ATS systems before the body is parsed. Page numbers in headers are fine; contact details or section headings in footers are high-risk.

Limitation: We detect header/footer regions by vertical position heuristics, not by PDF spec metadata.
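A vertical-position heuristic of this kind can be sketched as follows. The 8% band height and the rule that a bare number counts as a harmless page number are assumptions for the example, and y coordinates are taken to grow downward from the top of the page.

```python
def classify_band(bbox, page_height, band_fraction=0.08):
    """Return 'header', 'footer', or 'body' based on vertical position alone."""
    _x0, y0, _x1, y1 = bbox
    if y1 <= page_height * band_fraction:
        return "header"
    if y0 >= page_height * (1 - band_fraction):
        return "footer"
    return "body"


def risky_band_blocks(blocks, page_height):
    """blocks: list of (bbox, text). Flag header/footer text that is more than a page number."""
    flagged = []
    for bbox, text in blocks:
        band = classify_band(bbox, page_height)
        if band != "body" and not text.strip().isdigit():
            flagged.append((band, text.strip()))
    return flagged
```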

Resume length

Strict scoring (L10)

The strict scorer penalises resumes longer than two pages for non-executive roles and resumes shorter than half a page. Both extremes correlate with poor ATS parse quality.

Limitation: Page count is read from the PDF. Resumes that use very small fonts to fit onto fewer pages may pass this check but fail others.

Text encoding

Extraction (L3)

PDFs using non-standard font encodings produce garbled text when parsed by ATS. Legible compares PyMuPDF and pdfminer extraction outputs — divergence indicates encoding problems.

Limitation: Encoding detection is heuristic. A divergence score below 15% is treated as acceptable.
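One possible way to compute such a divergence score is an order-insensitive word comparison, sketched below. The bag-of-words metric is an assumption chosen so that differences in reading order between the two engines do not count as encoding problems; only the 15% threshold comes from the text above.

```python
import re
from collections import Counter


def divergence(text_a, text_b):
    """Share of words (0.0 to 1.0) that the two extractions do not have in common."""
    words_a = Counter(re.findall(r"\S+", text_a.lower()))
    words_b = Counter(re.findall(r"\S+", text_b.lower()))
    total = sum(words_a.values()) + sum(words_b.values())
    if total == 0:
        return 0.0
    shared = sum((words_a & words_b).values())  # multiset intersection
    return 1.0 - (2 * shared) / total


def encoding_ok(text_a, text_b, threshold=0.15):
    """Treat a divergence below the threshold as acceptable."""
    return divergence(text_a, text_b) < threshold
```

If one engine garbles a single word out of ten, the divergence is 0.1 and the file still passes; if every word differs, the divergence is 1.0.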

