How Legible works
Every score is produced by a deterministic, open pipeline. This page explains each stage and every check — including what we can’t claim.
What we can’t claim
- Legible does not have access to any real ATS system. Scores are based on documented parser behaviours, not live ATS APIs.
- A high Legible score does not guarantee your resume will pass a specific employer’s ATS configuration.
- Legible analyses machine-readability, not content quality, keyword relevance, or cultural fit.
- DOCX files skip layout analysis (DOCX is a flow format with no page coordinates). Scores for DOCX are based on section and field checks only.
The 5-stage pipeline
- 1. Dual extraction
L3: PDFs are run through two independent engines: PyMuPDF (fitz) for layout-aware block extraction, and pdfminer.six for reading-order plain text. DOCX files are read with python-docx by traversing paragraphs and tables. Both PDF outputs are kept so downstream steps can cross-check them.
- 2. Layout analysis
L4 – L7: The layout map from PyMuPDF is analysed for multi-column regions (L4), inline image objects (L5), table structures (L6), and header/footer bands (L7). Each detection produces a structured annotation used by the scorer.
- 3. Section & field extraction
L8 – L9: Paragraph text is segmented into canonical sections by matching heading patterns against a curated regex library (L8). Named entities (email, phone, URLs, degree, school) are then extracted from each section using rule-based NER (L9).
- 4. ATS scoring (strict + lenient)
L10 – L11: Two scorers run independently. Strict (L10) simulates an older parser: it penalises multi-column layouts, images, small fonts, and missing sections. Lenient (L11) simulates a modern NLP-based ATS that can tolerate some layout noise. The gap between the two scores shows how much your result depends on which kind of parser reads the file.
- 5. AI recommendations
L12: A ranking pass sorts all score deductions by points_lost. The top five are expanded into concrete fix instructions by a small LLM prompt (or a rule-based fallback when no API key is configured). Each recommendation includes an estimated points_gain if the fix is applied.
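Stages 4 and 5 can be sketched in a few lines. This is an illustration under stated assumptions, not Legible's implementation: the `Deduction` dataclass, the check ids, the point values, and the subtract-from-100 scoring rule are all hypothetical. Only the `points_lost` sort key, the top-five cut-off, and the idea of a strict/lenient gap come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deduction:
    check_id: str     # hypothetical check identifier
    points_lost: int  # field name taken from the pipeline description


def score(deductions, base=100):
    """Assumed scoring rule: start at `base` and subtract every deduction."""
    return max(0, base - sum(d.points_lost for d in deductions))


def top_deductions(deductions, limit=5):
    """Ranking pass: largest points_lost first, keep the top `limit`."""
    return sorted(deductions, key=lambda d: d.points_lost, reverse=True)[:limit]


# Made-up deductions for one resume under each scorer.
strict = [
    Deduction("layout-columns", 15),
    Deduction("images", 10),
    Deduction("font-size", 5),
    Deduction("missing-section", 8),
    Deduction("length", 4),
    Deduction("encoding", 2),
]
lenient = [Deduction("missing-section", 8), Deduction("encoding", 2)]

strict_score = score(strict)
lenient_score = score(lenient)
gap = lenient_score - strict_score  # larger gap = more parser-dependent risk
ranked_ids = [d.check_id for d in top_deductions(strict)]
```

In this made-up case the layout deductions account for most of the gap, so they would be the first candidates for a recommendation.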
Individual checks
Each check below is directly linked from the diagnostic report. Anchor URLs follow the pattern /legible/methodology#check-{id}.
Contact information
Field extraction (L9): Legible looks for a valid email address, phone number, and at least one professional URL (LinkedIn or GitHub) in the top third of the document. An ATS that fails to parse contact details may reject or de-prioritise the resume before a human ever sees it.
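A rough sketch of this check, assuming the resume is already available as a list of text lines. The regular expressions are deliberately loose and hypothetical, and "top third" is approximated here by line count rather than by page coordinates.

```python
import re

# Hypothetical patterns; a production library would be far stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")
PROFILE_RE = re.compile(
    r"(?:https?://)?(?:www\.)?(?:linkedin\.com/in|github\.com)/[\w-]+",
    re.IGNORECASE,
)


def find_contact_fields(lines):
    """Search only the first third of the lines (rounded up)."""
    cutoff = max(1, -(-len(lines) // 3))  # ceiling division
    head = "\n".join(lines[:cutoff])

    def first(pattern):
        match = pattern.search(head)
        return match.group(0) if match else None

    return {
        "email": first(EMAIL_RE),
        "phone": first(PHONE_RE),
        "profile_url": first(PROFILE_RE),
    }


resume = [
    "Jane Doe",
    "jane.doe@example.com | +1 555 010 9999",
    "github.com/janedoe",
    "Experience",
    "Acme Corp, Engineer",
    "Education",
    "BSc Computer Science",
    "Skills",
    "Python",
]
fields = find_contact_fields(resume)
```

A field that comes back as `None` would be reported as missing, even if it appears lower down the page.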
Required sections
Section segmentation (L8): The pipeline identifies heading lines and groups paragraphs under one of eight canonical resume sections: Contact, Summary, Experience, Education, Skills, Projects, Certifications, and Awards. A missing section is flagged because most ATS parsers use these labels as structured field targets.
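The segmentation step might look roughly like the following. The heading patterns are a small hypothetical stand-in for the curated regex library, and treating everything before the first heading as Contact is an assumption made for this sketch.

```python
import re

# Hypothetical heading patterns, one per canonical section (Contact has none).
HEADING_PATTERNS = {
    "Summary": re.compile(r"^(professional\s+)?(summary|profile|objective)$", re.I),
    "Experience": re.compile(r"^(work\s+|professional\s+)?(experience|employment)$", re.I),
    "Education": re.compile(r"^education$", re.I),
    "Skills": re.compile(r"^(technical\s+|key\s+)?skills$", re.I),
    "Projects": re.compile(r"^(personal\s+|selected\s+)?projects$", re.I),
    "Certifications": re.compile(r"^(certifications?|licen[cs]es)$", re.I),
    "Awards": re.compile(r"^(awards|honou?rs)$", re.I),
}
CANONICAL = ["Contact", *HEADING_PATTERNS]


def classify_heading(line):
    """Return the canonical section a heading line maps to, or None."""
    text = line.strip().rstrip(":")
    for section, pattern in HEADING_PATTERNS.items():
        if pattern.match(text):
            return section
    return None


def segment(lines):
    """Group non-empty lines under the most recent recognised heading."""
    sections = {"Contact": []}  # assumption: pre-heading text is contact info
    current = "Contact"
    for line in lines:
        if not line.strip():
            continue
        section = classify_heading(line)
        if section:
            current = section
            sections.setdefault(current, [])
        else:
            sections[current].append(line.strip())
    return sections


def missing_sections(sections):
    """Canonical sections that are absent or have no content."""
    return [name for name in CANONICAL if not sections.get(name)]


parsed = segment([
    "Jane Doe", "jane@example.com", "",
    "WORK EXPERIENCE", "Acme Corp, Engineer, 2020-2024",
    "Education:", "BSc Computer Science",
    "Technical Skills", "Python, SQL",
])
missing = missing_sections(parsed)
```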
Layout complexity
Column & table detection (L4, L6): Multi-column layouts, decorative tables, and text-in-image regions confuse ATS parsers that read left-to-right, top-to-bottom. Legible maps the page into a grid and counts columns, table regions, and image blocks.
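As an illustration of column counting, the sketch below swaps the grid for a simpler technique: it projects text-block bounding boxes onto the x-axis and counts the vertical gutters between them. The `(x0, y0, x1, y1)` tuple layout and the 20-point minimum gutter are assumptions, not Legible's actual parameters.

```python
def count_columns(blocks, min_gap=20.0):
    """Count columns by finding horizontal gaps between block x-ranges.

    blocks: iterable of (x0, y0, x1, y1) bounding boxes in points.
    """
    intervals = sorted((x0, x1) for x0, _, x1, _ in blocks)
    if not intervals:
        return 0
    columns = 1
    right_edge = intervals[0][1]
    for x0, x1 in intervals[1:]:
        # A block starting well to the right of everything seen so far
        # opens a new column.
        if x0 - right_edge >= min_gap:
            columns += 1
        right_edge = max(right_edge, x1)
    return columns


single = [(72, 72, 540, 200), (72, 210, 540, 400)]
double = [(72, 72, 280, 700), (320, 72, 540, 700), (72, 710, 280, 750)]
single_count = count_columns(single)
double_count = count_columns(double)
```

One known limitation of this projection: a full-width header above a two-column body bridges the gutter, so a more careful detector would run the same count separately for each horizontal band of the page.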
Font legibility
Layout analysis (L3): Extremely small fonts (< 8 pt) can cause an OCR-based ATS to misread characters. Legible measures the minimum body font size across all text blocks.
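A minimal sketch of the font check, assuming text spans have already been reduced to `(text, size_in_points)` pairs. That pair structure is hypothetical; the 8 pt threshold is the one stated above.

```python
def min_font_size(spans):
    """Smallest font size among spans that contain visible text."""
    sizes = [size for text, size in spans if text.strip()]
    return min(sizes) if sizes else None


def font_too_small(spans, threshold_pt=8.0):
    """True when the smallest visible text is below the threshold."""
    smallest = min_font_size(spans)
    return smallest is not None and smallest < threshold_pt


spans = [
    ("Jane Doe", 18.0),
    ("Experience", 12.0),
    ("References available on request", 7.5),
    ("   ", 4.0),  # whitespace-only span, ignored
]
smallest = min_font_size(spans)
flagged = font_too_small(spans)
```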
Graphics and images
Image detection (L5): Photographs, logos, and decorative graphics are invisible to text-based ATS parsers. Any image that appears in the top quarter of the page (where a headshot typically sits) is flagged as a potential ATS blocker.
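The top-quarter rule could be expressed as below. The sketch assumes image bounding boxes are `(x0, y0, x1, y1)` tuples with the origin at the top-left of the page, and it interprets "appears in the top quarter" as "the image's top edge lies above the quarter line". Both are assumptions.

```python
def images_in_top_quarter(image_boxes, page_height):
    """Return the image boxes whose top edge falls in the top quarter."""
    limit = page_height / 4
    return [box for box in image_boxes if box[1] < limit]


# US Letter page, 792 pt tall: the quarter line sits at 198 pt.
boxes = [
    (450, 40, 540, 150),   # headshot-style image near the top
    (72, 600, 200, 700),   # logo near the bottom
]
blockers = images_in_top_quarter(boxes, page_height=792)
```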
Resume length
Strict scoring (L10): The strict scorer penalises resumes longer than two pages for non-executive roles and resumes shorter than half a page. Both extremes correlate with poor ATS parse quality.
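The two length rules can be sketched as follows. How Legible measures "half a page" and how it decides a role is executive are not described here, so the `first_page_fill` fraction and the `executive` flag are hypothetical inputs.

```python
def length_flags(page_count, first_page_fill, executive=False):
    """Return length-related flags for the strict scorer.

    first_page_fill: assumed fraction (0.0-1.0) of the first page
    covered by text.
    """
    flags = []
    if page_count > 2 and not executive:
        flags.append("too_long")
    if page_count == 1 and first_page_fill < 0.5:
        flags.append("too_short")
    return flags


long_resume = length_flags(page_count=3, first_page_fill=1.0)
executive_resume = length_flags(page_count=3, first_page_fill=1.0, executive=True)
short_resume = length_flags(page_count=1, first_page_fill=0.3)
```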
Text encoding
Extraction (L3): PDFs that use non-standard font encodings produce garbled text when parsed by an ATS. Legible compares the PyMuPDF and pdfminer extraction outputs; divergence between them indicates an encoding problem.
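One way to quantify divergence between two extraction outputs is a similarity ratio from Python's standard `difflib` module. This is an illustrative stand-in: the metric Legible actually uses is not stated, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Collapse whitespace and case so layout differences don't count."""
    return " ".join(text.split()).lower()


def extraction_similarity(text_a, text_b):
    """Similarity in [0, 1] between two extraction outputs."""
    matcher = SequenceMatcher(None, normalise(text_a), normalise(text_b), autojunk=False)
    return matcher.ratio()


def looks_garbled(text_a, text_b, threshold=0.8):
    """Flag the file when the two engines disagree too much."""
    return extraction_similarity(text_a, text_b) < threshold


clean_a = "Senior Software Engineer\nat Acme Corp"
clean_b = "Senior Software  Engineer at Acme Corp"
garbled = "(cid:83)(cid:72)(cid:81)(cid:76)(cid:82)(cid:85)"

same = extraction_similarity(clean_a, clean_b)
flagged = looks_garbled(clean_a, garbled)
```

Normalising whitespace first matters because the two engines legitimately differ in how they break lines, and that difference alone should not be read as an encoding fault.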