
How Legible works

Every score is produced by a deterministic, open pipeline. This page explains each stage and every check — including what we can’t claim.

What we can’t claim

  • Legible does not have access to any real ATS. Scores are based on documented parser behaviours, not live ATS APIs.
  • A high Legible score does not guarantee your resume will pass a specific employer’s ATS configuration.
  • Legible analyses machine-readability, not content quality, keyword relevance, or cultural fit.
  • DOCX files skip layout analysis (DOCX is a flow format with no page coordinates). Scores for DOCX are based on section and field checks only.

The 5-stage pipeline

  1. Dual extraction (L3)

    Your file is run through two independent engines. For PDFs: PyMuPDF (fitz) for layout-aware block extraction, and pdfminer.six for reading-order plain text. For DOCX: python-docx paragraph + table traversal. Both outputs are kept so downstream steps can cross-check them.

  2. Layout analysis (L4 – L7)

    The layout map from PyMuPDF is analysed for multi-column regions (L4), inline image objects (L5), table structures (L6), and header/footer bands (L7). Each detection produces a structured annotation used by the scorer.

  3. Section & field extraction (L8 – L9)

    Paragraph text is segmented into canonical sections by matching heading patterns against a curated regex library (L8). Named entities — email, phone, URLs, degree, school — are then extracted from each section using rule-based NER (L9).

  4. ATS scoring, strict + lenient (L10 – L11)

    Two scorers run independently. Strict (L10) simulates an older parser: penalises multi-column layouts, images, small fonts, missing sections. Lenient (L11) simulates a modern NLP-based ATS that can tolerate some layout noise. The score gap reveals parser-dependent risk.

  5. AI recommendations (L12)

    A ranking pass sorts all score deductions by points_lost. The top five are expanded into concrete fix instructions by a small LLM prompt (or a rule-based fallback when no API key is configured). Each recommendation includes an estimated points_gain if the fix is applied.
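
The ranking pass in stage 5 can be sketched in a few lines. This is a minimal illustration, not Legible's actual code: the Deduction class, its check_id field, and the top_deductions helper are hypothetical names, and only points_lost and the top-five cut-off come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deduction:
    check_id: str       # hypothetical identifier, e.g. "layout-columns"
    points_lost: float  # points the scorer removed for this check


def top_deductions(deductions: list[Deduction], limit: int = 5) -> list[Deduction]:
    """Return the `limit` largest deductions, biggest loss first."""
    # sorted() is stable, so deductions with equal points_lost keep their input order.
    return sorted(deductions, key=lambda d: d.points_lost, reverse=True)[:limit]


if __name__ == "__main__":
    sample = [
        Deduction("font-size", 2),
        Deduction("layout-columns", 10),
        Deduction("missing-skills", 5),
        Deduction("page-count", 1),
        Deduction("header-image", 7),
        Deduction("encoding", 3),
    ]
    for d in top_deductions(sample):
        print(f"{d.check_id}: -{d.points_lost}")
```

Only the deductions returned here would be handed to the LLM prompt or the rule-based fallback for expansion into fix instructions.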

Individual checks

Each check below is directly linked from the diagnostic report. Anchor URLs follow the pattern /legible/methodology#check-{id}.

Contact information

Field extraction (L9)

Legible looks for a valid email address, a phone number, and at least one professional URL (LinkedIn or GitHub) in the top third of the document. An ATS that fails to parse contact details may reject or de-prioritise the resume before a human ever sees it.

Limitation: We cannot verify that the phone number or email address is reachable, only that it is machine-readable.
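
A rule-based contact check of this kind might look like the sketch below. The regular expressions and the find_contact_fields helper are illustrative assumptions, not Legible's own rules; the patterns are deliberately loose and will miss some valid formats. "Top third" is approximated here as the first third of the extracted lines.

```python
import re
from typing import Optional

# Illustrative patterns only; real-world email and phone formats vary far more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")
URL_RE = re.compile(
    r"(?:https?://)?(?:www\.)?(?:linkedin\.com/in|github\.com)/[\w-]+",
    re.IGNORECASE,
)


def _first_match(pattern: re.Pattern, lines: list[str]) -> Optional[str]:
    """Return the first match found in any line, or None."""
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(0)
    return None


def find_contact_fields(lines: list[str]) -> dict[str, Optional[str]]:
    """Look for email, phone, and profile URL in the top third of the lines."""
    cutoff = max(1, -(-len(lines) // 3))  # ceiling of len/3, at least one line
    head = lines[:cutoff]
    return {
        "email": _first_match(EMAIL_RE, head),
        "phone": _first_match(PHONE_RE, head),
        "url": _first_match(URL_RE, head),
    }
```

A field that comes back as None corresponds to a contact detail that is either missing or placed too far down the document to be found by this check.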

Required sections

Section segmentation (L8)

The pipeline identifies heading lines and groups paragraphs under one of eight canonical resume sections: Contact, Summary, Experience, Education, Skills, Projects, Certifications, and Awards. A missing section is flagged because most ATS parsers use these labels as structured field targets.

Limitation: Section-heading recognition relies on typography heuristics. Highly stylised headings may be missed.
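
The segmentation step can be approximated as follows. The patterns below are a small hypothetical stand-in for the curated regex library, and segment and missing_sections are names invented for this sketch; only the eight canonical section names come from the description above.

```python
import re

# Hypothetical heading patterns, one per canonical section.
SECTION_PATTERNS = {
    "Contact": r"contact(\s+(info|information|details))?",
    "Summary": r"(professional\s+)?summary|profile|objective",
    "Experience": r"(work\s+|professional\s+)?experience|employment(\s+history)?",
    "Education": r"education",
    "Skills": r"(technical\s+)?skills",
    "Projects": r"projects",
    "Certifications": r"certifications?|licen[sc]es",
    "Awards": r"awards|honou?rs",
}

# A heading must occupy the whole line, optionally followed by a colon.
COMPILED = {
    name: re.compile(rf"^\s*(?:{pattern})\s*:?\s*$", re.IGNORECASE)
    for name, pattern in SECTION_PATTERNS.items()
}


def segment(lines: list[str]) -> dict[str, list[str]]:
    """Group non-empty lines under the most recent recognised heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in lines:
        heading = next((n for n, rx in COMPILED.items() if rx.match(line)), None)
        if heading is not None:
            current = heading
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line)
        # Lines before the first heading are ignored in this simplified version.
    return sections


def missing_sections(sections: dict[str, list[str]]) -> list[str]:
    """List canonical sections for which no heading was found."""
    return [name for name in SECTION_PATTERNS if name not in sections]
```

Because the match is purely textual, a heading rendered as an image or spelled unconventionally would be reported as missing, which is consistent with the limitation noted above.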

Layout complexity

Column & table detection (L4, L6)

Multi-column layouts, decorative tables, and text-in-image regions confuse ATS parsers that read left-to-right, top-to-bottom. Legible maps the page into a grid and counts columns, table regions, and image blocks.

Limitation: Legible scores layout objectively. A two-column resume is flagged regardless of visual quality, because many ATS parsers read it incorrectly.
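
One simple way to count columns from block bounding boxes is to project the blocks onto the horizontal axis and count the groups separated by a gutter. This is a swapped-in heuristic for illustration, not the grid mapping Legible uses; the count_columns function, the (x0, y0, x1, y1) block format, and the 0.6 width ratio are all assumptions.

```python
Block = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units


def count_columns(blocks: list[Block], page_width: float,
                  full_width_ratio: float = 0.6) -> int:
    """Estimate the number of text columns by merging horizontal extents."""
    # Ignore blocks spanning most of the page (headers, full-width paragraphs),
    # since they would bridge every gutter.
    narrow = sorted(
        (x0, x1) for x0, _, x1, _ in blocks
        if (x1 - x0) < full_width_ratio * page_width
    )
    if not narrow:
        return 1

    columns = 1
    _, right_edge = narrow[0]
    for x0, x1 in narrow[1:]:
        if x0 > right_edge:
            # A horizontal gap between groups of blocks: treat it as a gutter.
            columns += 1
            right_edge = x1
        else:
            right_edge = max(right_edge, x1)
    return columns
```

For a US Letter page (612 points wide) with a full-width name header above two side-by-side blocks, this returns 2; a page of stacked full-width paragraphs returns 1.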

Font legibility

Layout analysis (L3)

Extremely small fonts (< 8 pt) can cause OCR-based ATS parsers to misread characters. Legible measures the minimum body font size across all text blocks.

Limitation: Font size information is only available for text-layer PDFs. Scanned-image PDFs always receive a warning here.
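
Given a list of font sizes collected from the text layer, the check reduces to a minimum and a threshold. In this sketch the font_check function and the "pass" / "fail" / "warning" status strings are hypothetical; the 8 pt threshold and the warning for PDFs without a text layer follow the description above.

```python
from typing import Optional

MIN_BODY_PT = 8.0  # threshold stated in the check description


def font_check(span_sizes: list[float]) -> tuple[str, Optional[float]]:
    """Return a (status, smallest_size) pair for the font legibility check."""
    if not span_sizes:
        # No text layer (for example a scanned-image PDF): size is unknown.
        return ("warning", None)
    smallest = min(span_sizes)
    status = "fail" if smallest < MIN_BODY_PT else "pass"
    return (status, smallest)
```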

Graphics and images

Image detection (L5)

Photographs, logos, and decorative graphics are invisible to text-based ATS parsers. Any image that appears in the top quarter of the page (where a headshot typically sits) is flagged as a potential ATS blocker.

Limitation: Legible detects image objects in the PDF stream. It cannot determine whether an image contains meaningful text.
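
The top-quarter rule can be expressed as a bounding-box test. The sketch below assumes boxes in (x0, y0, x1, y1) form with the origin at the top-left of the page and y increasing downwards, and it treats an image as "in the top quarter" when its top edge falls there; both are assumptions, as is the images_in_top_quarter name.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1), top-left origin


def images_in_top_quarter(image_boxes: list[Box], page_height: float) -> list[Box]:
    """Return the image boxes whose top edge lies in the top quarter of the page."""
    limit = page_height / 4
    return [box for box in image_boxes if box[1] < limit]
```

On a 792-point-tall page the cut-off is 198 points, so a headshot at the top right is returned while a logo near the footer is not.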

Resume length

Strict scoring (L10)

The strict scorer penalises resumes longer than two pages for non-executive roles and resumes shorter than half a page. Both extremes correlate with poor ATS parse quality.

Limitation: Page count is read from the PDF. Resumes that use very small fonts to fit onto fewer pages may pass this check but fail others.

Text encoding

Extraction (L3)

PDFs that use non-standard font encodings produce garbled text when parsed by an ATS. Legible compares the PyMuPDF and pdfminer.six extraction outputs; divergence between the two indicates encoding problems.

Limitation: Encoding detection is heuristic. A divergence score below 15% is treated as acceptable.
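
A divergence score between the two extraction outputs could be computed as below. The source does not say which metric Legible uses, so the choice of difflib.SequenceMatcher from the Python standard library is an assumption, as are the function names; only the 15% threshold comes from the text above.

```python
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Collapse whitespace so line-break differences are not counted."""
    return " ".join(text.split())


def divergence(text_a: str, text_b: str) -> float:
    """Return 0.0 for identical texts, approaching 1.0 as they differ."""
    a, b = normalise(text_a), normalise(text_b)
    if not a and not b:
        return 0.0
    # autojunk=False keeps frequent characters from being discarded on long inputs.
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def encoding_ok(text_a: str, text_b: str, threshold: float = 0.15) -> bool:
    """Treat a divergence below the threshold as acceptable."""
    return divergence(text_a, text_b) < threshold
```

Whitespace is normalised first because the two engines may legitimately break lines differently; without that step, a cleanly encoded PDF could show a misleadingly high divergence.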
