resumai

Python

An LLM inference pipeline that scores millions of cover letter–job posting pairs for quality and relevance. Traditional text analysis couldn’t judge whether a letter actually addressed a job; this pipeline runs each pair through an open-weight model multiple times, verifies score stability, and scales from a 1K dev set to 5M production pairs.

Paper

Scale

1K dev set → 5M production pairs

Capability

LLM-based relevance scoring at research scale

Reproducibility

Open-weight models, pinned versions

What I built

How I led

Stack

Python
AWS Bedrock
AWS SDK

Key decisions

Open-weight models over commercial APIs — commercial providers retire models without notice, which would make published results irreproducible. Pinning an open-weight model version keeps the analysis stable for years
AWS Bedrock batch inference allowed us to hit a tight data processing timeline