This role will be based in Mountain View, CA.
At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.
HALO (Human Judgment, Annotation, Localization, and Operations) is a horizontal team within Core AI that partners across the company to enable high-quality human judgment for AI development. We work closely with cross-functional stakeholders and internal teams to define quality goals, design evaluation and data pipelines, and scale repeatable measurement systems. Our work spans multiple initiatives at once, supported by shared standards, platforms, and best practices that help teams move faster without compromising quality.
Role Summary
AI is evolving rapidly, and high-performing teams win by defining quality clearly, building reliable ground truth, and scaling human judgment without slowing innovation. HALO makes that possible.
The AI Linguist plays a key role in shaping the quality of LinkedIn's AI systems across a wide range of use cases, experiences, and product areas, including but not limited to relevance, ranking, rationale quality, and emerging multi-step and agentic capabilities. This role turns ambiguity into clear evaluation standards by designing annotation tasks and rubrics, producing high-quality ground truth through hands-on annotation, and building frameworks that make AI systems, models, and agents trainable, measurable, and continuously improvable.
The role also drives scalable annotation and evaluation pipelines, conducts audits of internal and vendor-produced work, and helps ensure strong inter-annotator agreement and consistently high quality. Working at the intersection of human expertise and modern AI tooling, the AI Linguist applies methods such as LLM-assisted prompting, hybrid labeling and evaluation, automated checks, regression testing, and continuous monitoring to support evolving business and product needs.
The datasets, rubrics, and evaluation signals produced in this role become shared standards across LinkedIn, directly influencing how AI systems are trained, evaluated, and improved. This is a great opportunity for someone who thrives in ambiguity, builds frameworks that others depend on, and wants to shape how AI performs in the real world.
Key Responsibilities
Partner cross-functionally with Engineering, Product, Data Science, domain SMEs, Trust/Legal, TPM, and vendor ops to align on quality goals, tradeoffs, and delivery plans
Define measurable quality criteria for ambiguous behaviors (rubrics, rating scales, concepts, failure modes) and ensure consistency across markets
Design and run repeatable evaluation systems (metrics, scorecards, regression sets, monitoring plans), including multi-step/agentic behavior evaluation using scenario suites and success criteria
Build scalable, high-quality annotation/evaluation pipelines, including hands-on execution of annotation tasks (covering task design, sampling, QA gates, adjudication, and maintenance) on vendor and/or in-house platforms
Lead vendor and internal workforce execution at scale: own training and onboarding, run calibration sessions, and conduct periodic reviews/audits of internal Linguists' and vendors' annotation output to measure and improve inter-annotator agreement; handle quality escalations and adjudication; and manage ongoing cost/quality tradeoffs to consistently maintain high annotation quality
Establish and enforce quality governance (agreement targets, drift/bias checks, defect taxonomy)
Leverage AI tools to scale work (LLM-assisted prompting, hybrid labeling/evaluation, automated checks) while maintaining reliability controls
Run method/workflow experiments; document results and drive decisions based on evidence
Perform error analysis and drive iteration cycles with partners; translate findings into actionable changes
Define platform/tool requirements for human judgment workflows; partner through build, test, deployment, and adoption
Publish reusable best practices and standards; mentor junior Linguists and conduct design/analysis reviews across initiatives