RL Environment Reviewer at Preference%20model

About us

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases because so many of the tasks that we want to use these models are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops.

Our founding team has previous experience on Anthropic’s data team building data infrastructure, tokenizers, and datasets behind Claude. We are partnering with leading AI labs to push AI closer to achieving its transformative potential. We are backed by a16z.

About the role

Every RL environment we ship needs to survive a model that is actively trying to game it. A task with a weak grader or an exploitable reward signal is worse than no task at all: it teaches the model to hack rather than reason. We need someone whose full-time job is finding those holes before the model does.

We've learned that domain knowledge alone doesn't make a good reviewer. The people who are best at this have spent time thinking adversarially: designing problems that are hard to game, breaking other people's problems, or researching reward hacking directly.

What you'll do

Review RL environments and training tasks for correctness, robustness, and resistance to reward hacking
Identify ways a model could exploit graders, game evaluation criteria, or shortcut past the intended reasoning
Work directly with environment authors to tighten graders, fix reward signals, and redesign tasks that don't hold up
Develop and maintain review standards and checklists as we scale from hundreds to thousands of tasks per month
Advise on grader design during environment planning, before tasks are built, not after

Who we're looking for

You think like an attacker. You've spent real time designing problems that are hard to game, or breaking problems other people thought were solid. You have enough ML knowledge to understand what a model might try, and enough engineering sense to evaluate whether a grader actually tests what it says it tests.

Must have:

Track record of adversarial or constructive problem design: competitive programming problem authoring (ICPC, Codeforces, etc.), CTF challenge design, or similar
Familiarity with RL, reward hacking, and specification gaming (you've read Amodei et al., Krakovna's list, or similar work, and you've thought about it beyond surface level)
Strong Python reading skills
Ability to articulate clearly in writing why a task is broken and what needs to change

Any of these would make you stand out:

Published research on reward hacking, specification gaming, RLHF robustness, or AI safety
Background in security engineering, penetration testing, or red-teaming (with enough ML context to apply that mindset to RL environments)
Experience authoring or reviewing problems for competitive programming contests
You've built automated evaluation systems and know where they break
You've worked on LLM evaluation, benchmarking, or alignment research

How to apply

Send your resume and a short note (2-3 sentences is fine) about a time you broke something that was supposed to be robust, or designed a problem that was hard to game. Links to published problems, research, or writeups are more useful than a long cover letter.

We value diverse perspectives and experiences. If you're excited about this role but don't check every box, we still encourage you to apply.

We are backed by a Tier 1 VC. We offer competitive base salary as well as generous equity (>90th percentile).

RL Environment Reviewer

Job Description

About us

About the role

What you'll do

Who we're looking for

Must have:

Any of these would make you stand out:

How to apply

We value diverse perspectives and experiences. If you're excited about this role but don't check every box, we still encourage you to apply.

We are backed by a Tier 1 VC. We offer competitive base salary as well as generous equity (>90th percentile).

About Preference%20model

Similar Jobs

Research Engineer / Research Scientist

RL Environments Engineer

RL Environment Reviewer

Job Description

About us

About the role

What you'll do

Who we're looking for

Must have:

Any of these would make you stand out:

How to apply

We value diverse perspectives and experiences. If you're excited about this role but don't check every box, we still encourage you to apply.We are backed by a Tier 1 VC. We offer competitive base salary as well as generous equity (>90th percentile).

About Preference%20model

Similar Jobs

Research Engineer / Research Scientist

RL Environments Engineer

We value diverse perspectives and experiences. If you're excited about this role but don't check every box, we still encourage you to apply.

We are backed by a Tier 1 VC. We offer competitive base salary as well as generous equity (>90th percentile).