Preference Model is building the next generation of training data to power the future of AI.
Today's models are powerful but fail to reach their potential across diverse use cases because so many of the tasks that we want to use these models for are outside of their training data distribution. Preference Model creates reinforcement learning environments that encapsulate real-world use cases, enabling AI systems to practice, adapt, and learn from feedback grounded in reality. We seek to bring the real world into distribution for the models.
Our founding team previously built the data infrastructure and tokenizers behind the Claude models. We are partnering with leading AI labs to push the frontier of capabilities.
Models of the future will be able to train themselves on tasks they are not yet good at. We're interested in investigating how far we can push the boundaries of self-directed learning. We're looking for people to push the frontier of post-training for large language models in a role that blends research and engineering, requiring you to implement novel approaches and shape research directions.
Architect and optimize core reinforcement learning infrastructure, from clean training abstractions to distributed experiment management. Help scale our systems to handle increasingly complex research workflows.
Design, implement, and test training environments, evaluations, and methodologies for RL agents.
Drive performance improvements through profiling, optimization, and benchmarking. Implement efficient caching and debug distributed systems to accelerate training and evaluation.
Collaborate across research and engineering teams to develop automated testing frameworks, design clean APIs, and build scalable infrastructure that accelerates AI research.
Are proficient in Python and either PyTorch or JAX
Have industry experience training LLMs and doing ML research on them
Can balance research exploration with engineering implementation
Enjoy pair programming and care about code quality, testing, and performance
Have strong systems design and communication skills
Have a good understanding of RL algorithms and follow current publications
Have experience with LLM agent designs
Have worked with virtualization and sandboxed code execution environments
Know Kubernetes
Have experience in distributed systems or high-performance computing
Candidates don't need a PhD or extensive publications. Some of the best researchers have no formal ML training and gained their experience building industry products. We believe adaptability, combined with exceptional communication and collaboration skills, is the most important ingredient for successful startup research.
We are backed by a Tier 1 VC. We offer a competitive base salary as well as generous equity (>90th percentile).
Preference Model
https://preferencemodel.com