FAR.AI is seeking a Research Lead to develop and lead a research agenda to reduce catastrophic risks from advanced AI. You'll build and lead a team executing this agenda — setting research direction, mentoring Members of Technical Staff to scale your vision, and staying close enough to the work to write code and run experiments yourself when it matters. The aim is research that changes how AI labs and governments behave, not just research that gets published. This role is a strong fit if you want to work in an impact-driven environment with high autonomy, pursuing empirically-grounded, scalable ML safety work.
FAR.AI is a non-profit AI research institute working to ensure advanced AI is safe and beneficial for everyone. Our mission is to facilitate breakthrough AI safety research, advance global understanding of AI risks and solutions, and foster a coordinated global response.
Since our founding in July 2022, we've grown to 40+ staff, published 40+ academic papers, and convened leading AI safety events. Our work is recognized globally, with publications at premier venues such as NeurIPS, ICML, and ICLR, and features in the Financial Times, Nature News, and MIT Technology Review. We conduct pre-deployment testing on behalf of frontier developers such as OpenAI and independent evaluations for governments including the EU AI Office. We help steer and grow the AI safety field through developing research roadmaps with renowned researchers such as Yoshua Bengio; running FAR.Labs, an AI safety-focused co-working space in Berkeley housing 40 members; and supporting the community through targeted grants to technical researchers.
We explore promising research directions in AI safety and scale up only those showing high potential for impact. Once the core research problems are solved, we work to develop these directions into a minimum viable prototype, demonstrating their validity to AI companies and governments to drive adoption.
Our current research includes:
Adversarial Robustness: working to rigorously solve security problems by building a science of security and robustness for AI, from demonstrating that superhuman systems can be vulnerable, to scaling laws for robustness and jailbreaking constitutional classifiers.
Mechanistic Interpretability: finding issues with Sparse Autoencoders, probing deception using AmongUs, understanding learned planning in Sokoban, and developing interpretable data attribution.
Red-teaming: conducting pre- and post-release adversarial evaluations of frontier models (e.g. Claude 4 Opus, ChatGPT Agent, GPT-5); developing novel attacks to support this work.
Evals: developing evaluations for new threat models, e.g. persuasion and tampering risks.
Mitigating AI deception: studying when lie detectors induce honesty or evasion, and developing mitigations for deception and sandbagging.
We are particularly looking to add Research Leads in the following pod shapes:
Applied Interpretability — using interpretability to tackle concrete safety problems (better probes, backdoor detection, deception monitoring), aiming for fast feedback loops, often in collaboration with our other pods. This would be a new, greenfield pod.
Scalable Oversight / Alignment — methods that keep oversight robust as models become more capable than their supervisors: recursive reward modeling, debate, weak-to-strong generalization, process-based supervision. A greenfield area we'd like to stand up.
Adversarial Robustness / Guardrails — extending our independent-testing work into deployed-system protection: better constitutional classifiers, pre-training safety interventions (initially CBRN misuse, especially for open-weight models), backdoor detection and mitigation, realistic cybersecurity evaluations, and loss-of-control deception evaluations.
Auditing / Evals — auditing for alignment, not just capabilities: evaluation awareness (construct validity, safety-relevance, hyper-realistic evals), CoT monitorability and faithfulness training, black-box monitoring as a complement to our existing white-box work.
Persuasion / Epistemic Risks — science of epistemic risks and intervention points, persuasion's role in loss-of-control risks, evaluations and independent testing, connections to broader harmful manipulation, and solutions and epistemic uplift. Part building on our existing work, part shaping your own agenda in the area.
Bring Your Own Agenda — an open track for senior researchers with a strong vision outside the pods above.
Research Leads define and own a research workstream end-to-end. Day-to-day, you will:
Articulate a research agenda with a clear theory of change for mitigating catastrophic risks from human-level or superhuman AI systems, and/or vastly increasing the upside of such systems.
Grow and lead a team of technical staff in pursuit of this agenda, either directly or in partnership with an engineering co-lead.
Lead novel research projects where markers of progress or success may be unclear.
Share your research findings through written content (e.g. academic publications, blog posts) and presentations (e.g. ML conferences, policymaker briefings) to drive adoption and change.
Mentor and coach junior team members in research skills and ML engineering.