The position:
We're looking for a talented Staff Software Engineer to join our Agentic Platform team, the group responsible for building the company-wide infrastructure that enables every engineering team to safely build AI-powered features. This is a rare opportunity to shape foundational AI/LLM platform capabilities from the ground up at a company that's deploying real-world AI agents today.
The Agentic Platform provides shared primitives, hosted agent execution, and operational tooling so any team can build AI-powered workflows, from simple summarization to complex multi-turn conversational agents. Our vision: enable any engineer to build production-ready AI features without becoming an AI expert.
The platform handles the hard infrastructure problems (provider abstraction, safety guardrails, observability, prompt lifecycle management, and evaluation systems) so product teams can focus on their domain logic. You'll be working on:
Agentic Platform SDK: TypeScript SDK with core primitives (Completion, Agent, Tool, Guardrail, PromptPack, Eval, Context)
Agentic Platform Service: Agent-as-a-Service for long-running async tasks
Prompt Management: Externalized, versioned prompt storage with CI/CD Integration
Our production AI-powered voice and chat application, the proving ground for platform patterns
Reporting to the Director of Engineering, you'll partner closely with the Principal Engineer leading platform architecture while collaborating with product teams across the company who will consume your platform. This role is ideal for someone who thrives on technically deep problems, wants to build infrastructure that multiplies the impact of other engineers, and is excited about the rapidly evolving LLM landscape.
Who you are:
You're a platform engineer at heart: someone who understands that the best infrastructure is invisible to its users while handling enormous complexity under the hood. You have a track record of designing systems that other engineers love to use, and you know how to balance powerful abstractions with practical simplicity.
You understand that LLM systems present unique challenges: non-deterministic outputs, rapidly evolving provider SDKs, safety requirements, and the need for systematic quality measurement. You're excited to tackle "churn containment": building stable APIs that absorb the chaos of monthly model releases and quarterly SDK updates.
The ideal candidate is a hands-on, outcome-oriented engineer with extensive experience building platform infrastructure, developer tools, or distributed systems. You build strong partnerships with internal customers (other engineering teams), ensuring alignment between platform capabilities and their needs.
What you will do:
Design and build core platform primitives including provider abstraction layers (OpenAI, Anthropic, Google), structured output validation, streaming infrastructure, and token management systems
Own safety and compliance infrastructure including composable guardrail systems, PII detection/redaction, audit logging, and privacy-first observability that never leaks sensitive data to third parties
Build evaluation infrastructure that enables systematic quality measurement for non-deterministic LLM outputs: datasets, scorers (exact match, LLM-as-judge, schema validation), CI/CD integration, and regression detection
Lead churn containment strategy: design provider adapters and SDK architecture that absorb rapidly changing LLM provider SDKs without breaking consuming applications
Architect prompt lifecycle management systems including version control, Langfuse integration, GitHub-based review workflows, and deployment pipelines
Design Agent-as-a-Service infrastructure for long-running async tasks using AWS EventBridge, DynamoDB, and PostgreSQL
Collaborate with consuming teams to understand their needs, onboard them to the platform, and provide technical support
Influence architecture, technology selections, and engineering standards across the broader organization
Create reference implementations and technical documentation that enables other engineers to successfully adopt the platform
Champion quality engineering practices including comprehensive testing, type safety, and observability
Required Skills & Competencies:
8+ years of software engineering experience with significant time spent building platform infrastructure, developer tools, SDKs, or distributed systems
Production experience with LLM/AI systems: you've built and operated systems using OpenAI, Anthropic, or similar providers, and understand the unique challenges (token limits, non-determinism, provider outages, model deprecations)
Strong TypeScript expertise: this is our company standard, and you'll be designing APIs that other TypeScript developers consume
Experience designing APIs and abstractions that other engineers love to use: you understand the balance between power and simplicity
Understanding of safety and compliance in AI systems: PII handling, guardrails, audit logging, and responsible AI practices
Experience with event-driven architectures and async processing patterns (EventBridge, SQS, or similar)
Understanding of observability and monitoring for distributed systems: metrics, tracing, alerting, and debugging production issues
Strong communication and technical writing skills: ability to document systems clearly and work with internal customers across multiple teams
Track record of technical leadership with or without formal management: influencing architecture, mentoring engineers, and driving technical decisions
Experience with cloud infrastructure (AWS preferred: Fargate, DynamoDB, RDS, S3, EventBridge)
Preferred:
Experience building SDK or platform products consumed by multiple teams
Experience with prompt engineering, prompt management systems, or LLM evaluation frameworks
Familiarity with NestJS, Prisma, or similar TypeScript backend frameworks
Experience with streaming architectures (SSE, WebSockets) for real-time AI applications
Background in building multi-tenant platform infrastructure
Experience with hexagonal architecture / ports and adapters patterns
Contributions to open-source LLM tooling or frameworks
Technical Environment:
Languages: TypeScript (primary)
Frameworks: NestJS, OpenAI Agents SDK, Vercel AI SDK
Databases: PostgreSQL (Prisma ORM), DynamoDB, Redis
Infrastructure: AWS (Fargate, EventBridge, S3, Parameter Store), Docker
Observability: Langfuse, NewRelic, Coval
Testing: Vitest
CI/CD: GitHub Actions, SonarQube
LLM Providers: OpenAI, Anthropic (with architecture for additional providers)
Coding Agents: Claude, Codex, Gemini
Compensation:
Base Salary: $160,000 to $190,000 + 10% Bonus
Benefits:
401(k) plus match
Dental insurance
Health insurance
Vision insurance
Paid Time Off
#LI-KT1
About A Place for Mom
A Place for Mom is the leading platform guiding families through every stage of the aging journey. Together, we simplify the senior care search with free, personalized support, connecting caregivers and their loved ones to vetted providers from our network of 15,000+ senior living communities and home care agencies.
Since 2000, our teams have helped mill