The position:
We're looking for a talented Staff Software Engineer to join our Agentic Platform team, the group responsible for building the company-wide infrastructure that enables every engineering team to safely build AI-powered features. This is a rare opportunity to shape foundational AI/LLM platform capabilities from the ground up at a company that's deploying real-world AI agents today.
The Agentic Platform provides shared primitives, hosted agent execution, and operational tooling so any team can build AI-powered workflows, from simple summarization to complex multi-turn conversational agents. Our vision: enable any engineer to build production-ready AI features without becoming an AI expert.
The platform handles the hard infrastructure problems (provider abstraction, safety guardrails, observability, prompt lifecycle management, and evaluation systems) so product teams can focus on their domain logic. You'll be working on:
Agentic Platform SDK: TypeScript SDK with core primitives (Completion, Agent, Tool, Guardrail, PromptPack, Eval, Context)
Agentic Platform Service: Agent-as-a-Service for long-running async tasks
Prompt Management: Externalized, versioned prompt storage with CI/CD Integration
Our production AI-powered voice and chat application, the proving ground for platform patterns
Reporting to the Director of Engineering, you'll partner closely with the Principal Engineer leading platform architecture while collaborating with product teams across the company who will consume your platform. This role is ideal for someone who thrives on technically deep problems, wants to build infrastructure that multiplies the impact of other engineers, and is excited about the rapidly evolving LLM landscape.
Who you are:
You're a platform engineer at heart: someone who understands that the best infrastructure is invisible to its users while handling enormous complexity under the hood. You have a track record of designing systems that other engineers love to use, and you know how to balance powerful abstractions with practical simplicity.
You understand that LLM systems present unique challenges: non-deterministic outputs, rapidly evolving provider SDKs, safety requirements, and the need for systematic quality measurement. You're excited to tackle "churn containment": building stable APIs that absorb the chaos of monthly model releases and quarterly SDK updates.
The ideal candidate is a hands-on, outcome-oriented engineer with extensive experience building platform infrastructure, developer tools, or distributed systems. You build strong partnerships with internal customers (other engineering teams), ensuring alignment between platform capabilities and their needs.
What you will do:
Design and build core platform primitives including provider abstraction layers (OpenAI, Anthropic, Google), structured output validation, streaming infrastructure, and token management systems
Own safety and compliance infrastructure including composable guardrail systems, PII detection/redaction, audit logging, and privacy-first observability that never leaks sensitive data to third parties
Build evaluation infrastructure that enables systematic quality measurement for non-deterministic LLM outputs: datasets, scorers (exact match, LLM-as-judge, schema validation), CI/CD integration, and regression detection
Lead churn containment strategy: design provider adapters and SDK architecture that absorb rapidly changing LLM provider SDKs without breaking consuming applications
Architect prompt lifecycle management systems including version control, Langfuse integration, GitHub-based review workflows, and deployment pipelines
Design Agent-as-a-Service infrastructure for long-running async tasks using AWS EventBridge, DynamoDB, and PostgreSQL
Collaborate with consuming teams to understand their needs, onboard them to the platform, and provide technical support
Influence architecture, technology selections, and engineering standards across the broader organization
Create reference implementations and technical documentation that enables other engineers to successfully adopt the platform
Champion quality engineering practices including comprehensive testing, type safety, and observability
Required Skills & Competencies:
8+ years of software engineering experience with significant time spent building platform infrastructure, developer tools, SDKs, or distributed systems
Production experience with LLM/AI systems: you've built and operated systems using OpenAI, Anthropic, or similar providers, and understand the unique challenges (token limits, non-determinism, provider outages, model deprecations)
Strong TypeScript expertise: this is our company standard, and you'll be designing APIs that other TypeScript developers consume
Experience designing APIs and abstractions that other engineers love to use: you understand the balance between power and simplicity
Understanding of safety and compliance in AI systems: PII handling, guardrails, audit logging, and responsible AI practices
Experience with event-driven architectures and async processing patterns (EventBridge, SQS, or similar)
Understanding of observability and monitoring for distributed systems: metrics, tracing, alerting, and debugging production issues
Strong communication and technical writing skills: ability to document systems clearly and work with internal customers across multiple teams
Track record of technical leadership with or without formal management: influencing architecture, mentoring engineers, and driving technical decisions
Experience with cloud infrastructure (AWS preferred: Fargate, DynamoDB, RDS, S3, EventBridge)
Preferred:
Experience building SDK or platform products consumed by multiple teams
Experience with prompt engineering, prompt management systems, or LLM evaluation frameworks
Familiarity with NestJS, Prisma, or similar TypeScript backend frameworks
Experience with streaming architectures (SSE, WebSockets) for real-time AI applications
Background in building multi-tenant platform infrastructure
Experience with hexagonal architecture / ports and adapters patterns
Contributions to open-source LLM tooling or frameworks
Technical Environment:
Languages: TypeScript (primary)
Frameworks: NestJS, OpenAI Agents SDK, Vercel AI SDK
Databases: PostgreSQL (Prisma ORM), DynamoDB, Redis
Infrastructure: AWS (Fargate, EventBridge, S3, Parameter Store), Docker
Observability: Langfuse, NewRelic, Coval
Testing: Vitest
CI/CD: GitHub Actions, SonarQube
LLM Providers: OpenAI, Anthropic (with architecture for additional providers)
Coding Agents: Claude, Codex, Gemini
Compensation:
Base Salary: $160,000 to $190,000 + 10% Bonus
Benefits:
401(k) plus match
Dental insurance
Health insurance
Vision insurance
Paid Time Off
#LI-KT1
About A Place for Mom
A Place for Mom is the leading platform guiding families through every stage of the aging journey. Together, we simplify the senior care search with free, personalized support, connecting caregivers and their loved ones to vetted providers from our network of 15,000+ senior living communities and home care agencies.
Since 2000, our teams have helped mill