Role Summary
Own the end-to-end PCIe system design for an NVMe SSD product line across client laptops and enterprise servers, from PHY/MAC review through ASIC/SoC integration, PCIe SFR/register analysis, and firmware design guidelines for robust link training, link transitions, and low-power behavior. This role sits at the intersection of PCIe spec compliance, NVMe behavior, FW architecture, platform interoperability, and power/performance tuning.
Key Responsibilities
· Own system-level PCIe Gen5/Gen6 architecture from an NVMe SSD endpoint perspective
· Define and review PCIe + NVMe integration across SSD products
· PHY + MAC IP review, integration requirements and constraints
· SoC/ASIC integration: clocks, resets, power domains, straps, lane mapping, sidebands
· PCIe SFR + FW guidelines: flow control, LTSSM observability, power states, error handling
· Link and low-power transitions: DLRM, L1, L1SS, L0p, ASPM, clock-down, APST coordination
· Bring-up + debug: enumeration, speed negotiation, width detection, stability, AER/error recovery
· Customer requirement tuning: latency/power, performance, reliability and consistency
· Provide deep expertise in PCIe configuration and extended capability registers, including:
o Link, power management, MSI/MSI-X, AER, BARs, L1SS
· Lead platform bring-up and debug:
o Enumeration, link training, speed negotiation, power states, error handling
· Act as the technical authority for cross-team and customer escalations
Detailed Responsibilities (End-to-End PCIe for NVMe SSD)
1. PHY/MAC IP Review (System Design Perspective)
· Understand criteria for PHY/MAC/controller IP:
o Gen5/Gen6 readiness, equalization capability, margining hooks, lane mapping flexibility
o SRNS/SRIS tolerance, clocking modes, power management support
o Observability: LTSSM state visibility, error counters, replay/NAK stats, equalization telemetry
· Review IP documents:
o Reset sequences, compliance features, link speed change support
o L1SS behavior, CLKREQ#/REFCLK control expectations
o AER robustness, surprise down handling, hot/warm reset behavior
· Specify platform-facing requirements:
o Retimer/redriver compatibility assumptions (backplane/adaptor/cables)
2. ASIC/SoC Integration Ownership
· Integrate PCIe subsystem with:
o Clocking: REFCLK handling, clock request gating, clock-down sequences
o Resets: PERST# behavior, internal resets, warm/hot resets, FLR support as applicable
o Power domains: retention strategies, wake sources, D-state coordination
o Sidebands: WAKE#, CLKREQ#, presence detect patterns (platform dependent)
· Define lane policy:
o x4 as the typical NVMe width, lane reversal and polarity inversion, width detection and recovery from degraded width
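As an illustration of the width-detection part of the lane policy, the following is a minimal sketch that compares the negotiated link against the advertised maximum. It assumes the standard field positions in the PCI Express Capability structure (speed in bits [3:0], width in bits [9:4] of both Link Capabilities and Link Status) and the conventional speed encoding. The function names and the raw register values are hypothetical and stand in for whatever config-space access a given platform provides.

```python
# Field layout assumed from the PCI Express Capability structure:
#   Link Capabilities [3:0] Max Link Speed,     [9:4] Maximum Link Width
#   Link Status       [3:0] Current Link Speed, [9:4] Negotiated Link Width
# Conventional speed encoding (code -> GT/s).
SPEED_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0}


def decode_link(reg):
    """Return (speed_code, width) from a Link Capabilities or Link Status value."""
    return reg & 0xF, (reg >> 4) & 0x3F


def link_health(link_cap, link_status):
    """Compare the negotiated speed/width against the advertised maximum."""
    max_speed, max_width = decode_link(link_cap)
    cur_speed, cur_width = decode_link(link_status)
    return {
        "speed_gts": SPEED_GTS.get(cur_speed),  # None for an unknown code
        "width": cur_width,
        "width_degraded": cur_width < max_width,
        "speed_degraded": cur_speed < max_speed,
    }


# Hypothetical raw values: endpoint capable of 32 GT/s x4, currently at 32 GT/s x2.
report = link_health(link_cap=0x45, link_status=0x25)
print(report)
```

A report with `width_degraded` set is the kind of event the recovery and telemetry policy would act on.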
3. PCIe SFR / Register + FW Design Guidelines
· Define a clean SFR map that FW uses for:
o LTSSM control/observability (state, substate, timers, retries)
o Link speed/width control and status (negotiated vs target)
o Low-power triggers: ASPM enable/disable, L1SS policy, L0p policy (if implemented)
o Clock request & clock gating behavior (safe entry/exit rules)
o Error logging counters (replay, NAK, ECRC, timeout, malformed TLPs)
o Recovery controls: link disable/enable, retrain, directed speed change, error clear policy
· Provide FW runbooks:
o “What to do when”: training fails, width reduces, speed fallback, AER floods
o Safe sequencing across power modes and APST transitions
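One way to picture a "what to do when" runbook is as an ordered escalation table per event. The sketch below is illustrative only: the event names follow the list above, while the action names, their order, and the `next_action` helper are hypothetical placeholders, not a recommended recovery policy.

```python
from enum import Enum


class Event(Enum):
    TRAINING_FAILED = "training_failed"
    WIDTH_REDUCED = "width_reduced"
    SPEED_FALLBACK = "speed_fallback"
    AER_FLOOD = "aer_flood"


# Hypothetical escalation steps, mildest first. Real steps and their order
# would come from the product's SFR map and platform constraints.
RUNBOOK = {
    Event.TRAINING_FAILED: ["log_ltssm_history", "retrain_link", "lower_target_speed"],
    Event.WIDTH_REDUCED: ["log_width_event", "retrain_link", "report_degraded_width"],
    Event.SPEED_FALLBACK: ["log_speed_event", "directed_speed_change", "hold_lower_speed"],
    Event.AER_FLOOD: ["snapshot_error_counters", "rate_limit_reporting", "retrain_link"],
}


def next_action(event, attempts):
    """Return the step for this attempt; stay on the last step once exhausted."""
    steps = RUNBOOK[event]
    return steps[min(attempts, len(steps) - 1)]


print(next_action(Event.TRAINING_FAILED, 0))
print(next_action(Event.TRAINING_FAILED, 5))
```

Keeping the table as data rather than branching logic makes it easier to review the policy with platform and customer teams.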
4. Link Bring-Up & Transitions (Sequence Ownership)
You’ll own/define the exact sequencing rules for:
· Enumeration readiness
o Ensure config space stability, BAR mapping correctness, MSI/MSI-X readiness timing
· Speed negotiation / Directed Speed Change
o When to allow Gen5/Gen4 fallback; policy for stability vs performance
· Width detection & recovery
o Handling degraded width events (x4 → x2) and reporting/telemetry
· Link power management
o ASPM policy and its constraints with NVMe latency targets
o L1 entry/exit triggers and guard timers
o L1 Substates (L1.1/L1.2) enablement conditions, wake sources, and clock requirements
o DLRM handling (as applicable to platform/system) with safe NVMe readiness on resume
o L0p (if supported) and interaction with performance bursts
· Clock down / clock request
o Define clock request gating conditions, and safe “no transactions in flight” criteria
· NVMe APST alignment
o Coordinate NVMe power states (APST) with PCIe L-states so you don’t create:
§ long resume latencies (client)
§ link instability under load (enterprise)
5. Platform Interoperability
· Own differences across laptop and server:
o Client: aggressive power policies, fast resume, frequent idle entry/exit, D3hot/cold patterns
o Enterprise: stable performance, high queue depth, error containment, hot-plug-like behaviors on some platforms
· Validate across:
o Multiple root complexes, BIOS implementations, OS stacks