This role sits at the center of hardware engineering and operations. You’ll work with datacenter technicians, platform teams, firmware engineers, and suppliers to improve reliability across the entire machine lifecycle.

What you’ll do

Investigate hardware failures across servers, racks, networking, and storage systems.
Analyze fleet telemetry, repair data, crash signatures, and environmental trends to identify reliability risks.
Drive root-cause analysis from field symptom to component-level understanding.
Improve repair workflows, diagnostics, and serviceability for production infrastructure.
Partner with vendors and internal hardware teams on corrective actions and design improvements.
Help define reliability standards for future hardware platforms.

Requirements

Experience supporting production hardware systems at scale.
Strong analytical and debugging skills across complex infrastructure systems.
Ability to reason about hardware failures statistically and operationally.
Clear written communication and disciplined incident analysis habits.

Nice to have

Experience with FRACAS, RMA analysis, or hardware lifecycle programs.
Familiarity with Linux diagnostics and fleet observability systems.
Background in cloud infrastructure or hyperscale datacenter environments.
Experience with reliability engineering or manufacturing quality systems.

Datacenter Hardware Engineer, Fleet Reliability