Siberian Syndicate
Open position

Datacenter Hardware Engineer, Fleet Reliability

Keep thousands of machines healthy, observable, and repairable at scale

This role sits at the intersection of hardware engineering and operations. You’ll work with datacenter technicians, platform teams, firmware engineers, and suppliers to improve reliability across the entire machine lifecycle.

What you’ll do

  • Investigate hardware failures across servers, racks, networking, and storage systems.
  • Analyze fleet telemetry, repair data, crash signatures, and environmental trends to identify reliability risks.
  • Drive root-cause analysis from field symptom to component-level understanding.
  • Improve repair workflows, diagnostics, and serviceability for production infrastructure.
  • Partner with vendors and internal hardware teams on corrective actions and design improvements.
  • Help define reliability standards for future hardware platforms.

Requirements

  • Experience supporting production hardware systems at scale.
  • Strong analytical and debugging skills across complex infrastructure systems.
  • Ability to reason about hardware failures statistically and operationally.
  • Clear written communication and disciplined incident analysis habits.

Nice to have

  • Experience with FRACAS, RMA analysis, or hardware lifecycle programs.
  • Familiarity with Linux diagnostics and fleet observability systems.
  • Background in cloud infrastructure or hyperscale datacenter environments.
  • Experience with reliability engineering or manufacturing quality systems.
