Staff Site Reliability Engineer, Storage

Company: Crusoe Energy Systems LLC
Location: San Francisco
Posted on: May 18, 2025

Job Description:

Crusoe is building the World's Favorite AI-first Cloud infrastructure company. We're pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.

Be part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure.

About This Role:

At Crusoe Energy Systems, our Site Reliability Engineering (SRE) team plays a mission-critical role in maintaining the performance and reliability of our AI-optimized cloud infrastructure. The Storage-focused SRE role is responsible for ensuring the availability, performance, and scalability of Crusoe's cloud storage products and services, which power compute-intensive, latency-sensitive workloads for AI and HPC use cases. This role directly supports our vertically integrated, sustainable cloud platform by building and optimizing distributed, fault-tolerant storage systems at scale.

What You'll Be Working On:

In this role, you will build automation and self-healing tools to monitor and maintain Crusoe's distributed cloud storage infrastructure, which includes block, file, and object storage systems. You will drive reliability initiatives focused on data replication, encryption, backup and restore strategies, and robust failover mechanisms. Collaborating closely with storage engineers, you will help implement and maintain high-performance NVMe- and SSD-backed volumes that support large-scale AI compute clusters.

Your responsibilities will also include supporting user-facing storage services with a focus on availability, performance tuning, and adherence to error budgets. You'll investigate and resolve storage-related incidents using deep telemetry, logs, and performance profiling, while also partnering with hardware and kernel teams to diagnose low-level I/O issues and optimize I/O paths, cache policies, and file systems. Additionally, you will contribute to the architecture of fault-tolerant, scalable storage backends tailored for AI-first cloud environments.

What You'll Bring to the Team:

- 8+ years of professional experience in Storage SRE, systems, or storage engineering.
- Hands-on experience with distributed storage systems (e.g., Ceph, GlusterFS, OpenEBS) and a deep understanding of object, block, and file storage paradigms.
- Proficiency in a programming language such as Go, Python, Java, or C.
- Experience with Infrastructure as Code and deployment tooling such as Terraform, Ansible, or Puppet.
- Deep knowledge of Linux internals, with a focus on I/O subsystems, memory management, and storage scheduling.
- Familiarity with storage protocols such as NFS, SMB, iSCSI, or NVMe-oF.
- Strong experience working with containerized workloads and orchestration platforms (e.g., Kubernetes, Docker).
- Excellent incident response, troubleshooting, and documentation practices.
- Experience building and operating managed object, file, and block storage services at scale (AWS, GCP, Azure).
- Excellent communication skills.
- Must be able to pass a background check.

Benefits:
- Hybrid work schedule
- Industry competitive pay
- Restricted Stock Units in a fast-growing, well-funded technology company
- Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
- Employer contributions to HSAs
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Teladoc
- 401(k) with a 100% match up to 4% of salary
- Generous paid time off and holiday schedule
- Cell phone reimbursement
- Tuition reimbursement
- Subscription to the Calm app
- MetLife Legal
- Company-paid commuter benefit: $50 per pay period

Compensation Range:

Compensation will be paid up to $250,000 per year, plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.