Principal Engineer - High-Performance AI Infrastructure
Company: Diversity Talent Scouts
Location: San Jose
Posted on: February 15, 2026
Job Description:
As a Principal Engineer for HPC and AI Infrastructure, you’ll take a lead role in designing the low-level systems that maximize GPU utilization across large, mission-critical workloads. Working within our GPU Runtime & Systems team, you’ll focus on device drivers, kernel-level optimizations, and runtime performance to ensure GPU clusters deliver the highest throughput, lowest latency, and greatest reliability possible. Your work will directly accelerate workloads across deep learning, high-performance computing, and real-time simulation. This position sits at the intersection of systems programming, GPU architecture, and HPC-scale computing: a unique opportunity to shape infrastructure used by developers and enterprises worldwide.

Key Responsibilities:
- Build and optimize device drivers and runtime components for GPUs and high-speed interconnects.
- Collaborate with kernel and platform teams to design efficient memory pathways (pinned memory, peer-to-peer, unified memory); a pinned-memory sketch follows this list.
- Improve data transfers across NVLink, InfiniBand, PCIe, and RDMA to reduce latency and boost throughput.
- Enhance GPU memory operations with NUMA-aware strategies and hardware-coherent optimizations.
- Implement telemetry and observability tools to monitor GPU performance with minimal runtime overhead; a telemetry sketch also follows this list.
- Contribute to internal debugging/profiling tools for GPU workloads.
- Mentor engineers on best practices for GPU systems development and participate in peer design/code reviews.
- Stay ahead of evolving GPU and interconnect architectures to influence future infrastructure design.
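To give a flavor of the memory-pathway work above, here is a minimal sketch of staging data through pinned (page-locked) host memory so a host-to-device copy can run asynchronously on a CUDA stream. This is illustrative only, not code from this role: the CUDA_CHECK macro is a hypothetical error-check helper, and the real work on the stream is elided.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error-check helper; a standard CUDA pattern.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                  \
              cudaGetErrorString(err_), __FILE__, __LINE__);        \
      return 1;                                                     \
    }                                                               \
  } while (0)

int main() {
  const size_t n = 1 << 20;
  const size_t bytes = n * sizeof(float);

  // Pinned (page-locked) host buffer: the DMA engine can read it
  // directly, so the copy below can be truly asynchronous.
  float* h_buf = nullptr;
  CUDA_CHECK(cudaMallocHost((void**)&h_buf, bytes));
  for (size_t i = 0; i < n; ++i) h_buf[i] = 1.0f;

  float* d_buf = nullptr;
  CUDA_CHECK(cudaMalloc((void**)&d_buf, bytes));

  // A dedicated stream lets the transfer overlap with other work.
  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));
  CUDA_CHECK(cudaMemcpyAsync(d_buf, h_buf, bytes,
                             cudaMemcpyHostToDevice, stream));
  // ... kernels that consume d_buf would be enqueued here ...
  CUDA_CHECK(cudaStreamSynchronize(stream));

  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(d_buf));
  CUDA_CHECK(cudaFreeHost(h_buf));
  return 0;
}
```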
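And for the telemetry responsibility, a minimal sketch of low-overhead GPU monitoring. The posting does not name specific tooling; NVML is assumed here purely for illustration (link with -lnvidia-ml).

```cuda
#include <cstdio>
#include <nvml.h>

int main() {
  if (nvmlInit_v2() != NVML_SUCCESS) {
    fprintf(stderr, "failed to initialize NVML\n");
    return 1;
  }

  unsigned int count = 0;
  nvmlDeviceGetCount_v2(&count);

  for (unsigned int i = 0; i < count; ++i) {
    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex_v2(i, &dev) != NVML_SUCCESS) continue;

    char name[NVML_DEVICE_NAME_BUFFER_SIZE];
    nvmlDeviceGetName(dev, name, sizeof(name));

    // GPU and memory utilization over the last sampling window.
    nvmlUtilization_t util;
    nvmlDeviceGetUtilizationRates(dev, &util);

    nvmlMemory_t mem;
    nvmlDeviceGetMemoryInfo(dev, &mem);

    printf("GPU %u (%s): %u%% busy, %llu/%llu MiB used\n",
           i, name, util.gpu,
           (unsigned long long)(mem.used >> 20),
           (unsigned long long)(mem.total >> 20));
  }

  nvmlShutdown();
  return 0;
}
```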
Minimum Qualifications:
- Bachelor’s degree in a technical field (STEM), with 10 years of experience in systems programming, including 5 years in GPU runtime or driver development.
- Experience developing kernel-space modules or runtime libraries (CUDA, ROCm, OpenCL).
- Deep familiarity with NVIDIA GPUs, CUDA toolchains, and profiling tools (Nsight, CUPTI, etc.).
- Proven ability to optimize workloads across NVLink, PCIe, Unified Memory, and NUMA systems; a peer-to-peer sketch follows this list.
- Hands-on background in RDMA, InfiniBand, GPUDirect, and related communication frameworks (UCX).
- Strong C/C++ programming skills with systems-level expertise (memory management, synchronization, cache coherency).
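As a concrete instance of NVLink/PCIe-level optimization, here is a minimal sketch of enabling peer-to-peer access between two GPUs so device-to-device copies bypass host staging. It assumes a two-GPU machine with a P2P-capable topology, and error handling is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int count = 0;
  cudaGetDeviceCount(&count);
  if (count < 2) { printf("need two GPUs for P2P\n"); return 0; }

  // Check whether each device can directly access the other's memory
  // (true over NVLink or a PCIe topology that supports P2P).
  int can01 = 0, can10 = 0;
  cudaDeviceCanAccessPeer(&can01, 0, 1);
  cudaDeviceCanAccessPeer(&can10, 1, 0);
  if (!can01 || !can10) { printf("P2P unsupported\n"); return 0; }

  const size_t bytes = 1 << 20;
  cudaSetDevice(0);
  cudaDeviceEnablePeerAccess(1, 0);  // flags must be 0
  float* d0 = nullptr;
  cudaMalloc((void**)&d0, bytes);

  cudaSetDevice(1);
  cudaDeviceEnablePeerAccess(0, 0);
  float* d1 = nullptr;
  cudaMalloc((void**)&d1, bytes);

  // Direct device-to-device copy; with peer access enabled this
  // avoids staging through host memory.
  cudaMemcpyPeer(d1, 1, d0, 0, bytes);
  cudaDeviceSynchronize();

  printf("peer-to-peer copy complete\n");
  cudaFree(d1);
  cudaSetDevice(0);
  cudaFree(d0);
  return 0;
}
```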
Preferred Qualifications:
- Expertise in HPC workload optimization and GPU compute/memory tradeoffs.
- Knowledge of pinned memory, peer-to-peer transfers, zero-copy, and GPU memory lifetimes; a unified-memory sketch follows this list.
- Strong grasp of multithreaded and asynchronous programming patterns.
- Familiarity with AI frameworks (PyTorch, TensorFlow) and Python scripting.
- Understanding of low-level CUDA/PTX assembly for debugging or performance tuning.
- Experience with storage offloads (NVMe, IOAT, DPDK) or DMA-based acceleration.
- Proficiency with system profiling/debugging tools (Valgrind, cuda-memcheck, gdb, Nsight Compute/Systems, perf, eBPF).
- An advanced degree (PhD) with research in GPU systems, compilers, or HPC is a plus.
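On GPU memory lifetimes: a minimal sketch of unified (managed) memory with explicit prefetching, one common way to control where pages live across the host/device boundary. It assumes a single Pascal-or-newer GPU; the sizes and the kernel are illustrative only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, size_t n, float s) {
  size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= s;
}

int main() {
  const size_t n = 1 << 20;
  const size_t bytes = n * sizeof(float);

  // Unified (managed) memory: one pointer valid on host and device.
  float* data = nullptr;
  cudaMallocManaged((void**)&data, bytes);
  for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

  int dev = 0;
  cudaGetDevice(&dev);

  // Prefetch pages to the GPU before launch to avoid on-demand
  // page faults during kernel execution.
  cudaMemPrefetchAsync(data, bytes, dev);
  scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
  cudaDeviceSynchronize();

  // Prefetch back to the host before CPU access.
  cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId);
  cudaDeviceSynchronize();

  printf("data[0] = %f\n", data[0]);
  cudaFree(data);
  return 0;
}
```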