In your career, let’s prove what’s possible.

At Lam Research, we create equipment that drives technological advancements in the semiconductor industry. Our innovative solutions enable chipmakers to power progress in nearly all aspects of modern life, and it takes each member of our team to make it possible.

Across our organization, our employees come to work and change the world. We take on the toughest challenges with precision and accuracy. We push for the next big semiconductor breakthrough. We lead the way in one of the most critical and fast-moving industries on the planet. And we do it together, with deep connections and limitless collaboration.

The impact we have on the world is made possible by focusing on our people. So we recognize and celebrate our teams’ achievements. We strive to create an inclusive and diverse culture where everyone’s contribution and voice have value. We evaluate and evolve our offerings, so our people receive the support and empowerment to do meaningful things for their lives, careers, and communities.

Because at Lam, we believe that when people are the priority and they’re inspired to unleash the power of innovation for a better world together, anything is possible.

IT Engineer 4

Date: Apr 3, 2026

Location: Bangalore, IN

Req ID: 191384

Worker Category: Pending Selection

The group you’ll be a part of

The Global Information Systems Group is dedicated to the success of Lam through providing best-in-class and innovative information system solutions and services. Together, we support users globally with data, information, and systems to achieve their business objectives.

The impact you’ll make

We are seeking an HPC Systems Engineer to lead the evaluation, deployment, and ongoing management of our large-scale CPU and GPU cluster environments. You will be the technical owner of the HPC system lifecycle, from initial hardware planning and installation to advanced performance tuning and troubleshooting. This role is highly collaborative: you will work closely with Networking and Security teams to build a secure, high-speed foundational infrastructure that supports mission-critical research and engineering workloads.

What you’ll do

Cluster Lifecycle Management: Lead the evaluation, planning, configuration, and physical/virtual deployment of multiple large-scale CPU + GPU clusters.
System Administration: Perform expert-level Linux system administration, including kernel tuning, security hardening, and OS lifecycle management (e.g., RHEL, Ubuntu, or Rocky Linux).
Workload Management: Act as the subject matter expert for SLURM, managing complex partitioning, resource quality of service (QoS), and scheduling optimization for mixed workloads.
Infrastructure Design: Architect and build the physical and logical infrastructure for HPC, including high-speed fabric integration (InfiniBand/Ethernet) and power/cooling planning.
Software Stack & Modules: Maintain and curate the HPC application stack using software management tools like LMOD or Tcl Modules, ensuring researchers have access to optimized compilers, libraries (MPI, CUDA), and applications.
GPU Optimization: Specify and tune GPU environments (e.g., NVIDIA H100/B200), focusing on GPUDirect, NVLink topologies, and containerized runtimes like Apptainer/Singularity.
Troubleshooting & Performance: Conduct deep-dive root cause analysis for complex system failures and performance bottlenecks across compute, network, and software layers.
Cross-Functional Leadership: Own infrastructure projects end to end, coordinating closely with Networking (low-latency fabric) and Security (compliance, identity management) to ensure all builds meet enterprise standards.

Who we’re looking for

Experience with GPU-aware MPI implementations and performance profiling tools (e.g., NVIDIA Nsight, TAU).
Knowledge of container orchestration in HPC (e.g., Kubernetes for AI/ML workloads alongside SLURM).
Certifications such as RHCE (Red Hat Certified Engineer) or relevant NVIDIA/InfiniBand technical training.

Preferred qualifications

Education: BS/MS in Computer Science, Electrical Engineering, or a related field.
HPC Experience: 6+ years of hands-on experience managing production-grade HPC clusters.
Scheduler Expertise: Deep proficiency in SLURM administration, including writing custom prolog/epilog scripts and managing GRES (Generic Resources) for GPUs.
Linux Mastery: Advanced knowledge of Linux internals, shell scripting (Bash), and at least one high-level language (Python or Go).
Automation: Extensive experience with configuration management and provisioning tools (e.g., Ansible, Terraform, xCAT, or Warewulf).
Networking: Familiarity with HPC-specific networking such as InfiniBand (NDR/HDR) and RoCE v2.

Our commitment

We believe it is important for every person to feel valued, included, and empowered to achieve their full potential. By bringing unique individuals and viewpoints together, we achieve extraordinary results.

Lam Research ("Lam" or the "Company") is an equal opportunity employer. Lam is committed to and reaffirms support of equal opportunity in employment and non-discrimination in employment policies, practices and procedures on the basis of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex (including pregnancy, childbirth and related medical conditions), gender, gender identity, gender expression, age, sexual orientation, or military and veteran status or any other category protected by applicable federal, state, or local laws. It is the Company's intention to comply with all applicable laws and regulations. Company policy prohibits unlawful discrimination against applicants or employees.

Lam offers a variety of work location models based on the needs of each role. Our hybrid roles combine the benefits of on-site collaboration with colleagues and the flexibility to work remotely, and they fall into two categories: On-site Flex and Virtual Flex. In an ‘On-site Flex’ role, you’ll work 3+ days per week on-site at a Lam or customer/supplier location, with the opportunity to work remotely for the balance of the week. In a ‘Virtual Flex’ role, you’ll work 1-2 days per week on-site at a Lam or customer/supplier location, and remotely the rest of the time.

Job Segment: Developer, Electrical Engineering, Open Source, Linux, Computer Science, Technology, Engineering
