In your career, let’s prove what’s possible.

At Lam Research, we create equipment that drives technological advancements in the semiconductor industry. Our innovative solutions enable chipmakers to power progress in nearly all aspects of modern life, and it takes each member of our team to make it possible.

Across our organization, our employees come to work and change the world. We take on the toughest challenges with precision and accuracy. We push for the next big semiconductor breakthrough. We lead the way in one of the most critical and fast-moving industries on the planet. And we do it together, with deep connections and limitless collaboration.

The impact we have on the world is made possible by focusing on our people. So we recognize and celebrate our teams’ achievements. We strive to create an inclusive and diverse culture where everyone’s contribution and voice have value. We evaluate and evolve our offerings, so our people receive the support and empowerment to do meaningful things for their lives, careers, and communities.

Because at Lam, we believe that when people are the priority and they’re inspired to unleash the power of innovation for a better world together, anything is possible.


AI Services Technical Lead

Date: Apr 20, 2026
Location: Fremont, CA, US, 94538
Req ID: 197668
Worker Category: On-site Flex

The impact you’ll make

In this role, you will directly contribute to the reliability, scalability, and operational excellence of Lam’s Enterprise AI services. As a hands-on technical lead, you will modernize AI operations through observability, automation, and strong engineering discipline, helping ensure AI services are resilient, production-ready, and able to scale effectively across the company. Your work will strengthen incident response, improve service health and readiness, and drive continuous improvement in how Enterprise AI services are operated and supported.

What you’ll do

  • Own hands-on technical operations for Enterprise AI services, ensuring platforms are reliable, maintainable, and ready for production scale.
  • Lead incident triage, technical troubleshooting, service restoration, and root cause analysis for complex production issues affecting AI platforms and services.
  • Build and enhance monitoring dashboards, alerting strategies, health checks, and operational views across Azure services using Application Insights, Azure Monitor, Log Analytics, and KQL.
  • Query logs, analyze telemetry, and identify patterns and failure modes to improve detection, response speed, and long-term reliability.
  • Improve operational automation using Python, PowerShell, and AI-driven approaches to reduce manual effort and strengthen AI Ops maturity.
  • Partner with engineering teams to review architecture, improve operability, strengthen release readiness, and drive remediation of recurring reliability and support issues.
  • Develop and maintain runbooks, support procedures, and operational standards that improve L1/L2/L3 effectiveness across internal teams and service partners.
  • Support change and release processes through readiness reviews, production validation, and post-release monitoring, using enterprise workflows and ticketing systems such as Jira and ServiceNow.
  • Ensure operational processes, controls, and artifacts are audit-ready and support enterprise compliance requirements; support BCP/DR readiness through recovery validation, runbook updates, and failover testing.

Who we’re looking for

Minimum qualifications

  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field.
  • Strong hands-on experience supporting cloud-based production platforms in Microsoft Azure.
  • Experience with Application Insights, Azure Monitor, Log Analytics, and Kusto Query Language (KQL) for troubleshooting, telemetry analysis, and operational monitoring.
  • Strong scripting or automation experience using Python and/or PowerShell.
  • Experience supporting CI/CD pipelines, production releases, and operational readiness practices.
  • Experience leading incident triage, root cause analysis, and direct remediation for complex production issues.
  • Strong communication skills with the ability to translate technical issues into clear updates for engineering teams, stakeholders, and leadership.

Preferred qualifications

  • Experience supporting AI/ML or generative AI platforms, including services built with Azure OpenAI.
  • Experience with Azure API Management and operational support for API-based services.
  • Experience supporting containerized or distributed services, including AKS/Kubernetes.
  • Experience working with enterprise ticketing or ITSM platforms such as Jira and ServiceNow.

Our commitment

We believe it is important for every person to feel valued, included, and empowered to achieve their full potential. By bringing unique individuals and viewpoints together, we achieve extraordinary results.

Lam Research ("Lam" or the "Company") is an equal opportunity employer. Lam is committed to and reaffirms support of equal opportunity in employment and non-discrimination in employment policies, practices and procedures on the basis of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex (including pregnancy, childbirth and related medical conditions), gender, gender identity, gender expression, age, sexual orientation, or military and veteran status or any other category protected by applicable federal, state, or local laws. It is the Company's intention to comply with all applicable laws and regulations. Company policy prohibits unlawful discrimination against applicants or employees.

Lam offers a variety of work location models based on the needs of each role. Our hybrid roles combine the benefits of on-site collaboration with colleagues and the flexibility to work remotely, and they fall into two categories: On-site Flex and Virtual Flex. In an ‘On-site Flex’ role, you’ll work 3+ days per week on-site at a Lam or customer/supplier location, with the opportunity to work remotely for the balance of the week. In a ‘Virtual Flex’ role, you’ll work 1-2 days per week on-site at a Lam or customer/supplier location, and remotely the rest of the time.

Salary

CA San Francisco Bay Area Salary Range for this position: $141,000.00 - $307,000.00.

The above salary range applies only to applicants who reside or work on-site in the San Francisco Bay Area, California. Salary offers will depend on factors that include the location you work from, your level, education, training, specific skills, years of experience, and comparison to other employees already in this role. Actual salary may vary from the salary offered due to numerous factors including, but not limited to, unpaid time off, unpaid leave, company-mandated shutdown, and other relevant factors.

Our Perks and Benefits

At Lam, our people make amazing things possible. That’s why we invest in you throughout the phases of your life with a comprehensive set of outstanding benefits.

Discover more at Lam Benefits

