H-1B Job Board

Site Reliability Engineer

Apple

Software Engineering
Sunnyvale, CA, USA
Posted on Feb 5, 2025

Summary

Weekly Hours: 40
Role Number: 200588822
Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. The people here at Apple don’t just create products; they create the kind of wonder that’s revolutionized entire industries. It’s the diversity of those people and their ideas that inspires the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts. Join Apple, and help us leave the world better than we found it.

Apple's Manufacturing Systems & Infrastructure (MSI) team is responsible for gathering, consolidating, and tracking all manufacturing data for Apple’s products and modules worldwide. This data is used throughout the company and across the product's lifecycle, from validating at the very beginning that units being built are fully tested and of high quality before leaving the factory, all the way through to warranty support for customers.

As a Senior Site Reliability Engineer, you will play a critical role in maintaining and enhancing the reliability of our production systems. You will collaborate with engineering teams to design, implement, and monitor infrastructure and services, employing your expertise in automation and performance optimization.

Description

  • Design, develop, and maintain scalable, reliable, and efficient infrastructure.
  • Implement monitoring, alerting, and logging systems to ensure the health and performance of applications.
  • Automate repetitive tasks and improve system efficiency through scripting and tool development.
  • Collaborate with development teams to improve service reliability and promote best practices in software development and deployment.
  • Conduct root cause analysis of system failures and implement corrective actions to prevent recurrence.
  • Participate in on-call rotations and respond to incidents, minimizing downtime and impact on users.
  • Drive continuous improvement initiatives to enhance system performance, scalability, and reliability.
  • Mentor and provide guidance to junior team members, fostering a culture of learning and innovation.

Minimum Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • 3+ years of experience in site reliability engineering, DevOps, or a related field.
  • Strong experience with cloud platforms: AWS, Google Cloud Platform, or Microsoft Azure.
  • Proficiency in infrastructure as code tools: Terraform, Ansible, or CloudFormation.
  • Expertise in containerization and orchestration: Docker, Kubernetes, and Helm.
  • Experience with CI/CD pipelines and tools: Jenkins, Argo CD.
  • Strong scripting and programming skills: Python, Go, Shell, or Ruby.
  • In-depth knowledge of monitoring and observability tools: Prometheus, Grafana, OpenTelemetry, Splunk.
  • Familiarity with version control systems: Git.
  • Solid understanding of Linux/Unix system administration and networking.
  • Excellent problem-solving skills and a proactive approach to incident management.

Preferred Qualifications

  • Experience with database management and optimization: MySQL, PostgreSQL, or NoSQL databases like MongoDB and Cassandra.
  • Background in industrial automation, factory operations, or manufacturing processes.
  • Familiarity with industrial protocols (e.g., Modbus, OPC-UA, MQTT) and IoT platforms used in factory automation.
  • Familiarity with predictive maintenance solutions and tools that leverage sensor data and IoT analytics.
  • Strong understanding of real-time data processing and working with time-series databases in industrial contexts.
  • Ability to analyze and optimize latency-sensitive applications in an industrial environment.
  • Understanding of security best practices and tools: Vault, IAM, or security scanning tools.
  • Experience with performance testing and load testing tools: JMeter, Gatling, or Locust.
  • Certification in relevant cloud platforms or technologies.

Pay & Benefits

  • Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.