Senior Compute Site Reliability Engineer (GPU)
Apple
Minimum Qualifications
- 5+ years in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
- Proven experience with GPU-based virtual machine infrastructure and cloud platforms (e.g., AWS, GCP).
- Experience with GPU hardware (e.g., NVIDIA, AMD) and associated software stack (e.g., CUDA, cuDNN).
- Experience with GitOps practices, CI/CD tools (e.g., Spinnaker, Argo), and deployment strategies.
- Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana, and Prometheus.
- Outstanding organizational and communication skills.
- BS/MS degree in Engineering or Computer Science, or equivalent work experience.
Preferred Qualifications
- Strong verbal and written communication skills.
- Knowledge of Kubernetes, including deployment, management, and optimization of clusters.
- Automation advocate: you believe in reducing operational load through software.
- A strong sense of ownership, combined with being a great teammate who communicates clearly and transparently.
- Self-motivated, inquisitive, and always looking to learn more.
- Experience managing, scaling, and troubleshooting Golang and GPU applications.
- Ability to work independently and manage multiple priorities effectively.
- CNCF Certified Kubernetes Administrator (CKA) certification.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.