Staff Site Reliability Engineer, Engineering Tools
Tesla
This Staff Site Reliability Engineer will be responsible for managing & maintaining critical engineering tools: GitHub, Bitbucket, SVN & Perforce for version control, Jira & Confluence for project tracking, Polarion for requirements management, and Artifactory for software artifact storage. The ideal candidate will have a strong background in both software engineering & systems administration, as well as a passion for automating & optimizing processes. Their work will be instrumental in ensuring the reliability, scalability, and performance of our development capabilities across internal organizations.
- Design, implement, and maintain automation solutions for provisioning, configuration, and monitoring of engineering tools infrastructure
- Administer & support the Atlassian application stack (Jira, Confluence), remaining accountable for the high availability of our infrastructure
- Administer Polarion, including configuration, OSLC plugin integration, workflows, reports, templates, access permissions, re-indexing, and restoration processes; work with users to address any issues or concerns promptly
- Restore projects, work items, and live documents from the SVN repository
- Collaborate with development and operations teams to ensure seamless integration and functionality of engineering tools within our CI/CD pipelines
- Perform regular backups, upgrades, and patch management to ensure security & stability
- Rapidly troubleshoot and resolve critical issues by identifying root causes across multiple layers (storage, OS, network, virtualization, & application/DB stack)
- Conduct performance analysis & capacity planning to prevent service disruptions, anticipate future resource requirements, and optimize infrastructure
- Participate in the on-call rotation and respond to incidents in a timely manner, resolving issues to minimize downtime & impact on users
- Experience with the installation, configuration, development, debugging, support, and upgrades of GitHub Enterprise
- Proficiency in setting up, managing & automating Jira projects, Confluence spaces, and permissions
- Experience with setting up & maintaining Polarion in High Availability mode, as well as configuring templates, workflows, and permissions within the platform
- Experience with general programming/scripting languages (Python, Shell, Golang) & automation frameworks (Ansible) for administration, monitoring, and the development of custom plug-ins & workflows
- Knowledge of containerization technologies like Docker & orchestration tools like Kubernetes
- Familiarity with monitoring & logging solutions such as Prometheus, Grafana, and Splunk
- Bachelor's Degree in Computer Science, Computer Engineering, Information Technology, or proof of exceptional skills in a related field