Reliability Engineer - Distributed Open-Source Systems
Two Sigma
This job is no longer accepting applications
- Lead engineering and operational support for multiple large distributed open-source software applications (Elasticsearch, Kafka and ZooKeeper), including much of the foundational infrastructure used by the Engineering and Research functions at Two Sigma
- Improve all aspects of software reliability, including better monitoring, alerting and documentation
- Collaborate across infrastructure and development teams to ensure strategic priorities are aligned, fix priority support issues, and improve vital software, tools, and processes
- Collect and analyze metrics from operating systems and applications to assist in performance tuning and fault finding
- Participate in a 24x7 on-call rotation for our hosted services
- At least 1 year of experience (3-10 years preferred) in a Site Reliability Engineering (SRE), DevOps, Platform Engineering, Systems Engineering/Administration, or related function
- BS in Computer Science or another highly technical, scientific field
- Ability to apply open-source systems (Elasticsearch, Kafka and ZooKeeper) and utilities to provision production systems in a variety of domains, especially for multi-tenant use
- Ability to program (structured and OO) in one or more high-level languages (such as Python, Java, C/C++, Go), with a proven track record of automation and an algorithmic approach to solving problems
- In-depth knowledge of and experience with on-prem (Linux/Unix) and cloud-based (GCP, AWS, etc.) systems
- Experience with automated configuration management tools such as Ansible, Chef, Puppet, and SaltStack