Job Description:
Position Description:
Defines and implements practices in Resiliency Engineering, Automation, Observability, and Chaos Testing. Solves stack-wide engineering issues related to hardware, software, network, applications, and cloud service providers. Develops and migrates applications into cloud-based platforms like Amazon Web Services (AWS) and Azure. Builds and supports Kubernetes clusters to run across multiple machines and environments. Monitors the data ingestion process and performs on-call duties using Control-M and Informatica. Troubleshoots application issues on Unix/Linux with J2EE, WebSphere, Tomcat, and SQL. Provides technical expertise and assists in crisis management for major system outages or issues with major impact to the business. Consults with application support and development groups on application problems, new releases, new applications, systems, and infrastructure.
Primary Responsibilities:
Defines and recommends process improvement initiatives.
Participates in the development of tools used to facilitate operational procedures.
Completes technical service requests.
Drives incident management and resolution, identifies root causes, and coordinates fixes.
Responsible for the creation and documentation of disaster recovery procedures.
Creates contingency plans covering key areas of vulnerability within the system.
Coordinates systems installation, configuration, and infrastructure changes.
Responsible for the maintenance of system diagrams and operational procedures.
Develops tools to automate the collection and analysis of operational data.
Establishes project plans for projects of moderate scope.
Works on multiple projects concurrently.
Establishes on-time and on-budget project goals.
Maintains project status and monitors activities of team members.
Provides application support and assists in the identification, isolation, and escalation of problems of limited scope.
Participates in application functionality analysis by gathering appropriate data and presenting it in chart form for analysis.
Participates in the evaluation of design for existing systems to assess their reliability, performance, usage, maintainability, and cost of ownership.
Participates in the development of technical solutions of limited complexity to support delivery of service requests.
Performs functional analysis for small to moderate projects.
Education and Experience:
Bachelor’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and three (3) years of experience as a Senior Systems Services & Support Analyst (or closely related occupation) implementing, deploying, monitoring, and automating on-premises infrastructure using Cloud Native Services or containerized applications throughout the software development lifecycle (SDLC).
Or, alternatively, Master’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and one (1) year of experience as a Senior Systems Services & Support Analyst (or closely related occupation) implementing, deploying, monitoring, and automating on-premises infrastructure using Cloud Native Services or containerized applications throughout the software development lifecycle (SDLC).
Skills and Knowledge:
Candidate must also possess:
Demonstrated Expertise (“DE”) ensuring seamless integration using databases, monitoring, and security features to integrate, deploy, and maintain scalable containerized applications across Amazon Web Services (AWS), Azure, and OpenShift within a cloud-native ecosystem; and managing Amazon Elastic Kubernetes Service (Amazon EKS) using AWS.
DE configuring and maintaining Datadog monitoring and setting up alerts and dashboards to proactively identify issues using Datadog, Catchpoint, Splunk, AppDynamics, and Grafana for observability and monitoring of applications and infrastructure; and writing SQL stored procedures to interact with Snowflake databases.
DE managing complex job workflows in Control-M and AutoSys through holding, releasing, rerunning, and force completing jobs to ensure optimal system performance and meet service-level agreements; and using Ansible to automate the configuration, deployment, and management of infrastructure for scalability and efficiency across environments through infrastructure as code.
DE applying DevOps principles throughout the software development lifecycle (SDLC): performing continuous integration, delivery, and deployments using Jenkins, JFrog, and Sonar; and identifying performance bottlenecks and providing solution improvements for Web applications using Tomcat.
Certifications:
Category:
Information Technology
Fidelity’s hybrid working model blends the best of both onsite and offsite work experiences. Working onsite is important for our business strategy and our culture. We also value the benefits that working offsite offers associates. Most hybrid roles require associates to work onsite every other week (all business days, M-F) in a Fidelity office.