Service Reliability Engineer (SRE)
- Job Category: Engineering/Technical
- Education: Bachelor's Degree
- FT/PT: Full-Time
- Company: CGI
- Position ID: J0321-3225
CGI has an immediate need for a Service Reliability Engineer to join our financial services team in Knoxville, TN. This is an exciting opportunity to work in a fast-paced team environment supporting one of the leading companies in the secondary mortgage industry. We take an innovative approach to supporting our client, working side-by-side in an agile environment using emerging technologies.
We partner with 15 of the top 20 banks globally, and our top 10 banking clients have worked with us for an average of 26 years!
We have more than 73,000 CGI Members in 40 countries and more than 5,000 loyal Clients who leverage our end-to-end services across the globe.
As a valued contributor to our team, you may be responsible for the design, production, testing, and implementation of software, technology, or processes across multiple projects, programs, or products, as well as for creating and maintaining IT architecture, large-scale data stores, and cloud-based systems.
Your future duties and responsibilities:
You will apply expertise in software and systems engineering to ensure that the client's critical internal systems and externally visible systems both meet the appropriate performance needs of our users. You will be expected to:
Drive technical capabilities that increase the SRE value proposition across the enterprise
Enable standardization and adoption of application reliability metrics, and improve application health
Serve as a change agent: educate internal and external customers on reliability, promote SRE service capabilities, influence customers to adopt SRE services and best practices, and measure and showcase the added value through metrics
Develop portfolio and program reliability strategies by working with cross-functional IT organizations, and build roadmaps to drive reliability into the product
Educate and coach junior resources on application and infrastructure reliability and best practices
Become the go-to person for all technical needs of the SRE group
Required qualifications to be successful in this role:
6+ years of relevant professional experience
Excellent verbal and written communication skills with experience presenting information and/or ideas to an audience in a way that is engaging and easy to understand
Experience defining, measuring, and improving reliability metrics (SLOs/SLIs), observability (monitoring, logging, and tracing solutions), operations processes (incident and problem management), and the reduction of operational toil through automation
Skilled in cloud technologies and cloud computing, including Amazon Web Services (AWS) offerings, development, and networking platforms
Experience architecting solutions for the design and implementation of applications in the cloud
Knowledge of cloud technologies and containerization using Docker and Kubernetes
Experience collaborating cross-functionally on availability / performance issues in order to identify root-cause, determine areas for improvement, and drive those actions to closure through effective solutions
Extensive knowledge of principles, advanced techniques, and theories to suggest and implement solutions on a specific project, program, or product
Influencing skills, including negotiation, persuasion, meeting facilitation, and conflict resolution
Ability to identify gaps in the code from a non-functional viewpoint, and experience helping developers fix the code and promoting relevant reliability pattern implementations
Skilled in establishing and maintaining the overall health, availability, performance, resiliency, and capacity of technology products with specific experience in performance engineering
Experience designing, building, and implementing dashboards that cover application and infrastructure health, using tools such as Splunk and Dynatrace, to provide a single-pane view of all critical business and operational information to relevant stakeholders
Hands-on experience using Python to automate existing processes
Experience in activities like architecture reviews, code reviews, creating platforms and frameworks, capacity planning, etc.
Excellent understanding of, and demonstrated experience with, DevOps and CI/CD tools such as Jenkins, Terraform, and Jules, as well as automated deployment tools
Experience implementing resiliency design pattern frameworks and validation
Experience identifying and selecting strategic options, and identifying resources to meet the defined objectives
Experience advising teams in the writing of Performance and Chaos Engineering strategies and scripts with a strong emphasis on automated deployment, infrastructure automation solutions, and continuous integration & delivery processes
Experience designing and developing highly available systems that use load balancing and horizontal scaling
Familiarity with Blue Prism, Selenium, or Ansible playbooks, and with programming and scripting languages such as Java, Perl, Python, or PowerShell
Bachelor's degree in Computer Science, Information Systems, or a related field
Relevant certifications such as AWS Certified Solutions Architect, AWS Certified SysOps Administrator, Splunk Certified Developer, Dynatrace, Sun Certified Java Programmer, etc.
500 West Summit Hill Drive
Knoxville, TN 37902