A Reliability Engineer analyzes and evaluates the reliability of products, equipment, components, and processes using engineering methodologies and tools. Develops the methods and measures used for reliability analysis based on product specifications, tolerances, or operating standards. Uses analysis techniques such as FMEA, fault tree analysis, and root cause analysis to identify problems. Oversees testing activities and reviews results. Creates risk-based failure mitigation plans. Proposes new or revised product designs, manufacturing processes, and testing specifications that follow best practices and increase reliability. Requires a bachelor's degree in engineering. Typically reports to an engineering manager. Work is closely managed. Works on projects/matters of limited complexity in a support role. Typically requires 0-2 years of related experience. (Copyright 2024 Salary.com)
This is a remote position. However, the candidate must be based in one of the cities listed on the job posting.
About the Team/Role
The WEX Site Reliability Engineering (SRE) team is looking for individuals passionate about developing software and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Platform Reliability organization which supports our internal stakeholders and our Funding Platform teams. As part of the Platform Reliability organization you'll have the opportunity to solve complex challenges and improve the quality of life of our engineering teams as well as our ability to service our customers.
The successful candidate should have a strong aptitude for learning new technologies and the ability to drive complex and meaningful projects to a conclusion. Tight-knit collaboration with the engineering teams and an ability to thrive under pressure are key skills required to succeed in this role.
How you'll make an impact
Dig deep into code, networking, operating systems, and/or storage solutions to solve complex issues
Develop automation and utilize monitoring tools to ensure system reliability
Participate in incident response and troubleshooting
Participate in 24x7 Site Reliability rotations and escalation workflows
Identify and address performance bottlenecks. This will include code optimization, configuration changes, or infrastructure upgrade recommendations.
Collaborate with development teams to ensure software design meets operational requirements
Continuously improve processes and procedures to increase system reliability and efficiency
Stay up-to-date with the latest industry trends and technologies
Experience you'll bring
2 years of hands-on experience as a Site Reliability Engineer or equivalent role
2 years of development experience with at least one major programming language
Experience with Cloud Computing platforms (AWS, Azure, GCP)
Ability to thrive in a fast-paced development and operations environment
Strong communication and collaboration skills
Experience with observability and logging technologies
Experience with at least one major RDBMS and at least one NoSQL data store
Experience with containerization technologies such as Docker or Kubernetes
BA/BS degree in Computer Science or related technical field, or equivalent job experience
Nice to have
Experience with one or more of the following languages: C#, Java, GoLang, Python
Experience with infrastructure as code, preferably Terraform
Working knowledge of designing and building RESTful APIs
Experience with Datadog, Grafana and Splunk
Familiarity with Agile methodologies and practices
Experience with GitOps
Experience with Apache Kafka