About the Role
We are partnering with a leading public sector technology organisation to hire an experienced DevOps Engineer to join its StackOps (Incident Management) team.
This is an exciting opportunity to work on enterprise-scale cloud platforms that support mission-critical digital services. You'll play a key role in building automation, improving platform reliability, and enhancing incident management capabilities through Infrastructure-as-Code, CI/CD, and observability solutions.
If you're passionate about DevOps, Site Reliability Engineering (SRE), cloud automation, and platform engineering, this role offers exposure to large-scale engineering projects using modern cloud-native technologies.
This is a 24-month contract.
Key Responsibilities
As a DevOps Engineer, you will:
- Develop and maintain Infrastructure-as-Code (IaC) using Terraform, Pulumi, and related automation tools.
- Build and enhance API integrations between observability platforms (e.g. Elastic, AWS/Azure native services, Dynatrace) and IT Service Management (ITSM) platforms.
- Design and develop automation for incident response, alert routing, auto-remediation, and self-healing workflows.
- Build, maintain, and optimise CI/CD pipelines to support reliable platform deployment.
- Improve platform reliability, resilience, and operational efficiency through automation and engineering best practices.
- Collaborate closely with Product Leads and engineering teams to continuously enhance the central StackOps platform.
- Troubleshoot production issues and implement scalable automation solutions to reduce operational overhead.
What We're Looking For
Essential Requirements
- Minimum 6 years of experience in DevOps, Site Reliability Engineering (SRE), Platform Engineering, or Software Engineering.
- Strong programming skills in at least one of the following: Python, Go, Node.js.
- Hands-on experience with Terraform, Pulumi, Ansible, AWS and/or Microsoft Azure, CI/CD pipeline development, and REST API integration.
- Experience working with observability and monitoring platforms such as Elastic Stack, OpenTelemetry, Dynatrace, and Datadog.
- Experience integrating ITSM platforms, including Jira Service Management (JSM), ServiceNow, and PagerDuty.
- Strong analytical, troubleshooting, and automation skills.
- Ability to work effectively in a collaborative, fast-paced engineering environment.
Preferred Experience
Candidates with experience in the following areas will be highly regarded:
- Large-scale enterprise or public sector platform engineering
- Internal Developer Platforms (IDP)
- Incident Management platforms
- AIOps solutions
- Auto-remediation and self-healing automation
- High availability (HA) and resiliency engineering
- Enterprise monitoring and observability platforms
- Cloud-native platform engineering and operational excellence