Site Reliability Engineer at Cydarm Technologies 2020 - 2023
Responsibilities
- Write Infrastructure as Code and Configuration Management modules enabling rapid deployment of our application, along with SIEM integrations, to public and private cloud
- Design and implement monitoring and alerting, centralised logging, and encrypted backup platforms
- Develop CI/CD pipelines enabling automated application testing and deployment
- Lead the SRE team: develop and manage the roadmap and OKRs, and work closely with development, customer success, and sales & marketing team leads
- Develop automation to configure and deploy tailored, self-serve trials of our web app within 15 minutes
- Design and implement an alerting and observability solution for production customer deployments, using Prometheus/Alertmanager/PagerDuty, Grafana, and Loki/Promtail/Cassandra for log aggregation and alerting
- Design and implement our BC/DR process
- Implement a containerised HashiCorp Vault service to move secrets off disk and establish an Encryption as a Service endpoint
- Create an automated nightly build/deployment of the platform for smoke tests
- Create and manage build pipelines and underlying infrastructure, pushing container definitions to container repositories (ECR, GitHub Packages)