Cloud Operations Engineer – Full-Time Job, Bucharest

Job responsibilities

Provide 24/7 support & incident management.
Participate in the on-call rotation and support, ensuring the stability and performance of production environments.
Respond to incidents or issues reported by CS, CSE or monitoring alerts.
Run recovery jobs, following the steps defined in the SOPs.
Take proactive actions to address infrastructure issues to mitigate and prevent production outages.
Respond to monitoring alerts according to defined SOPs.
Participate in Post Incident Reviews and discussions.
Build effective working relationships with peers across global locations.
Make suggestions for process improvements and enhanced operational efficiencies.
Strong experience with monitoring and alerting tools: CloudWatch, Grafana, PagerDuty.
Provide superior problem remediation support within the web/application/cloud/container tier environments in support of negotiated Service Level Agreements (SLAs).
Partner and collaborate with the SRE or CAE team to build automation that prevents problem recurrence.

Job requirements

Minimum of 2 years of development experience in a cloud environment.
Minimum of 2 years in incident response and major incident management.
Minimum of 4 years of Linux experience.
Passion for analyzing and solving problems in global-scale distributed systems.
Ability to prioritize and stay on top of all incidents reported.
Working knowledge of configuration tools such as Chef, Puppet, Ansible, and Rundeck.
Experience troubleshooting database or ETL-related issues.