Opportunity Description
Take your career to the next level as an Expert Site Reliability Engineer with IBM Software, making Confluent incident management more efficient. You will play a key role in improving the reliability of cloud-based services across multiple platforms.
In this hands-on role, you will focus on analyzing systemic failures and instituting proactive improvements. Your time will be split between development work and mentoring, ensuring teams are fully equipped to handle incidents effectively. Collaborate with a global group of engineers dedicated to excellence in service delivery.
Key Responsibilities:
• Investigate and improve incident recurrence prevention strategies
• Oversee integration of incident management tools like Rootly and PagerDuty
• Implement and maintain SLO/SLA frameworks for reliability
• Facilitate training programs and lead post-mortem discussions
• Edit documentation to ensure quality in incident reporting
Requirements:
• 10+ years in site reliability or incident managemen...
Interested in this opportunity? Apply now through Expertini.