Performance Disruption of Phrase IDM (EU) between 6:57 AM and 7:09 AM CEST
Incident Report for Phrase
Postmortem

Introduction

We would like to share more details about the events that occurred at Phrase between 06:57 AM CEST and 07:09 AM CEST on June 18, 2024, which led to a performance disruption of the IDM (EU) component, and about what Phrase engineers are doing to prevent such issues from recurring.

Timeline

06:41 AM CEST: The planned upgrade of the underlying infrastructure is complete, and post-upgrade steps are being applied.

06:57 AM CEST: Rescheduling of IDM resources started, rendering IDM partially unavailable.

07:09 AM CEST: IDM rescheduling is complete and the service is fully available again.

Root Cause

The rescheduling of IDM resources caused partial unavailability of the IDM application at the load balancer level. While services were being stopped and started, some requests were routed to processes that were already terminating or to containers that were not yet fully initialized.
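To illustrate the failure mode, here is a minimal, hypothetical simulation; it is not Phrase's tooling. It assumes a round-robin load balancer and three instance states (starting, ready, terminating); the instance names and request counts are invented. It compares a balancer that sends traffic to every registered instance with one that only routes to instances reporting ready, which is what a readiness health check is meant to enforce.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    STARTING = "starting"        # container is up, application not yet initialized
    READY = "ready"              # able to serve requests
    TERMINATING = "terminating"  # shutdown already in progress


@dataclass
class Instance:
    name: str  # hypothetical instance name
    state: State


def route(instances, request_count, readiness_gated):
    """Round-robin requests over the pool and return how many failed.

    A request fails if it lands on an instance that is not READY.
    With readiness_gated=True, only READY instances are in the pool,
    as they would be if the balancer honoured a readiness check.
    """
    if readiness_gated:
        pool = [i for i in instances if i.state is State.READY]
    else:
        pool = list(instances)
    if not pool:
        return request_count  # nothing can serve traffic
    failed = 0
    for n in range(request_count):
        target = pool[n % len(pool)]
        if target.state is not State.READY:
            failed += 1
    return failed


# Hypothetical snapshot mid-reschedule: one old replica shutting down,
# one new replica still booting, one replica healthy.
snapshot = [
    Instance("idm-old-1", State.TERMINATING),
    Instance("idm-new-1", State.STARTING),
    Instance("idm-2", State.READY),
]

print(route(snapshot, 300, readiness_gated=False))  # 200
print(route(snapshot, 300, readiness_gated=True))   # 0
```

In the ungated case two of every three requests hit an instance that cannot answer, which matches the "partially unavailable" symptom described above; gating on readiness removes those failures as long as at least one ready instance remains.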

Actions to Prevent Recurrence

  • The rescheduling and deployment procedure and its automation will be improved to prevent availability issues at the load balancer level.
  • Monitoring of the application and of the infrastructure behind the load balancer will be improved to ensure that only application instances in the correct state are active at the load balancer level.
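One common way to make a reschedule safe is a "make before break" rolling replacement. The sketch below is a hypothetical illustration of that general pattern, not the procedure Phrase has adopted: the `Replica` and `LoadBalancer` classes, the replica names, and the instant readiness flag are all assumptions. Each new replica is started and marked ready before it joins the balancer, and each old replica is removed from the balancer before it is stopped.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str  # hypothetical replica name
    ready: bool = False


@dataclass
class LoadBalancer:
    members: list = field(default_factory=list)

    def healthy_count(self):
        return sum(1 for r in self.members if r.ready)


def rolling_replace(lb, old_replicas, new_names):
    """Replace replicas one at a time without routing to an unready one.

    Order per replica: start new -> ready -> add to LB -> remove old from LB -> stop old.
    Returns (event log, lowest number of ready LB members seen,
    whether any LB member was ever unready).
    """
    log = []
    lowest = lb.healthy_count()
    saw_unready = False
    for old, new_name in zip(old_replicas, new_names):
        new = Replica(new_name)
        log.append(f"start {new.name}")
        new.ready = True  # placeholder for polling the new replica's readiness check
        lb.members.append(new)
        log.append(f"add {new.name}")
        lb.members.remove(old)
        log.append(f"remove {old.name}")
        # A real procedure would wait here for in-flight requests on `old` to drain.
        old.ready = False
        log.append(f"stop {old.name}")
        lowest = min(lowest, lb.healthy_count())
        saw_unready = saw_unready or any(not r.ready for r in lb.members)
    return log, lowest, saw_unready


olds = [Replica("idm-a", True), Replica("idm-b", True)]
lb = LoadBalancer(list(olds))
log, lowest, saw_unready = rolling_replace(lb, olds, ["idm-c", "idm-d"])
print([r.name for r in lb.members])  # ['idm-c', 'idm-d']
print(lowest, saw_unready)           # 2 False
```

The ordering is the design choice that matters: because an old replica leaves the balancer before it stops, and a new one joins only after it is ready, capacity never drops below the original replica count. Orchestrators such as Kubernetes expose the same ideas through readiness probes and rolling-update settings; monitoring can then alert whenever an unready instance is still receiving traffic.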
Posted Jun 20, 2024 - 16:39 CEST

Resolved
Our clients may have experienced issues loading the IDM platform starting at 6:57 AM CEST. We investigated and identified the root cause at around 12 PM CEST.
This incident was resolved at 7:09 AM CEST.
Posted Jun 18, 2024 - 06:57 CEST