Degraded Performance of Project Management component in Phrase TMS (EU) between 10:59 AM CET and 12:12 PM CET
Incident Report for Phrase
Postmortem

Introduction

We would like to share more details about the events that occurred with Phrase between 10:59 AM CET and 12:12 PM CET on March 20th, 2024, which led to a performance degradation of the Phrase TMS service, and what Phrase engineers are doing to prevent these issues from recurring.

Timeline

10:59 AM CET: The TM sharing with collaborators feature was enabled.

11:04 AM CET: A high load on the database was detected. The team began investigating the issue.

12:01 PM CET: The problem was identified and the TM sharing with collaborators feature was disabled.

12:12 PM CET: The database stabilized.

Root Cause

We enabled the new TM sharing with collaborators feature. Running its implementation against the large production data set caused a high load on our database, which impacted database connections as a whole. Because the data set in the testing environment is smaller, this issue was not detected earlier. Disabling the feature reduced the load on the database, which returned to a normal state, and the servers worked as expected.

Actions to Prevent Recurrence

  • Teams will be trained to notify all stakeholders about similar configuration changes.
  • The visibility of all configuration changes will be improved so that recent changes to the configuration can be seen without extra effort.
  • Before enabling this new feature again, database queries will be optimized so that this problem does not recur.
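
One common way to limit the impact of enabling a feature on a large production data set is to roll it out to a growing percentage of accounts and to switch it off automatically when database load rises. The sketch below is an illustration only and does not describe how Phrase implements or plans to implement its rollout. The function names, the flag name, the load metric, and the thresholds are all hypothetical.

```python
import hashlib


def in_rollout(account_id: str, flag: str, percent: int) -> bool:
    """Return True if this account falls inside the rollout percentage.

    The account is hashed into a stable bucket from 0 to 99, so the same
    account always gets the same answer for a given flag and percentage.
    """
    digest = hashlib.sha256(f"{flag}:{account_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def next_percent(current: int, db_load: float,
                 max_load: float = 0.8, step: int = 10) -> int:
    """Pick the next rollout percentage from the observed database load.

    db_load and max_load are hypothetical utilization values between 0 and 1.
    If the load exceeds the limit, the feature is disabled for everyone;
    otherwise the rollout advances by one step, capped at 100.
    """
    if db_load > max_load:
        return 0  # kill switch: disable the feature entirely
    return min(100, current + step)
```

With this approach, a load spike is seen while only a small share of accounts use the feature, and disabling it does not depend on a manual configuration change.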

Conclusion

First, we want to apologize. We know how critical our services are to your business. Phrase as a whole will do everything to learn from this incident and use it to drive improvements across our services. As with any significant operational issue, Phrase engineers will be working over the coming days and weeks to improve their understanding of the incident and to determine which changes will improve our services and processes.

Posted Mar 22, 2024 - 17:34 CET

Resolved
This incident has been resolved.
Posted Mar 20, 2024 - 12:12 CET
Identified
The issue has been identified and a fix is being implemented.
Posted Mar 20, 2024 - 12:01 CET
Investigating
We are currently investigating the issue.
Posted Mar 20, 2024 - 10:59 CET
This incident affected: Phrase TMS (EU) (Project management).